Frequent restarting without an obvious cause?

Hi all,

In a nutshell, I’ve had a really stable HA installation for several years now and over the last week it’s started rebooting quite frequently (as in many times a day and even sometimes multiple times in an hour). The system has moments of stability where I think it looks okay and then you’ll interact with it (eg press a switch linked to an automation, open the app etc) and realise it’s gone down again and is rebooting.

At this point I’m a bit stuck on what to try next to diagnose what is going on, so guidance would be greatly appreciated.

Some extra info:

  • HA OS 2024.4 running on an RPI 3 with 1GB of RAM
  • System has been running stable since 2021 without any major dramas
  • System performance is generally pretty stable outside of boot with about 15% CPU usage and 80% memory usage (almost no fluctuation on this).
  • OS is on SD card with data on an external USB SSD.
  • No new devices, integrations etc have been added for many, many months possibly even a year or more.
  • A reboot is a from scratch reboot, not just restarting services, reloading config etc. It fully goes down, drops off the network and then comes back up again from a cold start.

What have I tried?

  • Restored from backup - I rolled back through several versions in sequence, because the problem persisted each time, starting with the version from immediately before the upgrade and going back to a recent 2024.4 version that had been very stable for several weeks
  • Changed the SD card - worried my SD card was failing, I installed HA OS onto a brand new SD card and then restored onto that.
  • Changed the RPI - I have many RPI3s so I changed to a new one to test it wasn’t something hardware related
  • Changed the power cable - as I wasn’t sure of the cause of the reboots, I was concerned the power cable may be dying, so I swapped it out
  • Changed the power source - similar to above.
  • Read through the logs line by line to see if I could observe an error or something but there’s nothing that appears obvious that isn’t “normal” for my logs and installation.

I’ve done each one of these things in isolation to be able to observe the change. I’ve also tried to characterise what the last thing was that happened before I observed a reboot to see if there was something consistent (eg pressing a particular button etc).

The only thing I haven’t yet tried is to run an absolutely fresh install with nothing integrated on my existing device for a period of time to see if that fails. I am not sure yet as to the value of this when I’ve changed the hardware and SD card over… (open to thoughts on that though).

I was hoping someone might have some further thoughts about what I might be able to try that I haven’t done already or other things I might be able to do to diagnose things. It’s amazing that something that was so rock-solid stable that it had disappeared into the background and I barely thought about it has recently consumed so much attention - maybe my install was just feeling a bit unloved… :wink:

It’s probably not Home Assistant Core that is the issue. It is the operating system. What version of HAOS are you using?

12.3 has caused issues for some people.

Try rolling that back to 12.2

I am running 12.3. I’ll give 12.2 a go and see how that performs. Thanks @tom_l

Please report back what you’ve found. I’ve had some of the symptoms referenced in that link, above, along with a memory leak I never had before. I’ve been working through a process-of-elimination approach with no luck so far. Now I’m thinking reverting the OS might be the best next step.

I’ve been procrastinating, knowing I may face another two-day debugging and diagnostic nightmare. But I didn’t notice how many other problems were caused by 12.3.

So I’m pretty sure this was it.

I downgraded from 12.3 to 12.2 yesterday afternoon and as of now the system has been running about 16 hours or so without any reboots / instability etc so I’m quite hopeful.

One thing I’d note for @CaptTom or anyone else that comes across this thread. To downgrade on my system, because of the instability I ended up:

  • Turning on the SSH add-on
  • Allowing SSH access on port 22 (turn on network access in the add-on and set a port)
  • Adding a key (or, if you’re doing this very temporarily, you could just use a password)
  • Then SSHing in from my laptop to run the command: ha os upgrade --version 12.2
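For anyone who wants the steps above as something to copy and adapt, here is a rough sketch rather than a tested recipe. The hostname homeassistant.local and the root user are assumptions about a typical SSH add-on setup, so swap in your own. The upgrade command is the one from this post; the ha os info call is only there to show the current version first. By default the script just prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the SSH downgrade described above.
# HA_HOST and HA_USER are assumptions; adjust them for your own system.
HA_HOST="${HA_HOST:-homeassistant.local}"
HA_USER="${HA_USER:-root}"
TARGET="${TARGET:-12.2}"   # HAOS version to move to

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

downgrade_haos() {
  # Show the currently installed OS version before changing anything.
  run ssh -p 22 "${HA_USER}@${HA_HOST}" "ha os info"
  # Command as used in this thread to move to the requested version.
  run ssh -p 22 "${HA_USER}@${HA_HOST}" "ha os upgrade --version $1"
}

downgrade_haos "$TARGET"
```

Running it as-is only echoes the two ssh commands, which is a cheap way to check the host and version before committing to another reboot cycle.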

If you had a monitor and keyboard attached to your system (I don’t; and it’s in quite an inconvenient location to attach one) then you could do it from the console just as easily.

The reason I needed to do it this way is that each time I tried to do it via the web UI, the system would reboot, which then wasted a stack of time going through that whole cycle again. SSH seemed to be a more stable way to connect.

One thing I’ve noticed since the downgrade is how much faster 12.2 is to get to a fully working home assistant instance. Having watched the boot cycle about 20+ times in the last several days, there’s like an order of magnitude difference in time to usable system (a minute or two versus 10-15 minutes).

Thanks! I’ve always treated the OS update as an afterthought. Until now, it’s always been a non-issue, something to do just to keep current. I’m beginning to think that’s been a mistake.

Maybe I’ll need to do what I do with the HA Core updates; wait a few weeks to see what’s reported as breaking, then block out some time to do a bunch of testing, and to restore if necessary. For now 12.3 is working, but still showing erratic memory utilization. On some of my reboots I also noticed such poor response times that I thought it had locked up.

And, yes, I always use the SSH add-on. Far better than digging the hardware out and hooking up a monitor and keyboard.
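Since SSH is already in the picture, one possible way to put numbers on the erratic memory utilization mentioned above is to log it from the shell. This is only a sketch: it assumes /proc/meminfo is readable from the SSH session, computes "used" as MemTotal minus MemAvailable (which may not match exactly what the HA dashboard reports), and the log path /tmp/mem_usage.log is a made-up default.

```shell
#!/bin/sh
# Print used memory as a whole-number percentage, computed from MemTotal
# and MemAvailable in a meminfo-style file (defaults to /proc/meminfo).
mem_used_percent() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total > 0) printf "%d\n", (total - avail) * 100 / total
      else exit 1
    }
  ' "${1:-/proc/meminfo}"
}

# Append one timestamped sample to a log file.
# The default path is an assumption; pass your own as the first argument.
log_mem_sample() {
  printf '%s %s%%\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(mem_used_percent)" \
    >> "${1:-/tmp/mem_usage.log}"
}
```

Something like `while true; do log_mem_sample; sleep 60; done` left running in a session would give a minute-by-minute record to compare before and after an OS change.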