Home Assistant Randomly Crashing on RPi4

My Home Assistant has been randomly crashing every so often for almost a year now and I haven’t been able to find a reason. Sometimes it comes back on its own, sometimes it doesn’t. It’s never pingable, even when it comes back on its own. Sometimes it crashes once or twice every day, sometimes it doesn’t crash for a week.

I’m running HA OS on an RPi4 4GB, with an unpowered SSD, Zigbee/Zwave dongle, and a 3.5A power supply.

I’ve pasted my whole HA log file below. The most recent crash was today, 02/04/22 at around 2:00 PM (which required a restart), but another crash happened just last night, although I don’t think last night’s crash is in this log since I had to restart.

I am hoping someone with more experience would be able to take a look at the logs for me and maybe find a root cause so I can get this issue fixed - it’s becoming super disheartening opening the app to see my HA is offline nearly 30% of the time.

Edit: looks like Hatebin didn’t like how big my log was. Here’s a pastebin link, hopefully the entire log uploads:

From the last lines of the log, I’d suspect the RPi loses network connectivity for whatever reason.
Nothing more can be said.

Are you on Wifi? Maybe try a wired connection…

It’s connected directly to my router through ethernet, so that can’t possibly be the reason it’s disconnecting. I always have internet when Home Assistant goes offline, so it’s not my internet at all.

What version of HAOS are you on? If it’s prior to 7.2, many people had this same issue. It was fixed in late December and released in January 2022.

I am currently on 2021.12.10

Strange reboots on a Pi are mostly power supply related.

3.5A would be enough for sure, but is it a high-quality power supply that constantly delivers that current? Or have you measured whether it’s even capable of delivering 3.5A? (Some cheap power supplies claim to do so but in reality do not.)

Worth checking before chasing phantoms…
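
If you can’t measure the supply directly, the Pi’s firmware can tell you whether it has seen under-voltage. On Raspberry Pi OS, `vcgencmd get_throttled` prints a hex bitmask such as `throttled=0x50005`. Here’s a minimal Python sketch that decodes such a value. Assumptions: you can get the raw value somehow (HA OS may not give you `vcgencmd` out of the box), and `decode_throttled` is just an illustrative helper name, not part of any library.

```python
# Bit meanings as documented for `vcgencmd get_throttled` on Raspberry Pi.
THROTTLE_FLAGS = {
    0: "under-voltage detected (now)",
    1: "ARM frequency capped (now)",
    2: "currently throttled",
    3: "soft temperature limit active (now)",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(raw: str) -> list:
    """Decode a value like 'throttled=0x50005' (or just '0x50005')."""
    # Keep only the part after '=', then parse it as hexadecimal.
    value = int(raw.split("=")[-1].strip(), 16)
    return [
        text
        for bit, text in sorted(THROTTLE_FLAGS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    for flag in decode_throttled("throttled=0x50005"):
        print(flag)
```

An empty list means no flags are set. Anything mentioning under-voltage points at the power supply or the cable.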

It’s a CanaKit 3.5A USB-C Power Supply for Raspberry Pi 4. I’m not personally knowledgeable in how I would measure whether it’s delivering enough power, but this random crashing issue has been going on long before I even had an SSD or ZigBee dongle, back when I just had the Pi itself with an SD card, so I can’t imagine both power supplies were faulty, but maybe I’m wrong.

The OS version.

My bad. Home Assistant Supervisor says the version is Home Assistant OS 7.2.

Once I went to 7.2 and above I have been stable. Before this I was up 1 to 2 days at most. I had this issue since the OS 5.4 update. There are several (now closed) GitHub issues on this. Check out

https://github.com/home-assistant/operating-system/issues/1119

Almost 700 comments on this. Open a new issue with your logs and all the other info. Stefan is very good at troubleshooting HAOS issues. If there is info in the logs he can help.

Bill

I’ve had my HA/RPi crash exactly once in the last two years I’ve been running it (knock on wood). And that was about a month ago, when things were non-responsive anywhere on my Intranet (and thus wifi and Internet). I first thought my router had died, since it was affecting every device, wired, wireless, and Internet access.

I then noticed the activity light on my RPi was blinking non-stop. It had somehow gotten stuck flooding the network. I had to power off, and it booted back up fine, and all was good again. I was, I believe, on OS 7.0 at the time, and upgraded to 7.1 after someone here advised of the issue. I also just upgraded to 7.2 yesterday. I haven’t experienced it since. Not sure if it was cured, or if I’m just lucky.

It would have been bad if I had been away from home and it had happened. Sorry to hear folks are having this experience.

Have you seen this topic?

I have now! My RPi4 Power Status has been a solid OK since January so it’s probably not my power supply. As for the other issues with the SSD and Zigbee interference, my SSD is plugged into the USB-3 port, but like I said this issue has been happening since long before I even had an SSD or Zigbee dongle at all, so that can’t be the fundamental reason for the crashes, but I suppose it wouldn’t hurt to get a USB extension for the Zigbee/Zwave dongle anyway.

No high loads before a crash?
Most likely not, but better safe than sorry.
If you have high loads, you might consider checking CPU wait.

Don’t want to send you in the wrong direction, but any WiFi channel interference?

Looking at my RAM and CPU usage statistics right before the 3 crashes in the last 3 days, nothing seemed out of the ordinary to me. My RAM usage sits at around 70% normally and that’s what it was when it crashed, whereas my CPU usage sits at around 30% to 40%, and no spikes were observed prior to crashing, only after (assuming due to me restarting home assistant).

As for WiFi channel interference… I have Zigbee on channel 20 in my configuration.yaml, don’t have Zwave set up, and my router says 2.4 GHz Wi-Fi is on channel 1, so I think that should probably be fine.
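
For anyone wanting to double-check that, here is a rough Python sketch of the arithmetic. It uses the standard 2.4 GHz channel formulas (Zigbee channels 11 to 26, Wi-Fi channels 1 to 13). The widths are assumptions: about 22 MHz for a Wi-Fi channel and about 2 MHz for a Zigbee channel. The function names are made up for illustration.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11..26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz Wi-Fi channel (1..13)."""
    if not 1 <= channel <= 13:
        raise ValueError("This sketch only covers Wi-Fi channels 1..13")
    return 2407 + 5 * channel


def channels_overlap(zigbee_channel: int, wifi_channel: int,
                     wifi_width_mhz: float = 22.0,
                     zigbee_width_mhz: float = 2.0) -> bool:
    """True if the two channels' frequency ranges overlap (assumed widths)."""
    distance = abs(zigbee_center_mhz(zigbee_channel)
                   - wifi_center_mhz(wifi_channel))
    return distance < (wifi_width_mhz + zigbee_width_mhz) / 2


if __name__ == "__main__":
    # Zigbee 20 is centered on 2450 MHz, Wi-Fi 1 on 2412 MHz.
    print(channels_overlap(20, 1))
```

With Zigbee on channel 20 and Wi-Fi on channel 1 the centers are 38 MHz apart, so under these assumptions they don’t overlap. This says nothing about neighbors’ networks, of course.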

I don’t know what channel your neighbors are on, if they are nearby. But as said, that’s more of a long shot.

About your CPU: in Linux there’s a difference between load and CPU usage. The usage percentage can be low or fine while the load (see it as the “queue” of tasks waiting to be processed) is high. That said, 70% is kind of high, and your CPU temperature might be an issue here if it’s not actively cooled in some way.
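
To make the load part a bit more concrete, here is a small Python sketch that puts the load average next to the number of CPU cores. It assumes a Linux system where `os.getloadavg()` is available. `load_per_core` is just an illustrative helper, and "around 1.0 per core means the cores are fully booked" is a rule of thumb, not a hard limit.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Load average divided by core count; ~1.0 means fully busy cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


if __name__ == "__main__":
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() can return None
    for label, value in (("1m", one_min), ("5m", five_min),
                         ("15m", fifteen_min)):
        print(f"load {label}: {value:.2f} "
              f"({load_per_core(value, cores):.2f} per core)")
```

Keep in mind that on Linux the load also counts tasks stuck waiting on disk IO, which is why it can be high while the CPU usage percentage looks fine.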

Here is mine. CPU temp is 63 °C at the moment.

Edit: In case you are wondering how to get the insights. The earlier link on high CPU load shows a custom sensor for IO wait, but with an SSD that might only be relevant if you have high CPU loads. Those (and more, like CPU temp) can be found using this:

The 70% was for my memory, since I only have 2G of RAM I would imagine 70% usage is about right, given the amount of add-ons and stuff I have. As for the CPU load/usage, right now my load sensor says 2.42 for 1m and 2.44 for 5m, and the processor usage sensor says 36%. Doesn’t seem horribly high to me, but this is way out of my league so maybe I’m wrong.

You’re right, I mixed up the two. The load isn’t low for a Pi either, if you ask me, but I’m not a Linux expert myself. As far as I understood from the reading I did, a major cause of high load without particularly high CPU usage is IO wait. It’s basically the CPU waiting for a (disk) read/write to finish. One can imagine that this makes the system freeze.
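
For the curious, this is roughly how an IO wait percentage can be derived on Linux: take two snapshots of the first `cpu` line in `/proc/stat` and look at how much of the elapsed CPU time landed in the `iowait` column. A minimal Python sketch follows. The helper names and the 5-second sampling interval are my own choices, and it’s not necessarily how any particular HA sensor does it.

```python
import time


def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before: list, after: list) -> float:
    """Share of CPU time spent in iowait between two snapshots, in percent."""
    # Columns: user nice system idle iowait irq softirq steal (guest time
    # is already included in user, so only the first 8 are summed).
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_times(path: str = "/proc/stat") -> list:
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


if __name__ == "__main__":
    first = read_cpu_times()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    second = read_cpu_times()
    print(f"IO wait: {iowait_percent(first, second):.1f}%")
```

A single reading doesn’t tell you much. It’s the values logged right before a freeze that are interesting.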

Even with a relatively fast SSD on USB3, I still see waits of 20% multiple times a day. I know what’s causing it and it’s within my acceptance range, but it could result in a less snappy response of the GUI once in a while.

Before I had the SSD, it froze completely, as can be read in the earlier topic. That was even without the IO-consuming process I have running on it.

So, my suggestion would be: add the custom sensor for IO wait (you can always remove it from your configuration.yaml again) and monitor the situation.

I even created a trigger for when my Zigbee became unavailable, which sends me a message with the stats. Might be useful in case they get lost in a reboot.

Thanks for the suggestion. I added an IO wait sensor to HA and it’s reporting 9% right now but I’ll keep an eye on it.

Based on what you are writing in the topic, it’s reasonable to say your Pi has to work for its money.

I hope the info above will lead to a solution. For now I’d say watching the loads is the right thing to do.

I’ll keep the topic in my watchlist.