Home Assistant Randomly Crashing on RPi4

I am currently on 2021.12.10

Strange reboots on a Pi are mostly power supply related.

3.5A would certainly be enough, but is it a high-quality power supply that consistently delivers that current? Have you measured whether it can actually deliver 3.5A? (Some cheap power supplies claim to but in reality do not.)

Worth checking before chasing phantoms…

It’s a CanaKit 3.5A USB-C Power Supply for Raspberry Pi 4. I don’t know how I would measure whether it’s delivering enough power, but this random crashing has been going on since long before I had an SSD or ZigBee dongle, back when I just had the Pi itself with an SD card. I can’t imagine both power supplies were faulty, but maybe I’m wrong.
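
One check that needs no meter is the Pi firmware's own under-voltage flag. This assumes you can run `vcgencmd get_throttled` on the Pi. It ships with Raspberry Pi OS, but it may not be reachable from Home Assistant OS. The command prints a line like `throttled=0x50005`, and the Python sketch below decodes it. The function name `decode_throttled` is made up for illustration.

```python
# Bit meanings as documented for `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output: str) -> list[str]:
    """Turn a line such as 'throttled=0x50005' into readable flags."""
    # Everything after '=' is a hexadecimal bit field.
    value = int(output.strip().split("=", 1)[1], 16)
    return [
        text
        for bit, text in sorted(THROTTLE_FLAGS.items())
        if value & (1 << bit)
    ]


# Paste in whatever your own Pi printed.
for flag in decode_throttled("throttled=0x50005"):
    print(flag)
```

An empty result (`throttled=0x0`) means the firmware has not seen under-voltage or throttling since boot. That is a good sign, though it does not prove the supply is fine under every load.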

The OS version.

My bad. Home Assistant Supervisor says the version is Home Assistant OS 7.2.

Once I went to 7.2 and above I have been stable. Before this I lasted 1 to 2 days at most. I had this issue since the OS 5.4 update. There are several (now closed) GitHub issues on this. Check out:

https://github.com/home-assistant/operating-system/issues/1119

Almost 700 comments on this. Open a new issue with your logs and all the other info. Stefan is very good at troubleshooting HAOS issues. If there is info in the logs he can help.

Bill

I’ve had my HA/RPi crash exactly once in the last two years I’ve been running it (knock). And that was about a month ago when things were non-responsive anywhere on my Intranet (and thus wifi and Internet). I first thought my router had died, since it was affecting every device, wired, wireless, and Internet access.

I then noticed the activity light on my RPi was blinking non-stop. It had somehow gotten stuck flooding the network. I had to power off, and it booted back up fine, and all was good again. I was, I believe, on OS 7.0 at the time, and upgraded to 7.1 after someone here advised of the issue. I also just upgraded to 7.2 yesterday. I haven’t experienced it since. Not sure if it was cured, or if I’m just lucky.

It would have been bad if I had been away from home and it had happened. Sorry to hear folks are having this experience.

Have you seen this topic?

I have now! My RPi4 Power Status has been a solid OK since January so it’s probably not my power supply. As for the other issues with the SSD and Zigbee interference, my SSD is plugged into the USB-3 port, but like I said this issue has been happening since long before I even had an SSD or Zigbee dongle at all, so that can’t be the fundamental reason for the crashes, but I suppose it wouldn’t hurt to get a USB extension for the Zigbee/Zwave dongle anyway.

No high loads before a crash?
Most likely not, but better safe than sorry.
If you have high loads, you might consider checking CPU wait.

Don’t want to send you in the wrong direction, but any WiFi channel interference?

Looking at my RAM and CPU usage statistics right before the 3 crashes in the last 3 days, nothing seemed out of the ordinary to me. My RAM usage normally sits at around 70%, and that’s what it was when it crashed. My CPU usage sits at around 30% to 40%, and no spikes were observed prior to crashing, only after (I assume due to me restarting Home Assistant).

As for WiFi channel interference… I have Zigbee on channel 20 in my configuration.yaml, I don’t have Z-Wave set up, and my router says 2.4 GHz Wi-Fi is on channel 1, so I think that should probably be fine.
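
To put rough numbers on that, here is a small Python sketch that compares the centre frequencies of a Zigbee channel and a 2.4 GHz Wi-Fi channel. The centre-frequency formulas are the standard ones. The channel widths (about 22 MHz for Wi-Fi, about 2 MHz for Zigbee) are approximations, and the function names are made up for illustration.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Zigbee channel (11 to 26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Wi-Fi channel (1 to 13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1 to 13")
    return 2407 + 5 * channel


def channels_overlap(
    zigbee_channel: int,
    wifi_channel: int,
    wifi_width_mhz: float = 22.0,   # approximate occupied width (assumption)
    zigbee_width_mhz: float = 2.0,  # approximate occupied width (assumption)
) -> bool:
    """True if the two channels' frequency ranges overlap."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (wifi_width_mhz + zigbee_width_mhz) / 2


print(zigbee_center_mhz(20), wifi_center_mhz(1))  # 2450 2412
print(channels_overlap(20, 1))                    # False
```

With Zigbee on channel 20 (2450 MHz) and Wi-Fi on channel 1 (2412 MHz), the two are about 38 MHz apart, so your own router should not be the problem. Neighbouring networks on other channels could still overlap.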

I don’t know what channel your neighbors are on, if they are nearby. But as I said, that’s more of a long shot.

About your CPU: in Linux there’s a difference between load and CPU usage. The usage percentage can be low or fine (although 70% is kind of high, and your CPU temperature might be an issue if it’s not actively cooled in some way), while the load (think of it as the “queue” of tasks waiting to be processed) can still be high.

Here is mine. CPU temp is 63 at the moment.

Edit: In case you are wondering how to get the insights. The earlier link on high CPU load shows a custom sensor for IO wait, but with an SSD that might only be relevant if you have high CPU loads. Those (and more, like CPU temp) can be found using this:

The 70% was for my memory. Since I only have 2 GB of RAM, I would imagine 70% usage is about right, given the amount of add-ons and stuff I have. As for the CPU load/usage, right now my load sensor says 2.42 for 1m and 2.44 for 5m, and the processor usage sensor says 36%. Doesn’t seem horribly high to me, but this is way out of my league so maybe I’m wrong.
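
For context, load average is usually read relative to the number of CPU cores, and the Pi 4 has four. A common rule of thumb (not a hard limit) is that tasks start queueing once the load stays above the core count. A minimal Python sketch of that arithmetic, with illustrative function names:

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Scale a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def current_load_per_core() -> float:
    """Same figure for the machine this runs on (Unix-like systems only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)


# The 1m figure quoted above, on a four-core Pi 4.
print(load_per_core(2.42, 4))
```

A 1m load of 2.42 on four cores works out to roughly 0.6 per core. Whether that is a problem depends on what is making up the load, which is where IO wait comes in.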

You’re right, I mixed up the two. That load isn’t low for a Pi either, if you ask me, but I’m not a Linux expert myself. As far as I understood from the reading I did, a major cause of high load with moderate CPU usage is IO wait: the CPU is basically waiting for a (disk) read/write to finish. One can imagine that this makes the system freeze.
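
For anyone curious where an IO wait percentage comes from: on Linux, the first line of /proc/stat lists cumulative CPU time counters, and the fifth one is time spent waiting for I/O. The Python sketch below takes two samples of that line and works out the share of time spent in IO wait between them. It assumes the standard /proc/stat layout and is not the custom sensor from the linked topic. The function names are illustrative.

```python
def cpu_times(stat_line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat.

    Returns the counters user, nice, system, idle, iowait, irq,
    softirq, steal (the guest fields are skipped, since guest time
    is already counted in user).
    """
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:9]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


# Two made-up samples, taken a few seconds apart.
first = "cpu  100 0 50 800 50 0 0 0 0 0"
second = "cpu  150 0 70 1500 280 0 0 0 0 0"
print(iowait_percent(first, second))  # 23.0
```

On a real system you would read the first line of /proc/stat twice, a few seconds apart, and pass both lines in.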

Even with a relatively fast SSD on USB3, I still see waits of 20% multiple times a day. I know what’s causing it and it’s within my acceptance range, but it could result in a less snappy response of the GUI once in a while.

Before I had the SSD it completely froze, as can be read in the earlier topic. That was even without the IO-consuming process I have running on it.

So, my suggestion would be: add the custom sensor for IO wait (you can always remove it from your configuration.yaml) and monitor the situation.

I even created a trigger for when my Zigbee became unavailable, with a message sending me the stats. Might be useful in case they get lost in a reboot.

Thanks for the suggestion. I added an IO wait sensor to HA and it’s reporting 9% right now but I’ll keep an eye on it.

Based on what you are writing in the topic, it’s reasonable to say your Pi has to work for its money.

I hope the info above will lead to a solution. For now I’d say watching the loads is the right thing to do.

I’ll keep the topic in my watchlist

I have been running HA on an RPi 4 since mid-December ’22, and for the last month or so I have been experiencing this same behaviour: random crashes, sometimes after a week, sometimes twice in one day. I always find everything unresponsive and the Aragon fan at full speed, and I need to power off and on to make it work again.

I have my RPi in an Aragon case with an active fan, with a ConBee II and an SSD on USB.

Where can I find logs that might help to identify the issue?

I have the same setup as yours, and I’ve been experiencing the same issues for the past two weeks.

I had three crashes since this Friday, and every time I had someone manually unplug and plug the RPi back in to solve the issue (I’ve not been home for some time now).

The crashes happen at completely random times, and there is no sign of any memory leak beforehand that could have caused them.

Is there any way to keep the logs after a system restart? The only way for me to fix the issue remotely is to ask someone to manually restart the system, and after that all the logs are long gone, so there are no clues for me to follow.

I changed the power source and that seems to have solved the problem.

Did you ever find out what was causing this? I have the exact same hardware: Pi 4 4 GB, SSD, Sonoff and Bluetooth on powered supply. The Pi 4 is CanaKit re-branded, with their PSU. It’s been happening for almost 2 years now, once or twice a month. Changing the PSU didn’t help, and everything else looks normal. I was trying to download journal logs that don’t get erased but wasn’t able to do so.