Home Assistant keeps restarting

So HA keeps restarting. I’m amazed it hasn’t corrupted the SD card. There’s nothing in the logs, mainly because they seem to be created afresh every time HA boots. I can’t find the ‘system/supervisor’ ones on disk, and the ones I can access from the web GUI just start at boot time. So I have no clue what’s causing the reboots.

I can go days without one or have two in the space of an hour.

Don’t think it’s the power supply - I’ve tried a new plug and a new cable. I guess it could be the SD card, but as I can’t take a snapshot I can’t easily try a new SD card. Catch 22.

A sure fire way to get a reboot is to try and make a snapshot. It’s fine for maybe 10 minutes then restarts out of the blue.

HELP!

Thanks :slight_smile:

There is a file: home-assistant.log.1 for the previous boot.

Weird, nothing there either. I have just been able to get it to make a snapshot without crashing by doing it from the CLI. My database is 500 MB, so I wonder if that could be part of it? I don’t know why initiating from the CLI would work and the GUI would crash, though. NB it’s not ONLY creating snapshots that causes crashes; they come at random other times of the day as well.

I’m getting continual restarts as well. I’ve installed Glances and Grafana and have narrowed it down to the homeassistant core container (I think) spiking the CPU usage for a period, which causes the restart. I take that to mean it isn’t any of the other add-ons. I’ve disabled all add-ons and integrations that are unnecessary and I’m now down to a pretty bare-minimum system. Going any further would mean disabling key functions for me. Not sure how to progress with debugging this one. Any ideas? FWIW I’m running on an RPi4.

System name for the core container is “localhost.docker.mean {name: homeassistant}”

A while back I had this and, by trial and error, (eventually) resolved it by removing Studio Code Server. Not sure if that is still an issue or not. YMMV. Good luck!

Thanks. I don’t have Studio Code Server so it’s not that for me, but hopefully might help others.

I have the same issue: it keeps freezing and randomly restarting… I run it rather lightly on an RPi4 with an SSD, but I noticed that after upgrading to 2023.1 it started being slow. Still troubleshooting…

Bit of an update. I dug more into glances data. From what I can tell - restarts are being triggered by updates across all my entities being triggered at once overwhelming the CPU (see below). At least that’s the best conclusion I can draw. Unfortunately, the current version of glances add-on doesn’t seem to provide access to the process-list output via the influx integration (processlist is a default part of the API but the API seems to not be available for me right now).

Any ideas on how I can stagger updates?

As a note, I had similar problems on my RPi4 (in my case, it was locking up under heavy load rather than rebooting) and it was 100% fixed by improving the CPU cooling. In my case, I was using an active fan controlled by HA based on CPU temperature, which I thought would be fine as I set the threshold very low, but in fact it seems that the RPi4 does weird things when not cooled properly, even when the CPU temperatures looked ok.

I switched to this geekworm aluminium case which is purely passive, but the design massively spreads the CPU cooling area across the whole case which I found dropped the CPU temperatures by 20C minimum, even without active cooling. I’ve not had a single unexplained lock-up in the 18 months since I swapped.
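For anyone who wants to watch the temperature outside of HA’s own sensors, here is a minimal sketch in Python. It assumes the standard Linux sysfs file `/sys/class/thermal/thermal_zone0/temp` (which reports millidegrees Celsius) is readable from wherever you run it; whether that is reachable from your particular HA install or add-on shell is an assumption, not something confirmed in this thread. The function names are just illustrative.

```python
from pathlib import Path

# Assumed location of the SoC temperature on a typical Linux/RPi system.
THERMAL_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw: str) -> float:
    """Convert the sysfs value (millidegrees C, as text) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp_c(path: Path = THERMAL_PATH) -> float:
    """Read the current temperature in degrees C from the sysfs file."""
    return parse_millidegrees(path.read_text())


if __name__ == "__main__":
    print(f"CPU temperature: {read_cpu_temp_c():.1f} C")
```

Logging this every minute or so and lining it up against the restart times would at least show whether heat is a plausible suspect.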

The only other thing would be power on the USB - do you have any peripherals plugged in? SSD? Zigbee? Coral? The RPi4 is notorious for having a poorly considered USB power architecture.

Thanks @daern - I had considered both of those things. It’s summer in Australia - and it’s been quite hot in the room where the RPI lives. Thanks for the recommendation on the case - I reckon that’s my next option.

As for peripherals - I’m running on a USB SSD (SanDisk, I think) so that could be contributing as well. Off to the shops I go! :slight_smile:

It’s always a good day to buy more components for HA :slight_smile: There’s no question that it was worse for me in the summer, although it must be said that the small cupboard where it lives doesn’t have a lot of ventilation and, despite being in the UK, probably does a fair job of emulating an Australian summer. Some door vents are on the to do list…

I’ve got a powered USB hub for my SSD, Coral and Sonoff Zigbee stick and it seems to work well, even when using Frigate with 6x cameras on a lowly RPi4. I never experienced the USB power issues myself, but moved to the powered hub when I added the SSD as a precaution.

Oh man! Bit of a facepalm moment for me. After days of digging around trying to get processes from Glances into Influx (unsuccessfully) and battling continual restarts (every 10 minutes), I finally discovered the cause by looking at the log (config/home-assistant.log.1). I know, I know, I know.

What I saw was that an automation I had built to count numbers of errors and warnings (part of a sort of quality metric) was itself causing errors. So one warning or error led to multiple warnings or errors. Meaning the whole thing was exponential and would eventually max out the CPU and cause a restart.

Anyway - shut those automations down and everything has been super-smooth since then.
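If it helps anyone else spot this kind of feedback loop faster, here is a rough Python sketch that counts WARNING and ERROR lines per logger in the previous boot’s log. The line format is assumed from the log line quoted elsewhere in this thread (timestamp, level, thread name in parentheses, logger name in square brackets); the function names and the way the path is passed in are my own choices, not anything built into HA.

```python
import re
import sys
from collections import Counter

# Assumed line shape: "2024-10-05 23:31:08.572 WARNING (Thread) [logger.name] message"
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} "
    r"(WARNING|ERROR) \([^)]*\) \[([^\]]+)\]"
)


def count_by_logger(lines):
    """Count WARNING/ERROR lines, keyed by (level, logger name)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # traceback/continuation lines do not match and are skipped
            counts[(match.group(1), match.group(2))] += 1
    return counts


def summarize(path, top=10):
    """Return the `top` noisiest (level, logger) pairs in the log at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_by_logger(handle).most_common(top)


if __name__ == "__main__":
    # Usage: python count_log.py config/home-assistant.log.1
    for (level, logger), n in summarize(sys.argv[1]):
        print(f"{n:6d}  {level:7s}  {logger}")
```

A single logger with a count far above everything else is a good hint that something is feeding on its own output, which is what happened with my automations.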

In addition - along the way I found out that:

  • RPI4 is rated to 80 degrees C for the processor and 70 degrees for the network interface
  • Power draw for 4 CPUs maxed out is 6W, the SSD has a max power draw of 7W, the power supply can deliver 15W. So that was close but unlikely the cause.
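
The power arithmetic, written out as a small Python sketch. The three wattages are just the figures quoted above (I have not verified them), and the calculation ignores anything else on the board or the USB bus, so treat it as a rough budget rather than a measurement.

```python
def power_headroom(supply_w, loads_w):
    """Return (total load, spare watts, fraction of the supply used)."""
    total = sum(loads_w.values())
    return total, supply_w - total, total / supply_w


# Figures as quoted in the post; labels are illustrative only.
total, spare, used = power_headroom(
    15.0,
    {"cpu_all_cores_maxed": 6.0, "ssd_max": 7.0},
)
print(f"Load {total:.0f} W, spare {spare:.0f} W, {used:.0%} of supply")
```

So worst case is 13 W against a 15 W supply, leaving about 2 W of margin.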

Having said all that - I’m replacing the case with a fan-boosted passive cooling system, and the power supply with a beefier USB-C source.

Thanks for that. I installed Weatherlink from HACS. I had only recently started adding integrations, and I did a few at the same time and added more from HACS. I never suspected Weatherlink, but I checked the logs and found this:

2024-10-05 23:31:08.572 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration weatherlink which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant

I’ve now deleted Weatherlink and HA seems to be running OK. It was restarting every couple of minutes but has now been running for 30 minutes without an issue.

That is a generic warning that is generated for all third-party integrations, whether or not they are working correctly.