Help needed: 2024.8 Core update unstable on RPi4, connection lost and crashes host, no issue on 2024.7

Looking for ideas because I can’t be the only one here whose system gets into a bad state with this new version.

Hardware:

  • Raspberry Pi 4 Model B
  • HAOS running on an external SSD
  • Power over Ethernet (well over a year, no issues)
  • External fan for extra cooling

I usually apply updates within a day or two of release, and did the same for 2024.8. I back up regularly, so it’s never really an issue to roll back if I need to wait for patches. I use quite a few custom UI components and a handful of custom integrations, but nothing new has been introduced for at least the past 6 months (aside from automations, etc.).

When I installed 2024.8 everything went fine at first. Then a few hours went by and I noticed I couldn’t connect to my host. It was unresponsive, so I had to pull the power and plug it back in. It booted back up fine… at first. Note that the logs are extremely unhelpful here: all they can tell me is that “the database couldn’t verify that the system shut down was clean” with every crash.

After the 2nd boot, it only lasted an hour before ending up back in the “hang” state. It won’t reboot itself, just sits there and must be manually restarted to recover. Another reboot, another few hours go by… And it happens again.

I spent the entire weekend chasing ghosts trying to isolate the problem. I got so frustrated that I just nuked my 8 GB database hoping that would help somehow. Nothing. Anywhere from 30 minutes to 6 hours will go by, and then the same crash. Every time. The only pattern I ever noticed was that sometimes the crash would appear to happen within a few minutes of interacting with one of my wall-mounted dashboards. But it wasn’t consistent. It would also happen sometimes when I had an Echo device send a voice command to Home Assistant. But again, not a consistent repro.

Eventually I gave up after wasting an entire weekend, and rolled back to 2024.7.2. The system ran flawlessly the moment I rolled it back. No issues for multiple weeks. I figured maybe it was something that might be addressed by one of the incremental updates of 2024.8, so I waited until 2024.8.2 to give it another try. Again the installation went fine, but the same issue came right back.

At this point I don’t know what to do. I’ve tried …

  • a full clean install of home assistant OS
  • nuking my database
  • switching USB ports for the components hooked up to the raspberry pi including the SSD
  • shutting down a ton of the custom components (which is not a long-term solution, without those custom components home assistant is basically useless to me).

But anytime my system is running 2024.8, it’s too unstable to rely on for any more than a couple of hours.

I know I’m not the only one hitting issues like this, but I can’t seem to find anyone in the community forums with exactly the same symptoms. A lot of people seem to be having issues with the database upgrade, and that didn’t seem to cause any problems for me. So I’m reaching out to see if anybody else is avoiding the 2024.8 update because of this kind of issue. I really want to be able to upgrade to the next version so I can take advantage of some of the updates that came with it, including things like Tessie, which has some new functionality that I really want for my car.

I’m worried that I’m just going to be stuck on this core version forever, because the updates will take my system down. Is anyone else stuck??

You say ‘the logs are extremely unhelpful’

Which log? The log restarts on every HA restart. Are you looking at the log from before the restart or the one from the current run? You need the older one that actually shows what crashed.

Skip to troubleshooting. It will have a section about getting the pre crash logs out of a crashing system
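
In case it helps, here is a rough sketch of pulling the warnings and errors out of the previous run's log. As far as I know, Home Assistant keeps the last run's log next to the current one as home-assistant.log.1, but treat the /config path below as an assumption for a default HAOS install and adjust it for yours. The helper name tail_problems is made up for this example.

```python
from collections import deque
from pathlib import Path

# Assumption: default HAOS config directory and the rotated log file name.
PREVIOUS_LOG = Path("/config/home-assistant.log.1")


def tail_problems(lines, keep=50, levels=(" ERROR ", " WARNING ")):
    """Return the last `keep` lines that contain one of the given log levels."""
    recent = deque(maxlen=keep)  # older matches fall off the front
    for line in lines:
        if any(level in line for level in levels):
            recent.append(line.rstrip("\n"))
    return list(recent)


if __name__ == "__main__":
    if PREVIOUS_LOG.exists():
        with PREVIOUS_LOG.open(encoding="utf-8", errors="replace") as handle:
            for entry in tail_problems(handle):
                print(entry)
    else:
        print(f"{PREVIOUS_LOG} not found; adjust the path for your install")
```

The last few entries before the hang are usually the interesting ones, so only the tail is kept.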

Thanks. I’ll try this out over the coming weekend and see what I find.

I have a similar issue: Pi 4, 120 GB SSD.

It seems to lose connection and then go into a reboot, but after it says Home Assistant has started, going to some pages either says “cannot load” or shows a blank page.

I can get into the backup page sometimes, but trying to restore says it cannot restore whilst rebooting.

OK, I might finally have a lead. Do you use the Adaptive Lighting integration?

I had some time today so I upgraded to 2024.8.3. Same familiar crashes started happening. But I was able to get ahold of the logs leading up to 2 separate crashes tonight. The one common denominator I found was adaptive lighting.

Adaptive Lighting in my system is hitting TP-Link Kasa dimmer switches (unfortunately, an early mistake). I use the official integration, but I started using it when it was still in HACS, so I had some old automations hanging around to manage things like resetting manual control. And with 3 separate instances of Adaptive Lighting running (different areas / contexts in the home), I am wondering if maybe it just kept overloading the system.

It would explain why the system crashes would kick up in frequency after interacting with lights from the dashboard… maybe.

I disabled the integration completely after the reboot and I went from crashing every 5 minutes to being up and running without issue for at least 20 and counting. Fingers crossed…

2024-08-27 19:52:35.765 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [548167749488] Mobile from 192.168.0.17 (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0): Client unable to keep up with pending messages. Reached 4096 pending messages. The system's load is too high or an integration is misbehaving; Last message was: b'{"type":"event","event":{"c":{"automation.auto_adaptive_lights_apply_manual_control":{"+":{"lu":1724813555.7652128,"a":{"current":0}}}}},"id":3}'
2024-08-27 19:52:35.766 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [548167753376] Dashboard Tablets from 192.168.0.124 (Mozilla/5.0 (Linux; Android 9; KFTRPWI Build/PS7330.4104N; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/124.0.6367.248 Safari/537.36 Home Assistant/2024.7.3-13278 (Android 9; KFTRPWI)): Client unable to keep up with pending messages. Reached 4096 pending messages. The system's load is too high or an integration is misbehaving; Last message was: b'{"type":"event","event":{"c":{"automation.auto_adaptive_lights_apply_manual_control":{"+":{"lu":1724813555.7652128,"a":{"current":0}}}}},"id":2}'
2024-08-27 19:52:35.766 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [547938944912] Mobile from 192.168.0.17 (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0): Client unable to keep up with pending messages. Reached 4096 pending messages. The system's load is too high or an integration is misbehaving; Last message was: b'{"type":"event","event":{"c":{"automation.auto_adaptive_lights_apply_manual_control":{"+":{"lu":1724813555.7652128,"a":{"current":0}}}}},"id":3}'
2024-08-27 19:52:35.766 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [547374455040] Mobile from 192.168.0.17 (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0): Client unable to keep up with pending messages. Reached 4096 pending messages. The system's load is too high or an integration is misbehaving; Last message was: b'{"type":"event","event":{"c":{"automation.auto_adaptive_lights_apply_manual_control":{"+":{"lu":1724813555.7652128,"a":{"current":0}}}}},"id":3}'
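
For anyone who wants to check their own logs for the same pattern, below is a minimal sketch that counts which entity shows up in the "Last message was" payload of these "Client unable to keep up" errors. It assumes the payload looks like the lines above, with the entity id as the first key under "c". The function name count_flooding_entities is made up for this example.

```python
import re
from collections import Counter

PENDING = "Client unable to keep up with pending messages"
# Assumption: the entity id is the first key under "c" in the last message,
# as in the log lines quoted above.
ENTITY = re.compile(r'"c":\{"([a-z_]+\.[a-z0-9_]+)"')


def count_flooding_entities(lines):
    """Count entity ids named in 'unable to keep up' websocket errors."""
    counts = Counter()
    for line in lines:
        if PENDING not in line:
            continue  # only look at the websocket backlog errors
        match = ENTITY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feed it the lines of the pre-crash log and print counts.most_common(5). If one entity dominates, that is a candidate to disable first, though it only shows what the last queued message was, not necessarily the root cause.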

Mysteriously, it is now stable, and I have not done anything.

Man that drives me crazy not knowing why.

I am happy to report that I finally got out of the crash loop, or at least it seems so. Completely stable and no babysitting since I turned off adaptive lights. That was not possible before.

Sadly, it’s doing it again. I managed to get a backup of 2024.7 restored and that was stable, then foolishly tried 2024.8.0 and I have the same issues again.

Mine crept back in too, but only after I re-enabled adaptive lights. But there’s a specific automation that is cited every time the crash happens. I’ve disabled that specific automation (instead of the entire integration again) and so far no issues…
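
If you would rather switch a single automation off from a script than hunt for it in the UI, here is a small sketch using the REST API's service call endpoint with the automation.turn_off service. The base URL and token are placeholders (you would need your own long-lived access token), and build_turn_off_request is a made-up helper name. The entity id is the one from my logs above.

```python
import json
import urllib.request


def build_turn_off_request(base_url, token, entity_id):
    """Build a POST request that calls automation.turn_off for one entity."""
    payload = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/services/automation/turn_off",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholders: replace with your own host and long-lived access token.
    request = build_turn_off_request(
        "http://homeassistant.local:8123",
        "YOUR_LONG_LIVED_TOKEN",
        "automation.auto_adaptive_lights_apply_manual_control",
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request, timeout=10)
```

The request is only built and printed here. Sending it is left commented out so nothing changes on your system until you fill in real values.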

I started using Adaptive Lights a long time ago, and I was using some of their “boilerplate” automations before they migrated more functionality to the UI. The automation which was popping up in my logs is no longer listed in the documentation for Adaptive Lighting. So my theory is that it was overloading my system. Why that only happens with 2024.8.* I have no idea…

To my knowledge I don’t have adaptive lights. Would that appear in integrations?

Clearly there is an integration somewhere causing it, but only from 2024.8.0 onwards, and I can’t work out from the logs what it might be. Nest Protect and MQTT have error lines about not generating unique IDs but ignoring them, and there are warnings, but nothing that stands out.