HA freezes

Hi, this is now the 2nd time that Home Assistant simply freezes overnight.

System: Raspberry Pi 4 2GB RAM, connected with 240GB SanDisk SSD via Ugreen SATA hard drive enclosure and a Sonoff Zigbee 3.0 Dongle Plus (ZHA). Everything is directly connected to the Pi powered by the Official Power Supply. Previous version was 2024.4.3 (I have now updated to 2024.4.4).

What does freezing mean in my case?

  • Home Assistant no longer starts automations
  • Not accessible via app or locally
  • Doesn’t update sensor readings
  • However, it still seems to be running in the background: the LAN port LEDs are blinking, and there was activity in the host log during the downtime

The SSD shouldn’t be the issue, as the whole setup consistently draws 3-4W at idle with the SSD connected, with the majority being consumed by the Pi. It has also been running without any issues for over a month and a half.

I simply unplug it and plug it back in, and then everything runs as it should again. I can’t find any error messages in the logs (besides the usual ones; there were no messages at the time of the crash anyway). The crash occurred at 05:03 AM (I can see this in the history of several sensors: the value remains constant from that point until the reboot).

As addons, only EspHome and OneDriveBackUp are running. I can’t access their logs anymore due to the restart. Backup is automatically done every night between 1-4 AM, so apparently it’s not responsible for the crash.

RAM usage shouldn’t be an issue either, as it’s always around 50% in normal operation (CPU at 1-4%). CPU temperature is constantly at about 35°C (95°F).

I really have no idea what’s causing this. Especially since everything works fine again for over a month afterwards.
I’ve read that some people do an automatic reboot once a week or once a month, but that can’t be a good solution either, can it?

I don’t think you can rule out power to the SSD as a possible issue. I read somewhere that the USB ports on the RPi are not designed to deliver much power. Nor do I think the official adapter is designed to handle the load. A backup or database maintenance might draw far more than normal operation. If the problem persists with a powered USB hub in between, then I’d start searching in other places.

Also, there may be USB SSD incompatibilities that can be solved by a USB hub in between.
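One way to test the power theory rather than guess: the Pi firmware keeps under-voltage and throttling flags that can be read with `vcgencmd get_throttled`. Below is a minimal sketch that decodes that value. It assumes `vcgencmd` is available on your install (it ships with Raspberry Pi OS; it may not be reachable from a Home Assistant OS add-on shell), and the helper names `decode_throttled` and `read_throttled` are made up for this example.

```python
import subprocess

# Bit meanings as documented for `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output: str) -> list[str]:
    """Turn a line such as 'throttled=0x50005' into readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in THROTTLE_FLAGS.items() if value & (1 << bit)]


def read_throttled() -> list[str]:
    """Run vcgencmd (assumed to be on PATH) and decode its output."""
    raw = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    ).stdout
    return decode_throttled(raw)
```

If a reading taken some time after a nightly backup shows any of the "since boot" under-voltage bits (for example `throttled=0x50000`), that would point toward the power supply or the SSD's draw. An empty list does not prove the power is fine, only that the firmware has not flagged it since the last boot.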

Well it’s strange that it worked for one and a half months. But I’ll give it a try with this adapter from Ugreen.

I had a similar experience before I switched to a powered USB hub for my SSD. It would randomly lock up - could go months between.

This is probably related to this “open secret” about the RPi 4 with an SSD running HA, and my permanent solution here may help you.

I too see this on occasion. I had added the USB SSD to the Pi 4 almost a year ago, and it was running rock solid till about November last year. Then I would get the hangs where HA wasn’t totally dead, but in a zombie state. The only way to recover was to power cycle. And like the OP, nothing to be found in the logging after restart.

But after a reboot all of the data for all of the sensors etc would be missing for however long it was in that bad state.

Just a few days ago I think I caught it as things were starting to go bad. This most recent time I was able to grab some logging before it became completely unresponsive.

The interesting thing was that while it was in the going-bad state, the various graphs were all still showing the data. But after the reboot, those several hours that had still been displayed went poof.

And in that case I was able to successfully do a soft reboot and it did not require a hard power cycle to recover.

The CPU was around 30% as I was capturing this. It is normally under 5% most of the time.

There were a very large number of log entries that I presume were mostly the same for the ‘traceback’ data. I was able to grab two of them and the traceback data was identical. Here is one of those log entries:

Logger: homeassistant.components.recorder.core
Source: components/recorder/core.py:900
integration: Recorder (documentation, issues)
First occurred: 8:27:43 AM (351 occurrences)
Last logged: 8:34:06 AM

Error while processing event <Event state_changed[L]: entity_id=sensor.12h_rain_statistics, old_state=<state sensor.12h_rain_statistics=0.0; state_class=measurement, buffer_usage_ratio=0.01, age_coverage_ratio=0.01, source_value_valid=True, unit_of_measurement=inches, icon=mdi:calculator, friendly_name=12h Rain Statistics @ 2024-04-24T08:23:00.499484-05:00>, new_state=<state sensor.12h_rain_statistics=0.0; state_class=measurement, buffer_usage_ratio=0.01, age_coverage_ratio=0.02, source_value_valid=True, unit_of_measurement=inches, icon=mdi:calculator, friendly_name=12h Rain Statistics @ 2024-04-24T08:23:00.499484-05:00>>:
Error while processing event CommitTask():
Error while processing event <Event state_changed[L]: entity_id=sensor.shop_illuminance_bh1750, old_state=<state sensor.shop_illuminance_bh1750=1369.1; state_class=measurement, unit_of_measurement=lx, device_class=illuminance, friendly_name=Shop Illuminance BH1750 @ 2024-04-24T08:33:04.606198-05:00>, new_state=<state sensor.shop_illuminance_bh1750=1375.8; state_class=measurement, unit_of_measurement=lx, device_class=illuminance, friendly_name=Shop Illuminance BH1750 @ 2024-04-24T08:34:04.610513-05:00>>:
Error while processing event <Event state_changed[L]: entity_id=sensor.ambient_temp_ds18b20, old_state=<state sensor.ambient_temp_ds18b20=37.8; state_class=measurement, unit_of_measurement=°F, device_class=temperature, icon=mdi:thermometer, friendly_name=Garden Ambient Temp @ 2024-04-24T08:33:06.160191-05:00>, new_state=<state sensor.ambient_temp_ds18b20=38.1; state_class=measurement, unit_of_measurement=°F, device_class=temperature, icon=mdi:thermometer, friendly_name=Garden Ambient Temp @ 2024-04-24T08:34:06.158031-05:00>>:
Error while processing event <Event state_changed[L]: entity_id=sensor.shallow_temp_ds18b20, old_state=<state sensor.shallow_temp_ds18b20=46.6; state_class=measurement, unit_of_measurement=°F, device_class=temperature, icon=mdi:thermometer, friendly_name=Garden Shallow Temp @ 2024-04-24T08:31:06.188730-05:00>, new_state=<state sensor.shallow_temp_ds18b20=46.4; state_class=measurement, unit_of_measurement=°F, device_class=temperature, icon=mdi:thermometer, friendly_name=Garden Shallow Temp @ 2024-04-24T08:34:06.193557-05:00>>:
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 900, in _guarded_process_one_task_or_event_or_recover
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 912, in _process_one_task_or_event_or_recover
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1033, in _process_one_event
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1126, in _process_state_changed_event_into_session
  File "/usr/src/homeassistant/homeassistant/components/recorder/table_managers/states_meta.py", line 58, in get
  File "/usr/src/homeassistant/homeassistant/components/recorder/table_managers/states_meta.py", line 108, in get_many
  File "/usr/src/homeassistant/homeassistant/components/recorder/util.py", line 230, in execute_stmt_lambda_element
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2306, in execute
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2181, in _execute_internal
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2050, in _connection_for_bind
  File "<string>", line 2, in _connection_for_bind
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1144, in _connection_for_bind
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 3280, in connect
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 146, in __init__
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 3304, in raw_connection
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/pool/impl.py", line 447, in connect
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 1263, in _checkout
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
  File "/usr/src/homeassistant/homeassistant/components/recorder/pool.py", line 78, in _do_get
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/pool/impl.py", line 429, in _do_get
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 390, in _create_connection
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 914, in __connect
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/event/attr.py", line 483, in _exec_w_sync_on_first_run
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/event/attr.py", line 497, in __call__
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1391, in _setup_recorder_connection
AssertionError

My interpretation is that those errors indicate drive access problems for the recorder. That is consistent with all of the data from the hung / zombie state not being recorded and going missing after a reboot.
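If the recorder really is losing access to its database, one thing worth ruling out is damage to the database file itself. Here is a rough sketch using Python's built-in `sqlite3` module and SQLite's `PRAGMA quick_check`. It assumes the default SQLite recorder database and that you run it against a copy of the file (for example one pulled out of a backup); the function name `quick_check` and the file name in the usage line are only illustrative.

```python
import sqlite3


def quick_check(db_path: str) -> list[str]:
    """Run SQLite's PRAGMA quick_check and return the result rows.

    A healthy database returns the single row 'ok'.
    """
    # mode=ro opens the file read-only so the check cannot modify it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA quick_check")]
    finally:
        conn.close()
```

Usage would look like `print(quick_check("copy_of_home-assistant_v2.db"))`. A clean result only says the file is structurally sound; it says nothing about whether the USB link to the SSD dropped out while HA was running, so it narrows things down rather than settling them.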

I really thought I was in the clear with the USB SSD drive gotcha after running that many months initially with zero problems. Perhaps not.

I wonder if it is a memory leak in the USB driver or the RPi firmware - who knows! However, when mine goes on vacation the CPU is at 0% and there’s nothing in the logs except one line of gobbledygook binary data… - which, as I mentioned, is resolved by a preemptive soft reboot of the device through the automation shown (which indicates a memory leak, I would think)…
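The memory leak idea can be checked instead of assumed by logging available memory over a few weeks and seeing whether it trends down before a hang. Below is a minimal sketch that reads the Linux `/proc/meminfo` file. It assumes you have a shell on the Pi where Python can run and that the output file sits somewhere that survives a reboot; the function names and the 5-minute interval are arbitrary choices for this example.

```python
import time


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: value}, mostly in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def available_percent(text: str) -> float:
    """Return MemAvailable as a percentage of MemTotal."""
    info = parse_meminfo(text)
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


def log_memory(out_path: str, interval_s: int = 300) -> None:
    """Append a timestamped MemAvailable percentage every interval_s seconds."""
    while True:
        with open("/proc/meminfo") as f:
            pct = available_percent(f.read())
        with open(out_path, "a") as out:
            out.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {pct:.1f}\n")
        time.sleep(interval_s)
```

If the logged percentage stays flat right up to the last entry before a freeze, a leak becomes less likely and the USB / power explanation looks stronger. A steady decline over weeks would support the leak theory.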