Home Assistant Crashing/Locking up

Hello,
Sorry if this is in the wrong place.

Home Assistant Yellow PoE
Pi4 8GB
Samsung NVME SSD 980 500GB
Zooz 800LR Z-Wave GPIO Module

Home Assistant 2023.8.2
Supervisor 2023.08.1
Operating System 10.4
Frontend 20230802.0 - latest

Twice in the past three weeks, Home Assistant has “locked up” and gone mostly unresponsive.

I can still open my main dashboard, but most entities show as unavailable or unable to connect, automations stop running, and Z-Wave devices stop communicating altogether.

It’s very hard to get logs when this happens, as all disk reads/writes seem to stop, and trying to load the settings menu gives an “error while loading page” message. I was able to pull some of the most recent logs from the iOS app by copying and pasting. (Attempting to download or load the full log gave an error message of 502: Bad Gateway…)

A reboot fully fixes it, but it’s a problem when it happens at random.

AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x7f5248e990>" raised exception!
10:54:23 PM – (ERROR) /usr/local/lib/python3.11/site-packages/grpc/_plugin_wrapping.py - message first occurred at 8:42:44 PM and shows up 1568 times
Error in database connectivity during commit: Error executing query: (sqlite3.OperationalError) disk I/O error (Background on this error at: https://sqlalche.me/e/20/e3q8). (retrying in 3 seconds)
10:54:23 PM – (ERROR) Recorder - message first occurred at 8:31:55 PM and shows up 649 times
SQLAlchemyError error processing task CommitTask(): Can't reconnect until invalid transaction is rolled back. Please rollback() fully before proceeding (Background on this error at: https://sqlalche.me/e/20/8s2b)
10:54:21 PM – (ERROR) Recorder - message first occurred at 8:31:58 PM and shows up 648 times
Websocket connection error: Cannot connect to host ratosvminion.local:7125 ssl:default [Try again]
10:54:20 PM – (ERROR) runner.py - message first occurred at August 12, 2023 at 11:43:28 AM and shows up 6944 times
Connection problem to snitun server
10:54:19 PM – (ERROR) runner.py - message first occurred at 8:33:38 PM and shows up 646 times
connection to moonraker down, restarting
10:54:15 PM – (WARNING) Moonraker (custom integration) - message first occurred at August 12, 2023 at 1:07:26 PM and shows up 6877 times
[55R646(192.168.1.37):8009] Heartbeat timeout, resetting connection
10:54:11 PM – (WARNING) /usr/local/lib/python3.11/site-packages/pychromecast/socket_client.py - message first occurred at August 12, 2023 at 11:45:39 AM and shows up 1078 times
Failed to update data: [Errno -3] Try again
10:54:00 PM – (ERROR) Sense - message first occurred at 8:33:59 PM and shows up 132 times
Client error on /os/info request Cannot connect to host 172.30.32.2:80 ssl:default [Connect call failed ('172.30.32.2', 80)]
10:53:35 PM – (ERROR) Home Assistant Supervisor - message first occurred at 8:29:03 PM and shows up 456 times
[Errno 5] I/O error: '/usr/local/lib/python3.11/site-packages/hass_frontend/frontend_latest/37168-Kd71XHFPV4g.js'
10:53:15 PM – (ERROR) components/http/static.py - message first occurred at 10:45:09 PM and shows up 21 times
Could not connect to Plex server: LServer (HTTPSConnectionPool(host='plex.tv', port=443): Max retries exceeded with url: /api/resources?includeHttps=1&includeRelay=1 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f41960950>: Failed to establish a new connection: [Errno -3] Try again')))
10:52:50 PM – (ERROR) Plex Media Server - message first occurred at 8:33:37 PM and shows up 18 times
Error doing job: Task exception was never retrieved
10:51:32 PM – (ERROR) helpers/storage.py - message first occurred at 8:32:09 PM and shows up 17 times
Exception in _handle_event when dispatching 'alexa_media_m************e@g****': ({'player_state': {'destinationUserId': '**REDACTED**', 'dopplerId': {'deviceType': ''**REDACTED**', 'deviceSerialNumber': ''**REDACTED**'}, 'bass': 0, 'midrange': 0, 'treble': 0}},) Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 1155, in _create_direct_connection File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 874, in _resolve_host File "/usr/local/lib/python3.11/site-packages/aiohttp/resolver.py", line 33, in resolve File "/usr/local/lib/python3.11/asyncio/base_events.py", line 867, in getaddrinfo return await self.run_in_executor( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo socket.gaierror: [Errno -3] Try again The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/backoff/_async.py", line 151, in retry File "/usr/local/lib/python3.11/site-packages/alexapy/alexaapi.py", line 151, in _request File "/usr/local/lib/python3.11/site-packages/alexapy/alexalogin.py", line 862, in refresh_access_token File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 536, in _request File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 540, in connect File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 901, in _create_connection File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 1169, in _create_direct_connection aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host api.amazon.com:443 ssl:default [Try again] The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/config/custom_components/alexa_media/media_player.py", line 461, in _handle_event 
File "/config/custom_components/alexa_media/media_player.py", line 333, in _refresh_if_no_audiopush File "/config/custom_components/alexa_media/helpers.py", line 156, in _catch_login_errors File "/config/custom_components/alexa_media/media_player.py", line 915, in async_update File "/config/custom_components/alexa_media/helpers.py", line 156, in _catch_login_errors File "/config/custom_components/alexa_media/media_player.py", line 625, in refresh File "/usr/local/lib/python3.11/site-packages/alexapy/helpers.py", line 147, in wrapper alexapy.errors.AlexapyConnectionError
10:50:11 PM – (ERROR) util/logging.py - message first occurred at 8:35:49 PM and shows up 31 times
alexaapi.get_state((<alexapy.alexaapi.AlexaAPI object at 0x7f595702d0>,), {}): A connection error occurred: An exception of type ClientConnectorError occurred. Arguments: (ConnectionKey(host='api.amazon.com', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-'**REDACTED**), gaierror(-3, 'Try again'))
10:50:11 PM – (WARNING) Alexa Media Player (custom integration) - message first occurred at 8:35:49 PM and shows up 31 times
Giving up _request(...) after 5 tries (aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host alexa.amazon.com:443 ssl:<ssl.SSLContext object at 0x7f627f2b10> [Try again])
10:50:11 PM – (ERROR) Alexa Media Player (custom integration) - message first occurred at 8:35:49 PM and shows up 31 times
Error executing compile statistics: (sqlite3.OperationalError) disk I/O error (Background on this error at: https://sqlalche.me/e/20/e3q8)
10:50:10 PM – (WARNING) Recorder - message first occurred at 8:35:10 PM and shows up 25 times
Error executing query: (sqlite3.OperationalError) disk I/O error (Background on this error at: https://sqlalche.me/e/20/e3q8)
10:50:10 PM – (ERROR) Recorder - message first occurred at 8:35:10 PM and shows up 25 times
Can't read Supervisor data: 
10:49:53 PM – (WARNING) Home Assistant Supervisor - message first occurred at 8:33:34 PM and shows up 28 times
[55R646(192.168.1.37):8009] Failed to connect to service ServiceInfo(type='mdns', data='Smart-TV-Pro-'**REDACTED**._googlecast._tcp.local.'), retrying in 5.0s
10:49:37 PM – (ERROR) /usr/local/lib/python3.11/site-packages/pychromecast/socket_client.py - message first occurred at August 12, 2023 at 11:54:26 AM and shows up 235 times
[55R646(192.168.1.37):8009] Error communicating with socket, resetting connection
10:49:37 PM – (WARNING) /usr/local/lib/python3.11/site-packages/pychromecast/socket_client.py - message first occurred at August 12, 2023 at 11:54:26 AM and shows up 151 times
[55R646(192.168.1.37):8009] Error reading from socket.
10:49:37 PM – (ERROR) /usr/local/lib/python3.11/site-packages/pychromecast/socket_client.py - message first occurred at August 12, 2023 at 11:54:26 AM and shows up 151 times
alexaapi.get_bluetooth((<alexapy.alexalogin.AlexaLogin object at 0x7f64ab9050>,), {}): A connection error occurred: An exception of type ClientConnectorError occurred. Arguments: (ConnectionKey(host='alexa.amazon.com', port=443, is_ssl=True, ssl=<ssl.SSLContext object at 0x7f627f2b10>, proxy=None, proxy_auth=None, proxy_headers_hash=-'**REDACTED**), gaierror(-3, 'Try again'))
10:48:26 PM – (WARNING) runner.py - message first occurred at 8:42:28 PM and shows up 52 times
Unhandled exception
10:45:31 PM – (ERROR) /usr/local/lib/python3.11/site-packages/aiohttp/web_protocol.py - message first occurred at 10:45:26 PM and shows up 10 times
Failed to to call /backups - 
10:45:15 PM – (ERROR) Home Assistant Supervisor - message first occurred at 10:45:15 PM and shows up 3 times
Client error on /backups request Cannot connect to host 172.30.32.2:80 ssl:default [Connect call failed ('172.30.32.2', 80)]
10:45:15 PM – (ERROR) Home Assistant Supervisor - message first occurred at 10:45:15 PM and shows up 3 times
Can't refresh cloud token: 
10:24:42 PM – (ERROR) runner.py - message first occurred at 9:25:55 PM and shows up 2 times
Referenced entities switch.in_wall_paddle_switch_qfsw_500s_4 are missing or not currently available
10:23:41 PM – (WARNING) helpers/service.py - message first occurred at 10:00:00 PM and shows up 7 times
Access to https://aa015h6buqvih86i1.api.met.no/weatherapi/locationforecast/2.0/complete returned error 'ClientConnectorError'
10:15:46 PM – (ERROR) components/met/__init__.py - message first occurred at 9:19:41 PM and shows up 2 times
Error fetching flume data: Error communicating with flume API: HTTPSConnectionPool(host='api.flumetech.com', port=443): Max retries exceeded with url: /users/31296/notifications?limit=50&offset=0&sort_direction=ASC&read=true (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f41ad1710>: Failed to establish a new connection: [Errno -3] Try again'))
9:43:36 PM – (ERROR) Flume - message first occurred at 8:33:39 PM and shows up 3 times
Error fetching met data: Update failed: 
9:19:41 PM – (ERROR) Meteorologisk institutt (Met.no)
Giving up _static_request(...) after 5 tries (aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host alexa.amazon.com:443 ssl:<ssl.SSLContext object at 0x7f627f2b10> [Try again])
8:53:02 PM – (ERROR) runner.py - message first occurred at 8:42:28 PM and shows up 8 times
Error requesting homeassistant_alerts data: Cannot connect to host alerts.home-assistant.io:443 ssl:default [Try again]
8:43:49 PM – (ERROR) Home Assistant Alerts
Error fetching alexa_media data: Error communicating with API: 
8:42:28 PM – (ERROR) Alexa Media Player (custom integration)
Error requesting Sense Trends m**REDACTED**@gmail.com data: Cannot connect to host api.sense.com:443 ssl:default [Try again]
8:37:07 PM – (ERROR) Sense
Error fetching hassio data: Error on Supervisor API: 
8:37:01 PM – (ERROR) Home Assistant Supervisor
Error requesting NWS forecast station KCAK data: 500, message='Internal Server Error', url=URL('https://api.weather.gov/gridpoints/CLE/93,35/forecast')
8:36:37 PM – (ERROR) National Weather Service (NWS) - message first occurred at August 12, 2023 at 3:53:32 PM and shows up 5 times
Timeout sending report to Alexa for switch.tasmotaplug13
8:33:44 PM – (ERROR) Amazon Alexa
Disconnected from MQTT server core-mosquitto:1883 (16)
8:33:39 PM – (WARNING) MQTT - message first occurred at 8:33:39 PM and shows up 2 times
Ping fails, no response from peer
8:33:25 PM – (ERROR) runner.py
Error fetching NWS Alerts data: Cannot connect to host api.weather.gov:443 ssl:default [Try again]
8:33:02 PM – (ERROR) NWS Alerts (custom integration) - message first occurred at 8:33:01 PM and shows up 2 times
No ACK from MQTT server in 10 seconds (mid: 82)
8:32:35 PM – (WARNING) MQTT
Could not fetch info for cebe7a76_hassio_google_drive_backup: 
8:31:58 PM – (WARNING) Home Assistant Supervisor - message first occurred at 8:31:58 PM and shows up 13 times
Config entry 'Z-Wave JS' for zwave_js integration not ready yet: Failed to get the Z-Wave JS add-on info: ; Retrying in background
8:31:58 PM – (WARNING) config_entries.py
Request #0 to BlueIris (http://192.168.1.3:81) failed, Data: {'cmd': 'camlist', 'session': ''**REDACTED**'}, Response: {'result': 'fail', 'session': ''**REDACTED**'}
8:31:57 PM – (WARNING) Blue Iris NVR (custom integration) - message first occurred at 8:31:55 PM and shows up 2 times
Could not fetch changelog for cebe7a76_hassio_google_drive_backup: 
8:31:56 PM – (WARNING) Home Assistant Supervisor - message first occurred at 8:31:55 PM and shows up 13 times
/addons/cebe7a76_hassio_google_drive_backup/changelog return code 500
8:31:56 PM – (ERROR) Home Assistant Supervisor - message first occurred at 8:31:55 PM and shows up 13 times
Error doing job: Task exception was never retrieved
8:31:55 PM – (ERROR) components/zwave_js/update.py
Error writing config for core.restore_state: [Errno 30] Read-only file system
8:31:55 PM – (ERROR) helpers/storage.py
File replacement cleanup failed for /config/.storage/tmpwx6lrmya while saving /config/.storage/core.restore_state: [Errno 30] Read-only file system: '/config/.storage/tmpwx6lrmya'
8:31:55 PM – (ERROR) util/file.py
Saving file failed: /config/.storage/core.restore_state
8:31:55 PM – (ERROR) util/file.py

Check your PoE supply (coming from the router/switch) and make sure you’re not near or over the amount of power available. Try a standard power brick as a test to see if it stops.
Power is used up very fast if there are APs, cameras, etc. drawing on the supply.
Also be careful which PoE supply is being used, see: https://novotech.com/learn/m2m-blog/blog/2023/05/01/understanding-the-differences-between-poe-vs-poe-vs-poe/
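To make the budget check concrete, here is a minimal sketch of the arithmetic. The per-port figures are the nominal maximums at the switch (PSE) side for each IEEE standard. The device draws and the 20% safety margin are hypothetical values for illustration only, so substitute the numbers your switch actually reports.

```python
# Nominal maximum power per port at the switch (PSE) side, in watts.
PSE_PORT_WATTS = {
    "802.3af (PoE)": 15.4,
    "802.3at (PoE+)": 30.0,
    "802.3bt Type 3 (PoE++)": 60.0,
    "802.3bt Type 4 (PoE++)": 90.0,
}


def headroom(budget_watts, draws_watts):
    """Watts left over after subtracting every listed draw from the budget."""
    return budget_watts - sum(draws_watts)


def is_near_limit(budget_watts, draws_watts, margin=0.2):
    """True when the total draw eats into the chosen safety margin."""
    return sum(draws_watts) > budget_watts * (1 - margin)


# Hypothetical example: a PoE+ port feeding one device that peaks at 9 W,
# and a switch with a 120 W total budget shared by several devices.
port_budget = PSE_PORT_WATTS["802.3at (PoE+)"]
print("port headroom (W):", headroom(port_budget, [9.0]))

switch_draws = [9.0, 12.5, 12.5, 25.5, 25.5, 25.5]  # made-up APs and cameras
print("switch near limit:", is_near_limit(120.0, switch_draws))
```

The same two checks apply per port and for the whole switch. A port can be fine while the switch total is close to its limit, and the reverse can also be true.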

Good thought. I actually had it on an 802.3bt PoE++ switch and moved it over to a separate switch a few days after it first happened, but it still happened again a few weeks later.

It’s currently on a UniFi 750 W 48-port PoE+ switch. Its diagnostics show that I’m not remotely close to max load for the port, and the total load on the switch never crosses 400 W. The Yellow is on a 6 ft Cat6A cable that I ran through a tester with no issues.

I had this issue happen years ago when it was running on just a Pi4 with an SD card, and I had to totally reflash it to make it stable again. I do have a large historical database (over 2 GB last I checked, which is annoying when I have to do a hard reboot, as it takes forever for Home Assistant to get going), but I don’t think that would cause this to happen.
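For anyone who wants to rule the database in or out, a small sketch like the one below reports the file size and asks SQLite to run its own consistency check. It assumes the default recorder file at /config/home-assistant_v2.db. Run it against a copy of the file taken while Home Assistant is stopped, not the live database. `DB_PATH` and `check_database` are names made up for this example.

```python
import os
import sqlite3

# Assumed default location of the recorder database; adjust to point at your copy.
DB_PATH = "/config/home-assistant_v2.db"


def check_database(path):
    """Return (size in MiB, list of messages from PRAGMA quick_check)."""
    size_mib = os.path.getsize(path) / (1024 * 1024)
    # Open read-only so the check cannot modify the file.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        rows = conn.execute("PRAGMA quick_check").fetchall()
    finally:
        conn.close()
    return size_mib, [row[0] for row in rows]


if os.path.exists(DB_PATH):
    size_mib, messages = check_database(DB_PATH)
    print(f"{size_mib:.0f} MiB:", messages)
else:
    print("database not found at", DB_PATH)
```

A healthy file reports a single "ok" message. Anything else points at corruption in the database itself. A clean result would not rule out the drive, since the log above also shows a read-only file system and I/O errors on files outside the database.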