HA gets nervous breakdown

Over the last couple of days HA has become almost unusable. I couldn’t open a new connection to HA; an existing connection was still somewhat usable. I had no access to the Supervisor either, and had to cut power to the Pi to get everything going again. I’m running Home Assistant OS, core-2021.10.5, on a Pi 4 with an external SSD. I thought it was a one-off until yesterday, when it happened again :-/ I suspect MariaDB may be the culprit, but I don’t know.

First breakdown …

[Errno 5] I/O error: '/config/.storage/icloud/session/stefanwxxxxxxx'
4:12:50 – (ERROR) icloud3 (custom integration) - message first occurred at 15 October 2021, 22:33:10 and shows up 192772 times
Database connection invalidated: Error executing query: (MySQLdb._exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: https://sqlalche.me/e/14/e3q8). (retrying in 3 seconds)
4:12:50 – (ERROR) Recorder - message first occurred at 15 October 2021, 21:39:41 and shows up 5426 times
SQLAlchemyError error processing event <Event time_changed[L]: now=2021-10-16T04:12:32.002825+02:00>: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8) (Background on this error at: https://sqlalche.me/e/14/7s2a)
4:12:50 – (ERROR) Recorder - message first occurred at 15 October 2021, 21:39:47 and shows up 5424 times
Unhandled database error while processing event <Event time_changed[L]: now=2021-10-16T04:10:43.000667+02:00>: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8)
4:12:43 – (ERROR) Recorder - message first occurred at 15 October 2021, 21:40:08 and shows up 788 times
Error executing query SELECT table_schema "database", Round(Sum(data_length + index_length) / 1024, 1) "value" FROM information_schema.tables WHERE table_schema="homeassistant" GROUP BY table_schema LIMIT 1;: (MySQLdb._exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: SELECT table_schema "database", Round(Sum(data_length + index_length) / 1024, 1) "value" FROM information_schema.tables WHERE table_schema="homeassistant" GROUP BY table_schema LIMIT 1;] (Background on this error at: https://sqlalche.me/e/14/e3q8)
4:12:34 – (ERROR) sql - message first occurred at 15 October 2021, 21:40:02 and shows up 786 times
Error executing statistics: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8)
4:12:06 – (WARNING) Recorder - message first occurred at 15 October 2021, 22:12:06 and shows up 8 times
Error executing query: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8)
4:12:06 – (ERROR) Recorder - message first occurred at 15 October 2021, 22:12:06 and shows up 35 times
Cannot connect to InfluxDB due to 'HTTPConnectionPool(host='a0d7b954-influxdb', port=8086): Max retries exceeded with url: /query?q=select+sum%28diskBytes%29+as+value+from+%22monitor%22.%22shard%22+where+time+%3E+now%28%29+-+10s&db=_internal (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ed04703a0>: Failed to establish a new connection: [Errno 111] Connection refused'))'. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible.
4:11:54 – (ERROR) InfluxDB - message first occurred at 15 October 2021, 21:45:53 and shows up 387 times
Error handling request
4:09:40 – (ERROR) components/http/static.py - message first occurred at 15 October 2021, 23:42:25 and shows up 25 times
Client error on api app/entrypoint.js request Cannot connect to host 172.30.XXX.XXX:80 ssl:default [Connect call failed ('172.30.XXX.XXX', 80)]
4:09:40 – (ERROR) Home Assistant Supervisor - message first occurred at 4:04:53 and shows up 3 times
Error handling request
4:09:29 – (ERROR) components/recorder/util.py - message first occurred at 4:03:10 and shows up 24 times
Error executing query: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8)
4:09:29 – (ERROR) Recorder - message first occurred at 4:03:10 and shows up 75 times
[544661422288] Error handling message: Unknown error
4:08:34 – (ERROR) Home Assistant WebSocket API
Error handling request
4:07:45 – (ERROR) components/logbook/__init__.py - message first occurred at 4:04:59 and shows up 2 times
Error doing job: Task exception was never retrieved
4:07:02 – (ERROR) helpers/storage.py - message first occurred at 15 October 2021, 21:52:02 and shows up 35 times
Failed to to call /network/info -
4:03:09 – (ERROR) Home Assistant Supervisor - message first occurred at 4:03:09 and shows up 9 times
Client error on /network/info request Cannot connect to host 172.30.XXX.XXX:80 ssl:default [Connect call failed ('172.30.XXX.XXX', 80)]
4:03:09 – (ERROR) Home Assistant Supervisor - message first occurred at 15 October 2021, 22:21:35 and shows up 17 times
Unhandled exception
4:00:09 – (ERROR) /usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py - message first occurred at 3:54:15 and shows up 43 times
Unhandled exception
4:00:08 – (ERROR) /usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py - message first occurred at 15 October 2021, 23:42:24 and shows up 17 times
Can't read last version:
3:51:35 – (WARNING) Home Assistant Supervisor - message first occurred at 15 October 2021, 22:21:35 and shows up 7 times
Error doing job: Task exception was never retrieved
2:52:17 – (ERROR) util/json.py - message first occurred at 15 October 2021, 22:52:17 and shows up 4 times
JSON file reading failed: /config/.storage/hacs.critical
2:52:17 – (ERROR) util/json.py - message first occurred at 15 October 2021, 22:52:17 and shows up 4 times
Filtered a request with a potential harmful query string: ///remote/fgt_lang?lang=/../../../..//////////dev/
1:35:21 – (WARNING) HTTP - message first occurred at 15 October 2021, 9:37:37 and shows up 3 times
Error while executing automation automation.advanced_heating_monitor: Error talking to MQTT: The client is not currently connected.
0:01:00 – (ERROR) Automation
Advanced Heating Monitor: Choose at step 1: choice 3: Error executing script. Error for call_service at pos 1: Error talking to MQTT: The client is not currently connected.
0:01:00 – (ERROR) Automation - message first occurred at 0:01:00 and shows up 2 times
Filtered a potential harmful request to: /cgi-bin/.%2e/.%2e/.%2e/.%2e/bin/sh
15 October 2021, 23:42:52 – (WARNING) HTTP - message first occurred at 14 October 2021, 15:17:11 and shows up 3 times
Error while executing automation automation.heating_forecast_2: Error talking to MQTT: The client is not currently connected.
15 October 2021, 23:30:00 – (ERROR) Automation
Advanced Heating Forecast: Error executing script. Error for call_service at pos 4: Error talking to MQTT: The client is not currently connected.
15 October 2021, 23:30:00 – (ERROR) Automation
Error on Supervisor API:
15 October 2021, 23:23:00 – (ERROR) Home Assistant Supervisor
Error while executing automation automation.heating_reset_offsets: Error talking to MQTT: The client is not currently connected.
15 October 2021, 23:05:00 – (ERROR) Automation
Advanced Heating Reset offsets: Error executing script. Error for call_service at pos 1: Error talking to MQTT: The client is not currently connected.
15 October 2021, 23:05:00 – (ERROR) Automation
Heating Reset Offsets: Error executing script. Error for call_service at pos 1: Error talking to MQTT: The client is not currently connected.
15 October 2021, 23:05:00 – (ERROR) Scripts
Timeout fetching motioneye data
15 October 2021, 22:56:54 – (ERROR) motionEye
Error while executing automation automation.advanced_heating_hc3: Error talking to MQTT: The client is not currently connected.
15 October 2021, 22:20:00 – (ERROR) Automation - message first occurred at 15 October 2021, 21:59:41 and shows up 2 times
Advanced Heating HC3: Choose at step 1: choice 1: Error executing script. Error for call_service at pos 1: Error talking to MQTT: The client is not currently connected.
15 October 2021, 22:20:00 – (ERROR) Automation - message first occurred at 15 October 2021, 21:59:41 and shows up 4 times
Error while executing automation automation.advanced_heating_hc2: Error talking to MQTT: The client is not currently connected.
15 October 2021, 22:20:00 – (ERROR) Automation - message first occurred at 15 October 2021, 21:59:41 and shows up 2 times
Advanced Heating HC2: Choose at step 1: choice 1: Error executing script. Error for call_service at pos 1: Error talking to MQTT: The client is not currently connected.
15 October 2021, 22:20:00 – (ERROR) Automation - message first occurred at 15 October 2021, 21:59:41 and shows up 4 times
Error while executing automation automation.advanced_heating: Error talking to MQTT: The client is not currently connected.
15 October 2021, 22:20:00 – (ERROR) Automation - message first occurred at 15 October 2021, 21:59:41 and shows up 2 times
Advanced Heating HC1: Choose at step 1: choice 1: Error executing script. Error for call_service at pos 1: Error talking to MQTT: The client is not currently connected.
15 October 2021, 22:20:00 – (ERROR) Automation - message first occurred at 15 October 2021, 21:59:41 and shows up 4 times
Disconnected from MQTT server core-mosquitto:1883 (1)
15 October 2021, 21:46:37 – (WARNING) MQTT
Cannot connect to InfluxDB due to 'b'{"error":"engine: error writing WAL entry: write /data/influxdb/wal/homeassistant/autogen/265/_00005.wal: read-only file system"}\n''. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible.
15 October 2021, 21:40:21 – (ERROR) InfluxDB
SQLAlchemyError error processing event <Event time_changed[L]: now=2021-10-15T21:38:51.001088+02:00>: Can't reconnect until invalid transaction is rolled back. (Background on this error at: https://sqlalche.me/e/14/8s2b)
15 October 2021, 21:39:44 – (ERROR) Recorder
HC2 greater HC1, delta is 9.0, sending offset -3 at 2021-10-15 14:28
15 October 2021, 18:39:33 – (CRITICAL) components/system_log/__init__.py - message first occurred at 13 October 2021, 20:52:10 and shows up 40 times
[544842998304] Client unable to keep up with pending messages. Stayed over 512 for 5 seconds
15 October 2021, 17:52:35 – (ERROR) Home Assistant WebSocket API
Fetch snapshot image failed from livingroom, falling back to FFmpeg; Unknown error: All connection attempts failed
15 October 2021, 16:51:13 – (ERROR) ONVIF - message first occurred at 15 October 2021, 16:11:49 and shows up 5 times
while parsing a block collection in "/config/sensors.yaml", line 5, column 1 expected <block end>, but found '?' in "/config/sensors.yaml", line 597, column 1
15 October 2021, 16:49:38 – (ERROR) util/yaml/loader.py
Timeout fetching d066563d1cf14e9892a1d572e6c1172b data
15 October 2021, 16:29:16 – (ERROR) AVM FRITZ!SmartHome
Login attempt or request with invalid authentication from tmo-097-251.customers.d1-online.com (80.187.97.251). (Mozilla/5.0 (iPhone; CPU iPhone OS 15_0_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Home Assistant/2021.8 (io.robbie.HomeAssistant; build:2021.216; iOS 15.0.2) Mobile/HomeAssistant, like Safari)
15 October 2021, 16:26:58 – (WARNING) HTTP - message first occurred at 14 October 2021, 15:56:05 and shows up 4 times
:0:0 Script error.
15 October 2021, 16:20:09 – (ERROR) components/system_log/__init__.py - message first occurred at 14 October 2021, 14:54:26 and shows up 4 times
Error handling request
15 October 2021, 16:19:41 – (ERROR) components/frontend/__init__.py - message first occurred at 14 October 2021, 7:59:26 and shows up 2 times

Second breakdown yesterday …

Database connection invalidated: Error executing query: (MySQLdb._exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: https://sqlalche.me/e/14/e3q8). (retrying in 3 seconds)
18:05:02 – (ERROR) Recorder - message first occurred at 17:31:52 and shows up 660 times
SQLAlchemyError error processing event <Event time_changed[L]: now=2021-10-23T18:01:28.001507+02:00>: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8) (Background on this error at: https://sqlalche.me/e/14/7s2a)
18:05:02 – (ERROR) Recorder - message first occurred at 17:31:58 and shows up 658 times
Unhandled exception
18:04:58 – (ERROR) /usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py - message first occurred at 18:04:58 and shows up 2 times
Client error on api app/entrypoint.js request Cannot connect to host 172.30.XXX.XXX:80 ssl:default [Connect call failed ('172.30.XXX.XXX', 80)]
18:04:58 – (ERROR) Home Assistant Supervisor
Error executing query SELECT table_schema "database", Round(Sum(data_length + index_length) / 1024, 1) "value" FROM information_schema.tables WHERE table_schema="homeassistant" GROUP BY table_schema LIMIT 1;: (MySQLdb._exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: SELECT table_schema "database", Round(Sum(data_length + index_length) / 1024, 1) "value" FROM information_schema.tables WHERE table_schema="homeassistant" GROUP BY table_schema LIMIT 1;] (Background on this error at: https://sqlalche.me/e/14/e3q8)
18:04:53 – (ERROR) sql - message first occurred at 17:31:53 and shows up 67 times
Error handling request
18:04:44 – (ERROR) components/recorder/util.py - message first occurred at 18:03:31 and shows up 10 times
Error executing query: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8)
18:04:44 – (ERROR) Recorder - message first occurred at 17:36:24 and shows up 16 times
Error executing query: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8)
18:04:44 – (ERROR) Recorder - message first occurred at 18:03:31 and shows up 30 times
Unhandled database error while processing event <Event time_changed[L]: now=2021-10-23T17:59:18.001231+02:00>: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8)
18:04:41 – (ERROR) Recorder - message first occurred at 17:31:55 and shows up 62 times
Cannot connect to InfluxDB due to 'HTTPConnectionPool(host='a0d7b954-influxdb', port=8086): Max retries exceeded with url: /query?q=select+sum%28diskBytes%29+as+value+from+%22monitor%22.%22shard%22+where+time+%3E+now%28%29+-+10s&db=_internal (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f60479220>: Failed to establish a new connection: [Errno 111] Connection refused'))'. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible.
18:04:22 – (ERROR) InfluxDB - message first occurred at 17:31:37 and shows up 34 times
Error handling request
18:03:49 – (ERROR) components/http/static.py
Error executing statistics: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/14/e3q8)
18:03:31 – (WARNING) Recorder - message first occurred at 17:36:24 and shows up 6 times
Error doing job: Task exception was never retrieved
18:03:31 – (ERROR) helpers/storage.py - message first occurred at 17:32:36 and shows up 7 times
Error doing job: Future exception was never retrieved
17:53:03 – (ERROR) components/icloud/account.py
Error doing job: Task exception was never retrieved
17:47:55 – (ERROR) util/json.py
JSON file reading failed: /config/.storage/hacs.critical
17:47:55 – (ERROR) util/json.py
Heating 1 offset 3 time 2021-10-23 17:40, source automation.advanced_heating_hc1
17:40:00 – (ERROR) System Log - message first occurred at 17:40:00 and shows up 2 times
Cannot connect to InfluxDB due to 'HTTPConnectionPool(host='a0d7b954-influxdb', port=8086): Max retries exceeded with url: /write?db=homeassistant (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8532d040>: Failed to establish a new connection: [Errno 111] Connection refused'))'. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible.
17:32:18 – (ERROR) InfluxDB
SQLAlchemyError error processing event <Event time_changed[L]: now=2021-10-23T17:30:47.001278+02:00>: Can't reconnect until invalid transaction is rolled back. (Background on this error at: https://sqlalche.me/e/14/8s2b)
17:31:55 – (ERROR) Recorder
Update of sensor.influxdb_db_size is taking over 10 seconds
17:31:32 – (WARNING) helpers/entity.py
[547352406576] Client unable to keep up with pending messages. Stayed over 512 for 5 seconds
17:07:25 – (ERROR) Home Assistant WebSocket API

It’s pretty clear from the error messages that Home Assistant is unable to communicate with your database.
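
To see at a glance which components account for most of the errors, the pasted log can be tallied by source. This is only a rough sketch: it assumes the log has been saved as plain text in the same layout as above (a message line followed by a summary line), and the names `SUMMARY` and `tally_by_source` are made up for illustration.

```python
import re
from collections import Counter

# Matches the summary line that follows each message in the pasted log, e.g.
# "4:12:50 – (ERROR) Recorder - message first occurred at ... and shows up 5426 times"
SUMMARY = re.compile(
    r"\((?P<level>ERROR|WARNING|CRITICAL)\)\s+(?P<source>.+?)"
    r"(?:\s+-\s+message first occurred at .+? and shows up (?P<count>\d+) times)?$"
)


def tally_by_source(lines):
    """Sum occurrence counts per log source; a line without a count is one hit."""
    totals = Counter()
    for line in lines:
        match = SUMMARY.search(line.strip())
        if match:
            totals[match["source"]] += int(match["count"] or 1)
    return totals


# Three summary lines copied from the first breakdown.
sample = [
    "4:12:50 – (ERROR) Recorder - message first occurred at 15 October 2021, 21:39:41 and shows up 5426 times",
    "4:12:34 – (ERROR) sql - message first occurred at 15 October 2021, 21:40:02 and shows up 786 times",
    "15 October 2021, 21:46:37 – (WARNING) MQTT",
]

for source, count in tally_by_source(sample).most_common():
    print(f"{source}: {count}")
```

Feeding it the whole log instead of `sample` gives one total per source, which makes it easier to see whether the database errors dominate or are just one symptom among several.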

As there are no issues open for MariaDB itself, it is likely something to do with your setup.

Is your Pi capable of supplying enough power to the SSD?

Is the SSD failing?

I use the stock Pi power supply (5.1 V / 3.0 A) and I haven’t noticed the SSD failing. How can I find out whether the SSD is failing? Can I run some sort of disk check?
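
One quick thing that can be checked from a Python prompt on the Pi is whether the storage currently accepts writes at all, since the first log contains both `[Errno 5] I/O error` and `read-only file system`. The sketch below is not a health check of the SSD itself; it only tries to write and sync a small file and reports how that fails. The function names are made up for illustration, and `/config` is simply the path that appears in the log.

```python
import errno
import os
import tempfile


def _classify(exc):
    """Map an OSError to a short label for the two failures seen in the log."""
    if exc.errno == errno.EROFS:
        return "read-only"
    if exc.errno == errno.EIO:
        return "io-error"
    return f"error: {exc.strerror}"


def probe_write(directory):
    """Try to create, write and fsync a small file in `directory`.

    Returns "ok", "read-only", "io-error", or "error: <reason>".
    """
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix=".write_probe_")
    except OSError as exc:
        return _classify(exc)
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # push the data to the device so I/O errors surface here
    except OSError as exc:
        return _classify(exc)
    finally:
        os.close(fd)
        try:
            os.unlink(path)
        except OSError:
            pass  # nothing more to do if cleanup fails
    return "ok"


if __name__ == "__main__":
    # "/config" is the directory named in the log; adjust if yours differs.
    print(probe_write("/config"))
```

If this reports `read-only` or `io-error` while HA is misbehaving, the problem is more likely the disk, cable, or power than MariaDB itself.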

There are errors related to InfluxDB and MQTT as well.
Seems like HA loses network connectivity at times…
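
To tell a network problem apart from services that have simply stopped, a plain TCP connection test against the three add-ons named in the logs may help. A minimal sketch, with some assumptions: the host names and ports 1883 and 8086 come from the log, port 3306 is just MariaDB's default, and the names will probably only resolve from inside the Home Assistant container network. `SERVICES`, `can_connect` and `report` are made-up names.

```python
import socket

# Host names and ports 1883/8086 are taken from the log above;
# 3306 is an assumption (MariaDB's default port).
SERVICES = {
    "MariaDB": ("core-mariadb", 3306),
    "MQTT": ("core-mosquitto", 1883),
    "InfluxDB": ("a0d7b954-influxdb", 8086),
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, timed out and DNS failures
        return False


def report(services=SERVICES):
    """Return {service name: True/False} for every entry in `services`."""
    return {name: can_connect(host, port) for name, (host, port) in services.items()}
```

Calling `print(report())` during a breakdown shows which services still answer. If all three fail at once while the Pi itself is still reachable, that points at something they share (storage, for instance) rather than at the network.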