All,
Reaching out to the community here, as I am clueless as to why my HA installation has become unstable.
It has worked flawlessly for a number of months since I changed to a new SSD.
I have the full configuration here
And no, I have made no major changes to the config; I have only added further logging.
All this seems to have started when I began upgrading from 2022.12, but I cannot say exactly when it happened (I have only run manual updates, 2023.1.7 ↔ 2023.2.6 and so forth).
My current setup is the following:
- RPI 4, 8 GB.
- SSD via USB. HA OS boots from the SSD, not the SD-card.
- I have an original RPI power supply (5.1V 3A) that should be enough to power both the RPI and the SSD. I have an identical setup for a Docker server that works flawlessly.
- HA OS 10.1.
- HA Core 2023.3.6.
- MariaDB 2.6.0 as add-on.
- HACS (Yes, I know, I know… not ideal when performing error-checking)
I have run the following commands, as indicated by the community:
ha su repair
ha core rebuild
ha host reboot
No errors when running them, but the same problem returns after a while (minutes to hours, or full days).
It seems to become more unstable when the OS/Host is heavily used, such as when upgrading MariaDB or performing a backup through the GUI.
Once the problem occurs, I cannot start File editor or Terminal through the GUI, connect through SSH, or connect to the Samba share. I also cannot retrieve any logs from Host or Supervisor.
The GUI gradually degrades, first in performance.
The error message in the GUI is similar to:
Error when loading page
or
502: Bad gateway.
A hard reboot makes the system boot up again. I have now done this a number of times.
I get no errors at startup, neither in Core, Supervisor nor Host.
When HA becomes unstable, I get some errors in the Core logs (obtained through the GUI; the errors do not appear in the .1-log file).
Example from a few days ago:
13:14:11 – (ERROR) Home Assistant Supervisor
/backups return code 500
13:14:11 – (ERROR) Home Assistant Supervisor
Error executing query SELECT table_schema as "database", table_name as "table", Round(Sum(data_length + index_length) / 1024 / 1024, 1) as "value" FROM information_schema.tables WHERE table_schema="homeassistant" and table_name="statistics" LIMIT 1;: (MySQLdb.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/20/e3q8)
13:14:04 – (ERROR) SQL - First occurred 13:07:04 and has happened 60 times
Update for sensor.home_assistant_backup_to_server1 fails
13:14:03 – (ERROR) components/file/sensor.py - First occurred 13:06:32 and has happened 80 times
Unhandled database error while processing task KeepAliveTask(): (MySQLdb.OperationalError) (2002, "Can't connect to MySQL server on 'core-mariadb' (115)") (Background on this error at: https://sqlalche.me/e/20/e3q8)
13:13:48 – (ERROR) Recorder - First occurred 13:07:18 and has happened 14 times
[Errno 5] I/O error: '/usr/local/lib/python3.10/site-packages/hass_frontend/frontend_latest/3c774a29.js'
13:13:48 – (ERROR) components/http/static.py
Today:
19:13:12 – (ERROR) Home Assistant Supervisor - First occurred 19:10:36 and has happened 4 times
Client error on /ingress/session request Cannot connect to host 172.30.32.2:80 ssl:default [Connect call failed ('172.30.32.2', 80)]
19:13:12 – (ERROR) Home Assistant Supervisor - First occurred 19:10:36 and has happened 4 times
Client error on /store request Cannot connect to host 172.30.32.2:80 ssl:default [Connect call failed ('172.30.32.2', 80)]
19:13:09 – (ERROR) Home Assistant Supervisor - First occurred 19:02:44 and has happened 30 times
Error doing job: Task exception was never retrieved
19:13:06 – (ERROR) Easee EV Charger (custom integration) - First occurred 18:59:06 and has happened 15 times
Error doing job: Task exception was never retrieved
19:13:06 – (ERROR) Easee EV Charger (custom integration) - First occurred 19:03:06 and has happened 2 times
Can't read Supervisor data:
19:12:50 – (WARNING) Home Assistant Supervisor - First occurred 19:02:44 and has happened 3 times
Updating sql sensor took longer than the scheduled update interval 0:00:30
19:12:49 – (WARNING) Sensor - First occurred 18:59:19 and has happened 112 times
Update for sensor.grafana_file_github_push_log fails
19:12:46 – (ERROR) components/file/sensor.py - First occurred 18:58:16 and has happened 60 times
Update for sensor.home_assistant_backup_to_server1 fails
And another one from today:
[Errno 5] I/O error: '/usr/local/lib/python3.10/site-packages/hass_frontend/frontend_latest/922e036d.js'
20:06:01 – (ERROR) components/http/static.py - First occurred 20:05:47 and has happened 9 times
Client error on api app/entrypoint.js request Cannot connect to host 172.30.32.2:80 ssl:default [Connect call failed ('172.30.32.2', 80)]
20:06:01 – (ERROR) Home Assistant Supervisor - First occurred 20:05:47 and has happened 3 times
Can't read Supervisor data:
20:05:58 – (WARNING) Home Assistant Supervisor
Client error on /host/info request Cannot connect to host 172.30.32.2:80 ssl:default [Connect call failed ('172.30.32.2', 80)]
20:05:58 – (ERROR) Home Assistant Supervisor - First occurred 20:05:58 and has happened 6 times
Update of sensor.home_assistant_table_size_states is taking over 10 seconds
20:05:47 – (WARNING) helpers/entity.py - First occurred 20:05:47 and has happened 4 times
Setup of sensor platform command_line is taking over 10 seconds.
19:21:13 – (WARNING) Sensor - First occurred 19:21:13 and has happened 3 times
Sensor sensor.balboa_spa_temperature_hour has device class temperature, state class None and unit °C thus indicating it has a numeric value; however, it has the non-numeric value: None (<class 'str'>); Please update your configuration if your entity is manually configured, otherwise create a bug report at https://github.com/home-assistant/core/issues?q=is%3Aopen+is%3Aissue+label%3A%22integration%3A+template%22
19:21:11 – (WARNING) Sensor
Ended unfinished session (id=439 from 2023-05-11 16:22:37)
19:20:54 – (WARNING) Recorder
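To see at a glance which sources log the most, the entries above can be tallied. This is only a minimal sketch in Python, assuming each header line has the shape `HH:MM:SS – (LEVEL) source`, optionally followed by ` - ... N times` as in the GUI output pasted above. The name `tally_by_source` and the regex are illustrative and not part of Home Assistant.

```python
import re
from collections import Counter

# Header shape assumed from the pasted entries:
#   "HH:MM:SS – (LEVEL) source" with an optional " - ... N <word>" suffix.
# The last number before the final word is taken as the occurrence count.
HEADER = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) – \((\w+)\) (.+?)(?: - .*?(\d+) \w+)?$"
)


def tally_by_source(lines):
    """Sum the reported occurrence counts per log source."""
    totals = Counter()
    for line in lines:
        match = HEADER.match(line.strip())
        if match is None:
            continue  # message text, not a header line
        _time, _level, source, count = match.groups()
        # A header without a count suffix is treated as one occurrence.
        totals[source] += int(count) if count else 1
    return totals
```

Calling `tally_by_source(text.splitlines()).most_common()` on the pasted text would list the sources from most to least frequent. Lines that do not start with a timestamp are skipped as message text.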
To me it seems that something goes wrong somewhere in the OS/Docker layer and the Docker instances.
- Why do I not get any errors at startup? Are there further logs to check?
- Do I have a corrupt OS/Docker? If so, how can I recover without a full restore?
- HW problems? (I have checked the CPU temperature, and it goes up from 47 degrees Celsius to roughly 58 when the problem occurs.)
- How can I check that there are no errors on the SSD (other than connecting it to another Linux machine)?
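Since the logs show `[Errno 5] I/O error` when reading files, one rough check could be a plain read test over a directory tree. The following is only a sketch, assuming Python is available wherever the disk is mounted; the function name `find_unreadable_files` is illustrative. It is not a substitute for a proper disk diagnostic.

```python
import os


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Read every regular file under root; return (path, errno) for failures."""
    failures = []

    def on_walk_error(exc):
        # Directories that cannot be listed are recorded as well.
        failures.append((exc.filename, exc.errno))

    for dirpath, _dirs, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks and the like
            try:
                with open(path, "rb") as handle:
                    # Read in chunks so large files do not fill memory.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, exc.errno))
    return failures
```

An empty result only means every file could be read this time. Files already held in the page cache may be served from memory, so a clean run does not prove the SSD is healthy.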
I have regular backups, so I can restore, but I do not want to restore everything from scratch without knowing where the fault lies.
/Sven