I’m running a Raspberry Pi 4 (8 GB) with an SSD. Each is powered by its own dedicated power supply.
I’m a bit behind on HAOS. It’s at v6.6, which is probably why all components/extensions are unable to show their logs. I’m reluctant to update because the system is configured to boot from the SSD, but maybe that wouldn’t be a problem today.
I’ve been using this setup for about 4 years without any hiccups.
Since I updated HA to 2024.7, I have been encountering various serious instability issues. Around the same time I also started using the Modbus integration.
I then updated to the latest 2024.8.
The issues are as follows:
- the SQLite database crashing
- a sudden CPU temperature ramp-up that slows the system down and is followed by an HA crash/restart
- Glances shows critical CPU_IOWAIT as well as 100% memory and swap usage, but not near the time HA crashed
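
To cross-check the CPU_IOWAIT figure independently of Glances, the iowait share can be computed from two samples of the aggregate `cpu` line in `/proc/stat`. This is only a minimal sketch: it assumes a shell with Python on the machine (for example through an SSH add-on), and the function names and the 5-second interval are made up for illustration.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two 'cpu' line samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Fields: user nice system idle iowait irq softirq steal (guest is already part of user).
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which holds the aggregate counters."""
    with open(path) as stat_file:
        return stat_file.readline()


if __name__ == "__main__":
    sample_a = read_cpu_line()
    time.sleep(5)  # hypothetical sampling interval
    sample_b = read_cpu_line()
    print(f"iowait over 5 s: {iowait_percent(sample_a, sample_b):.1f}%")
```

Running it in a loop and logging the result with a timestamp would show whether the iowait spikes actually line up with the crashes.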
I have serious problems identifying the root cause.
I pulled the SSD from the HA box to check its health. Connected to a PC, S.M.A.R.T. reports the disk as healthy (85%).
I found nothing specific in the logs except the Glances errors mentioned above.
I’ve read about several recorder problems introduced in 2024.7 but found nothing similar to what I’m experiencing.
Recently I installed LTSS and TimescaleDB to work around losing long-term data. What is strange is that when the situation occurs, the LTSS extension stops sending data to Postgres. Postgres itself reports no errors; the data are simply missing from the DB. Since I know pg pretty well, I know the data cannot just get lost there. So it looks like when the situation occurs, it breaks LTSS or its ability to collect data from HA. Only an HA restart helps. It’s even more interesting since LTSS is a custom component, so it integrates directly with the recorder.
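
To pin down exactly when LTSS stopped writing, the row timestamps can be scanned for gaps and compared with the crash times. A minimal sketch follows. It assumes the timestamps were exported beforehand, e.g. with something like `SELECT time FROM ltss ORDER BY time` (the table and column names are my assumption about the LTSS schema), and `find_gaps` and the 10-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(minutes=10)):
    """Return (last_row_before, first_row_after) pairs for every gap longer than max_gap."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > max_gap:
            gaps.append((previous, current))
    return gaps


if __name__ == "__main__":
    # Made-up sample data standing in for timestamps exported from the database.
    sample = [
        datetime(2024, 8, 20, 10, 0),
        datetime(2024, 8, 20, 10, 5),
        datetime(2024, 8, 20, 11, 30),
        datetime(2024, 8, 20, 11, 35),
    ]
    for start, end in find_gaps(sample):
        print(f"no rows between {start} and {end}")
```

If the start of each gap matches the moment the CPU temperature or iowait ramps up, that would point to one common cause rather than a separate LTSS problem.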
Today the issue caused homeassistant and hassio_observer to restart, and it also stopped Z2M.
Is there any way to monitor the performance of Docker containers from within HA?
How to cope with this issue?
BTW, the sudden stability of various metrics after the restart is more than interesting.