HA instability

I’m running a Raspberry Pi 4 8GB with an SSD. Both are powered by separate, dedicated power supplies.
I’m a bit behind on HAOS. It’s on v6.6, which is probably why all components/extensions are unable to show their logs. I’m reluctant to update because the system is configured to boot from the SSD. But maybe today that would no longer be a problem.

I’ve been using this setup for about 4 years without any hiccups.
Since I updated HA to 2024.7, I have been encountering various serious instability issues. Around that time I also started using the Modbus integration.
I then updated to the latest 2024.8.

The issues are as follows:

  • the SQLite database crashing
  • a sudden CPU temperature ramp-up that slows the system down, followed by an HA crash/restart
  • Glances shows critical CPU_IOWAIT as well as 100% memory and swap usage, but not near the time HA crashed

I’m having serious trouble identifying the root cause.

I pulled the SSD from the HA machine to check its health. Connected to a PC, S.M.A.R.T. reports the disk as healthy (85%).

I found nothing specific in the logs except the Glances errors mentioned above.
I’ve read about several recorder problems introduced in 2024.7 but found nothing similar to what I’m experiencing.

Recently I installed LTSS and TimescaleDB to work around losing long-term data. What is strange is that when the situation occurs, the LTSS extension stops sending data to Postgres. The latter reports no errors; the data are simply missing from the DB. Since I know pg pretty well, I know they cannot have been lost there. So it looks like the situation breaks LTSS or its ability to collect data from HA. Only an HA restart helps. It’s even more interesting since LTSS is a custom component, so it integrates directly with the recorder.

Today the issue caused homeassistant and hassio_observer to restart, and Z2M to stop.

Is there any way to monitor Docker container performance from within HA?
How can I cope with this issue?

BTW, the sudden stability of various metrics after the restart is more than interesting.


You most likely have a huge number of breaking changes and database updates to deal with in a 4-year jump.
Today the grace period for core breaking changes is 6 months, but I think it was 3 months back then.
I can’t remember when it was changed, but my suggestion going forward is to update in 3-month jumps, solve the breaking changes that occur, and let the database updates run through.
If you can find out when the grace period was changed to 6 months, you can go with 6-month jumps from there.

Before each update, make a backup and move it to another device, such as a USB drive or network drive, and make sure that you have plenty of free space before updating.
If you are low on space, download the backups and delete them from the HA storage to quickly gain space.
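
If you have shell access with Python available, a quick way to check the free space before updating could look like the sketch below. The function name, the default path "/" and the 5 GiB threshold are only assumptions for illustration; point it at whatever mount actually holds your HA data and backups.

```python
import shutil


def free_space_report(path="/", min_free_gib=5.0):
    """Return the free space on the filesystem holding `path`.

    `path` and `min_free_gib` are illustrative defaults, not HA recommendations.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gib = usage.free / 2**30
    return {
        "path": path,
        "free_gib": round(free_gib, 2),
        "enough": free_gib >= min_free_gib,
    }


if __name__ == "__main__":
    # Prints the free space and whether it meets the chosen threshold.
    print(free_space_report("/"))
```

If "enough" comes back False, move the old backups off the device first and then run the check again.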


Thanks for the prompt response. Please note that it’s the operating system that is old, not HA. I expect the database has nothing to do with the OS, but I admit that such an old OS might potentially be the reason.

If you are running a supervised installation, then Debian 12 is required for the newest version.

I’m running HAOS