So yesterday, I upgraded my Home Assistant (a VM running HASS OS on Proxmox) to 2024.8.1.
Shortly after, I realised my disk had 0 MB free.
Some digging showed my SQLite DB is approximately 50 GB, because I’d increased my recorder’s max days.
And it turns out that as part of this update there is some sort of database maintenance operation going on, with this showing in the log:
[homeassistant.components.recorder.migration] Rebuilding SQLite table states; This will take a while; Please be patient!
I ended up growing my Home Assistant VM's drive from 80 GB to almost 180 GB to let this complete successfully.
When all the dust had settled, I found my Home Assistant SQLite DB was an eye-watering 69.9 GB in size. At this point I decided it was not worth retaining that much historical data (the data I really cared about, like solar power, battery and appliance use, was already being stored as long-term statistics), so I commented out my purge_keep_days: 365 and restarted HASS. I then went into Developer Tools and called recorder: purge data to remove all data older than 30 days, but I ran it with the Repack and Apply Filter options unticked.
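For anyone following along, the purge I triggered through Developer Tools is equivalent to this service-call YAML (these are the standard recorder.purge fields; keep_days matches what I entered):

```yaml
service: recorder.purge
data:
  keep_days: 30
  repack: false
  apply_filter: false
```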
That was approximately 5 hours ago.
I thought that was the end of it. However, disk I/O on my HASS VM has been going crazy these past few hours.
The log isn’t really showing anything useful beyond the usual messages. Is there some way I can see what is going on?
Edit – also, I’m watching my database size. The home-assistant_v2.db hasn’t changed in size at all in the last few hours; however, the home-assistant_v2.db-wal has grown from 73 MB to 1.2 GB and is now sitting there.
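If anyone else wants to watch this from the shell side, this is roughly what I ended up doing (a sketch; the /config path and the python3 process name are assumptions about a typical HAOS install):

```shell
#!/bin/sh
# Watch the main DB and its WAL sidecar, plus the HA process's I/O counters.
watch_recorder() {
    db="$1"
    # Show the DB and WAL file sizes (WAL growth means SQLite is still
    # buffering writes that haven't been checkpointed into the main DB).
    du -h "$db" "$db-wal" 2>/dev/null

    # On Linux, /proc/<pid>/io shows cumulative bytes read/written.
    # "python3" as the process name is an assumption; adjust as needed.
    pid=$(pidof -s python3 2>/dev/null || true)
    if [ -n "$pid" ] && [ -r "/proc/$pid/io" ]; then
        grep -E 'read_bytes|write_bytes' "/proc/$pid/io"
    fi
}

# Assumed HAOS path; adjust for your install:
# watch_recorder /config/home-assistant_v2.db
```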
Sorry, I can’t help, but I’m also struggling with this nightmare.
How did you know it completed? And how long did it take from “[homeassistant.components.recorder.migration] Rebuilding SQLite table states; This will take a while; Please be patient!” to completion?
Ok, it is done now. It ran for nearly 20 hours at an average speed that translates to almost 5.7 TB read (roughly 80 MB/s sustained).
This is self-inflicted, of course, because I decided to use recorder to keep a year (and in some cases two years) of data for some entities. My SQLite DB was north of 50 GB uncompressed.
However, it looks like what the upgrade was actually doing was converting this eye-watering pile of data to long-term statistics?
I just kicked off a backup, and the resulting backup file is now down to 870 MB from 23 GB (these are compressed .tar file sizes, so the actual sizes are probably larger).
I wish there were more indication of what it was actually doing, rather than just having it hammer your disk non-stop for nearly a day. I’m running an older enterprise SSD with an i7 CPU on this box, and it was still this slow. Maybe this is a sign to upgrade to enterprise NVMe.
I will kick off a repack and see what comes of it.
Edit: the repack went VERY fast, about 10 minutes or so.
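For anyone wondering, the repack was just the same purge action again with Repack ticked; the equivalent YAML is below (keep_days is optional here and falls back to your configured purge_keep_days):

```yaml
service: recorder.purge
data:
  repack: true
```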
As an unexpected benefit, none of my historical data has been lost; it appears it has all been converted into long-term statistics.
I did lose a day or so of recorded data while the system was performing maintenance on the recorder DB, but overall a great result!
Thanks, finally tracked this one down. My DB is fairly large because I keep a lot of data, and I’ve never had an issue. I often read release notes but didn’t this time. I noticed free space kept hitting 0 and there had to be some job running; I searched for the log message and really only found this thread.
Any future readers: the issue is simple. The migration needs to clone a copy of your DB, so you need more free space than your DB currently occupies. My DB times two was just over my available space. This results in an endless loop, and HA crashes/hard-locks.
Doing migrations on users’ production DBs should have a lot more care put into it. The most basic of checks would be verifying available disk space, so it doesn’t keep retrying a job that will never complete. For something this sensitive, they could also pop up a notification after the update saying a DB migration will be run.
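A pre-flight check along these lines would catch it. Here’s a rough sketch of the idea (the /config path is an assumption about a typical install; the real recorder would obviously do this internally):

```shell
#!/bin/sh
# Rough pre-migration check: the rebuild clones the DB, so free space
# on the same filesystem should be at least the DB's current size.
check_migration_space() {
    db="$1"
    db_kb=$(du -k "$db" | cut -f1)
    free_kb=$(df -Pk "$(dirname "$db")" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -ge "$db_kb" ]; then
        echo "OK: ${free_kb} kB free, DB is ${db_kb} kB"
    else
        echo "NOT ENOUGH SPACE: ${free_kb} kB free, DB is ${db_kb} kB"
    fi
}

# Typical HAOS path (an assumption; adjust for your install):
# check_migration_space /config/home-assistant_v2.db
```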
They are treading this strange line of wanting to be user-friendly, so I don’t see them alerting the user to a DB migration; but at the same time you have issues like this that are not user-friendly and won’t be figured out by basic users on their own.
There is an alert, but it is very easy to miss: it is only visible within the web UI/app if you happen to log in. I think the vast majority of users just keep recorder at its defaults (purge_keep_days: 10), which keeps the SQLite DB manageable. A DB that small would’ve converted over in a few minutes. You’re not supposed to use SQLite for much more heavy lifting than that.
For you and me, though, obviously we’ve gone overboard and our DBs were massive, hence the problem. Anyway, going forward this should no longer be an issue: as part of this DB migration, they’ve converted that data to long-term statistics, which takes up a fraction of the space.