So I was regularly making full HA backups, only to find out that these backups include corrupted databases which actually can't be restored.
What happened? Well, recently my (SQLite) database was "corrupted" while HA was running. HA created a new (empty) database and marked my old one (over 4 GB of valuable history) as corrupted with a suffix.
I did the obvious thing and "just" restored the last full backup, which looked promising (while HA was starting I could actually browse the energy history, for example), but every time the restore finishes, the database is again marked as corrupt and HA starts with a new, empty one.
So here I am, wondering if it is still worth doing full HA backups, or if there is a better/working way to back up the database so it doesn't get corrupted.
Also, any hope for my corrupted DB to get working again? Every day counts now, because I will lose the last week or two (the new DB), which would be okay as long as I get my history back.
I already ran sqlite3 ./home-assistant_v2.db ".recover" | sqlite3 ./home-assistant_v2.db.fix a few times, which is somewhat successful (no errors in the console), but the result is still discarded as corrupt when I feed it to HA.
So after restoring all my backups one by one on a fresh system, going back to December, I can tell that the HA backups are useless.
Every full backup contains a SQLite database which seems to work while the restore is taking place (I'm able to browse all the 2021 energy history, for example), but as soon as the restore is complete, HA simply marks the restored SQLite database as corrupt and creates a new one.
I could really vomit right now, because I was planning to deploy a solar system based on the data which is now completely lost.
I use the MariaDB add-on and the Google Drive Backup add-on. I stop the DB prior to backup, and this lets you get a backup that is not corrupt.
This is most likely due to an open transaction while the backup was taken.
In most cases this can be repaired by dumping and reimporting the database; there are plenty of posts here on how to do this. I never experienced any problems with it, but there might be some inconsistencies.
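For reference, the dump-and-reimport repair can be scripted. The usual CLI form is sqlite3 old.db .dump | sqlite3 new.db; the sketch below does the same thing with Python's built-in sqlite3 module via iterdump(). File names and the helper name are placeholders, not anything HA-specific:

```python
import sqlite3

def rebuild(src_path: str, dst_path: str) -> None:
    """Dump every table from src_path as SQL and replay it into a fresh file.

    This is the "dump and reimport" repair: the dump reader walks the
    schema and rows rather than the raw pages, so the rebuilt file is
    often readable again even when the original reports corruption.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # iterdump() yields the full SQL dump (BEGIN ... COMMIT);
        # executescript() replays it into the new database.
        dst.executescript("\n".join(src.iterdump()))
    finally:
        src.close()
        dst.close()

# Usage (placeholder paths):
# rebuild("home-assistant_v2.db", "home-assistant_v2.rebuilt.db")
```

Note this only helps when the damaged parts can still be read as rows; for heavier corruption the .recover command shown later in the thread digs deeper.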
Apart from that I remember reading here that the team is working on making the database backup more reliable.
That's not fair. When used properly, SQLite is perfectly robust and can be used for mission-critical data. Sure, other options can be more powerful, if you want to be a DBA. Otherwise, you're adding complexity and probably not actually getting much real benefit (especially if you're running it on the same system). At best, you're moving the backup problem to a new server.
The best way I'm aware of to safely back up SQLite databases (but let me state right away that I'm definitely not a DBA!) is to run VACUUM INTO before you do backups, and back up that file instead of the main one. (Or, disk space being cheap, in addition to it.)
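For anyone who wants to script that: VACUUM INTO (available since SQLite 3.27) writes a transactionally consistent, defragmented copy to a new file while the original stays live. A minimal sketch with Python's built-in sqlite3 module; the paths and the helper name are placeholders:

```python
import sqlite3

def vacuum_backup(db_path: str, backup_path: str) -> None:
    """Write a consistent copy of db_path to backup_path via VACUUM INTO.

    Requirements: SQLite 3.27+, and backup_path must not already exist
    (VACUUM INTO refuses to overwrite an existing file).
    """
    con = sqlite3.connect(db_path)
    try:
        # The target filename is an SQL expression, so it can be bound
        # as a parameter.
        con.execute("VACUUM INTO ?", (backup_path,))
    finally:
        con.close()

# Usage (placeholder paths):
# vacuum_backup("home-assistant_v2.db", "home-assistant_v2.backup.db")
```

The copy is safe to take while other connections keep writing, which is exactly the property a plain file copy lacks.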
Hmm; in fact, this has me considering moving my home-assistant_v2.db to an in-memory database and using VACUUM INTO to dump it to disk periodically.
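To illustrate the SQLite side of that idea (whether HA's recorder can actually be pointed at :memory: is a separate question), an in-memory database really can be persisted with VACUUM INTO. The table, values, and snapshot filename here are made up for the demo:

```python
import os
import sqlite3

# Clean up any stale snapshot: VACUUM INTO refuses to overwrite a file.
if os.path.exists("snapshot.db"):
    os.remove("snapshot.db")

# Work entirely in RAM...
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE states (entity TEXT, state TEXT)")
mem.execute("INSERT INTO states VALUES ('light.kitchen', 'on')")
mem.commit()

# ...then periodically persist a consistent snapshot to disk.
mem.execute("VACUUM INTO 'snapshot.db'")
mem.close()
```

The obvious trade-off: anything written after the last snapshot is lost on a crash or power cut.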
You mind sharing a link? Like described above, I have already run sqlite3 .recover many times, and at this very moment PRAGMA checks have been running for hours.
I tried to restore all the backups on two different machines and four different disks in total. The result is always the same: a discarded DB once the restore finishes, despite showing proper history data while the restore takes place.
No one uses SQLite for mission critical data. That would be insane.
Either way, I would assume that most people who have dealt with HA backups, or HA in general, have seen DB corruption. While it might be fixable, for most folks the data just doesn't really matter that much. I assume that's why the devs haven't gone to something more robust.
That's quite the confident assertion, but you are wrong, both about whether it happens and whether it is "sane". I'm not going to escalate the name-calling; I think it's inappropriate.
But I will say: for every case where I've seen someone using SQLite where a different database server would be more appropriate, I've seen a hundred cases where someone has installed MySQL, MariaDB, or PostgreSQL with minimal configuration and zero administrative attention or care, and those are at least as troubling (especially, but not only, on the topic at hand: reliable backups).
Apologies if you took that as a personal attack. It wasn’t.
No normal IT person is going to run mission critical data on SQLite. If they do, they won’t be in IT for long. If you disagree, then we will just have to leave it there. I’ve no need to convince you and it’s probably not that constructive for this thread anyway.
I didn’t take it as a personal attack, but I don’t think it’s constructive to call people “insane”. And your other assertions continue to just be factually wrong.
To stay on topic: if you install MariaDB, you will still need to do something about backups other than just backing up or snapshotting the filesystem where the on-disk database files live, or you will have similar problems.
SQLite is the most widely used database engine around, and it is often and correctly used for mission critical data. I think you’d be amazed how many products and devices are running a SQLite backend. For example, SQLite is built into the Android SDK so almost all data storage in Android is in SQLite DBs.
The real problem is that HA absolutely abuses SQLite and every other database engine by storing every state change as VARCHAR(255) with an attributes field as TEXT (seriously?!) containing a JSON string of all state attributes. This is why HA databases bloat to multiple gigabytes when storing 25MB worth of data. Until this is fixed, the database will always be a ticking time bomb that the user must actively manage. Combine this with SD cards and the problem is amplified 100x.
The HA backup (I am assuming here, as I don't use Supervised/Hass.io/HA OS, etc.) makes useless backups of the SQLite DB because it doesn't take a snapshot; it just copies the file, leaving transactions open (does HA even use transactions? I don't think so), commits open, and half-written data.
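For what it's worth, SQLite itself ships an online-backup API for exactly this situation: it copies the database page by page inside a read transaction, so the result is always consistent even while writers continue. Python exposes it as Connection.backup(); a sketch with placeholder paths and a made-up helper name:

```python
import sqlite3

def safe_copy(db_path: str, backup_path: str) -> None:
    """Copy a live SQLite database using the online backup API.

    Unlike a plain file copy, this can never capture a half-written
    page or an in-flight transaction, so the copy always opens cleanly.
    """
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # page-by-page copy under a read transaction
    finally:
        src.close()
        dst.close()
```

A backup tool that used this (or VACUUM INTO) instead of copying home-assistant_v2.db directly would not produce the corrupt restores described above.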
I fully agree with your statement though with a minor edit: No one should use the Home Assistant database to store data you actually want. That would be insane.
CREATE INDEX ix_statistics_short_term_metadata_id ON statistics_short_term (metadata_id);
CREATE INDEX ix_statistics_short_term_start ON statistics_short_term (start);
CREATE UNIQUE INDEX ix_statistics_statistic_id_start ON statistics (metadata_id, start);
CREATE UNIQUE INDEX ix_statistics_short_term_statistic_id_start ON statistics_short_term (metadata_id, start);
ROLLBACK; -- due to errors
I had already run the .recover (non-interactive/verbose) a couple of times, and it actually spits out a new DB file which is smaller than the original but is still discarded by HA once the restore is complete.
I even tried to recover a recovered DB (and did one more recover on that); the DB file gets smaller and smaller, but HA still doesn't want it…
When running .recover in the interactive prompt the last 5 lines are:
CREATE INDEX ix_statistics_short_term_start ON statistics_short_term (start);
CREATE UNIQUE INDEX ix_statistics_statistic_id_start ON statistics (metadata_id, start);
CREATE UNIQUE INDEX ix_statistics_short_term_statistic_id_start ON statistics_short_term (metadata_id, start);
PRAGMA writable_schema = off;
COMMIT;
I also did some PRAGMA checks, but I have no idea what the output means:
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
sqlite> PRAGMA quick_check;
*** in database main ***
Page 729453: free space corruption
sqlite> PRAGMA integrity_check;
*** in database main ***
Page 729453: free space corruption
row 733 missing from index ix_events_event_type_time_fired
row 733 missing from index ix_events_context_id
row 733 missing from index ix_events_time_fired
database disk image is malformed
sqlite> PRAGMA foreign_key_check;
states|152065729|events|1
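For context on that last line: PRAGMA foreign_key_check reports one row per violation in the form table|rowid|parent|fkid, so states|152065729|events|1 means the row with rowid 152065729 in states references a row in events that no longer exists. A tiny reproduction of that output format, using an invented two-table schema (not HA's real one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE events (event_id INTEGER PRIMARY KEY);
    CREATE TABLE states (
        state_id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES events (event_id)
    );
    -- An orphan: references event 99, which does not exist.
    INSERT INTO states (state_id, event_id) VALUES (1, 99);
""")

# Each violation row names: the table containing the bad row, that
# row's rowid, the parent table, and the id of the violated foreign
# key within the child table.
rows = con.execute("PRAGMA foreign_key_check").fetchall()
for row in rows:
    print(row)
con.close()
```

So the check above is telling you which specific states row became an orphan; after a .recover pass, such rows can simply be deleted to make the remaining data consistent.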
It’s also the reason why history accesses are so unbelievably slow. The entire recorder component will have to be rewritten from scratch at some point.
+1 for swapping to MariaDB. I also don't include the DB in my backups, since it's around 5.1 GB. Since swapping to MariaDB I haven't had any issues at all.
Except development is too busy creating a media player for some reason.
(I hope no one takes that as more than it was meant to be… slight ribbing. I appreciate everything the HA developers do, with the exception of the database/recorder component.)