Fix corrupted / malformed SQLite database (home-assistant_v2.db)

e-raser · December 19, 2020, 6:27pm

I restored a snapshot when switching hardware / sd card (HASS OS on Pi 2 B 32 bit to HASS OS on Pi 3 B+ 64 bit).

Result:

The home-assistant_v2.db from a restored snapshot is malformed or corrupted. This leads so error messages and not functional Home Assistant (history graphs not generated/shown, sensor state changes aren´t stored etc.).

I tested it several times, in every test case the SQLite database got corrupted. Either during backup (creating the snapshot) or restore process.

sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) database disk image is malformed
2020-12-19 16:08:43 ERROR (SyncWorker_6) [homeassistant.components.recorder.util] Error executing query: (sqlite3.DatabaseError) database disk image is malformed
(Background on this error at: http://sqlalche.me/e/13/4xp6)

As I was searching the HA community I quickly discovered this is a common issue, there are lots of topics related to this problem, seems to be a MASSIVE one.

The “solution” (in my opinion a workaround, at best) proposed in almost all those topics: delete the database, start from scratch. I was really wondering how everyone can accept the “just delete that damn sqlite db and your good to go!” workaround so all history data is gone.

For me personal it´s just not acceptable to loose all my history data every now and then. Especially when you´re not yet into putting medium and long-term data in e. g. InfluxDB. You never know when you´ll need to restore a snapshot, that´s what backups are for: being prepared for the worst case. So currently that´s not a reliable backup and restore progress at all.

So I decided to go the hard, not that quick way to keep my database (took me 10 to 15 minutes, saving many days of data):

I ran a “pragma integrity_check;” on it and several findings were returned:
grafik

When I take a look at the home-assistant_v2.db from the running system before the snapshot, the integrity check passes (I had access to the original SD card):
grafik

Assumption: either the snapshot creation process is faulty or the restore snapshot process harms the database. It´s very unlikely it´s a storage issue (“corrupt SD cards”), especially cause in my case they´re brand new.

In theory there should basically be three ways to fix this issue WITHOUT loosing all data (no need to create a new database):

Manually backup the home-assistant-v2.db next to the snapshot. Maybe automate it. Use that one for restore purposes.
Perform a “pragma integrity_check;” on the corrupted database and fix those issues.
(see https://serverfault.com/questions/8048/how-can-i-verify-that-a-sqlite-db3-file-is-valid-consistent)
Re-create the database while keeping its data:
(source: https://zngguvnf.org/2019-12-17--homeassistant-backup-and-restore.html - see also direct post links DB image malformed. How to fix it? and DB image malformed. How to fix it? for commands on a running system)
Copy your (corrupt) database (home-assistant_v2.db) to your local linux system and do:
“sqlite3 home-assistant_v2.db “.dump” | sqlite home-assistant_v2.db_fixed”
Now delete your old (corrupt database) and replace it with home-assistant_v2.db_fixed. To do that delete the corrupt one and rename home-assistant_v2.db_fixed to home-assistant_v2.db.

I personally succeeded in using option #1 in an advanced version: luckily I had access to the original disk (SD card) so I just grabbed the database, performed an integrity check (passed) and put that database back in place, restarted HA and everything´s fine again, meaning: all recorder, history and logbook data is there.

— Some developers could take a look at it (why is the database corrupted that often?) and especially check the snapshot creation and restore progress. —

Update 2021-01-02:
Added two direct post links (DB image malformed. How to fix it? and DB image malformed. How to fix it?) in option #3 to give a practical implementation guideline of doing this on the/a live system.

eddriesen · January 2, 2021, 10:41pm

@Eraser
Wow! I respect and salute your dauntless deep dive into the matter.

Short fix, not to go trough this hassle is to install mariadb or mysql and be done with it (the hassio supervisor has it as addon)

mariadb addon config

databases:
  - homeassistant
logins:
  - username: homeassistant
    password: StOeMeLlEnDgHaFrEtMiJnDbOp!
rights:
  - username: homeassistant
    database: homeassistant

config yaml

recorder:
  db_url: mysql://homeassistant:StOeMeLlEnDgHaFrEtMiJnDbOp!@core-mariadb/homeassistant?charset=utf8

et voila!
Bonus; much faster load time for history

maxym · January 2, 2021, 11:11pm

The root-cause behind this issue is likely a way how HA supervisor performs the backup. It just copy files incl. db file while HA is running storing data into that. It ends with data inconsistency aka malformed database file.

You succeeded to use db from your old card because this instance of HA had been properly stopped, and no process were modifying data in this database at time you were copying this file out.
I suppose you can succeed with creating 100% working backup by stopping HA first, then creating backup.

Just reminder: no database should be backed up this way (copying of its files) during database operations. Even transactional databases may be damaged this way (the probability is lower but its still possible). It’s valid for MiariaDB, Postgres, MS SQL, Oracle, Sybase…
Such a binary copy can be performed only if database is stopped (or at least with no operations performed during copy process - still with some risk). Such databases offer option to create consistent backup during runtime, usually by calling special command.

e-raser · January 2, 2021, 11:26pm

Fully agree. There is for sure a better way to snapshot the SQLite database. It’s just that no one cares cause „it’s just the corset database, throw it away, go loose some days of data“. I‘m just a bit tired otherwise I‘d create a feature request for this on GitHub. Meanwhile I dump my important data to InfluxDB in parallel so that one has become more important.

For the moment I‘m done with it and as soon as better hardware with more RAM is finally shipped I might go for the MySQL/MariaDB addon, but would first need to find out how snapshots are created for those addons/containers - don’t want to end up the same way like with SQLite.

delessebe · January 5, 2021, 7:41am

Good work!
For me it didn’t work with the “sqlite3 home-assistant_v2.db “.dump” | sqlite home-assistant_v2.db_fixed” command. Ended with an empty file.
What worked for me was “sqlite3 home-assistant_v2.db “.recover” | sqlite home-assistant_v2.db_fixed”
(result: almost same size as original and no errors when copied back to HA)

ebendl · January 18, 2021, 7:23am

Thanks for this, will give it a go. Part of my problem is that I do have long-term values, especially from utility_meter values (and the underlying sensors) that I do not want to lose.

I see in my case the problem is in the events table – is there any way to only recreate the events table and keep the rest?

delessebe · January 18, 2021, 8:16am

Delete all from the events table? or drop table and recreate? https://www.home-assistant.io/docs/backend/database/

ebendl · January 18, 2021, 8:51pm

This worked perfectly – thanks! My recovered DB ended up about a third of the original, but
a) it works, and,
b) the state history is there, and so are my utility_meter values.

Thanks!!

e-raser · January 29, 2021, 6:44pm

That´s not necessarily because of loosing data, but possibly due to the former version not been repacked for some time. Purging only limits growth, repacking throws away old data according to the purge_keep_days setting (Recorder - Home Assistant).

e-raser · February 4, 2022, 12:27am

Always great to come back to a topic started by oneself.

Had a foreign key constraint error resulting in some kind of “self protection” where the database copied itself to a home-assistant_v2.db.corrupt.2022-02-03T20:59:16.490886+00:00 file and recreated the home-assistant_v2.db from scratch - almost empty. EVERYTHING was gone, entities history of course and the Energy Dashboard was completely empty too - of course. What a shocking moment.

OK, I had two options:

Short way: try to fix the foreign key constraint errors end restore consistency ==> potentially only loose some hours of data
Long way: restore database from latest backup (nightly) ==> loose about 19 hours of data.

Long story short:

Option 2 is still the hard way. Step 3 of Fix corrupted / malformed SQLite database (home-assistant_v2.db) would probably work as it did for many many others already, but (fortunately) I didn’t need to go that path up to now as there’s always been another “corrupted/inconsistent, but not malformed” database file.
Finally HA devs are actively working on the core issue the SQLite database gets malformed during backup (because it’s a physical/file based backup during operation (HA still running and writing data to the db) - see 1.2 on How To Corrupt An SQLite Database File) which - still, in February 2022 - renders that file pretty useless in the first place (or for every normal user without fix knowledge).
→ Here we go, that’s where I put my bets for a few weeks meanwhile, closely watching every activity: https://github.com/home-assistant/core/pull/60874
Needless to say a trustworthy backup mechanism which conserves a consistent database IS A MUST meanwhile (always was for history data addicted like me) because we now all have the energy dashboard and long-term statistics. All those great 2021 improvements/features rely on the database, so it’s not too early to finally make that groundwork imho. I think that should fit in 2022s year of “streamlining experiences”:
- Streamlining Experiences - Home Assistant
- 2022.2: Let's start streamlining! - Home Assistant
Ah yes, by the way: Option 1 was the path I went. Fast lane. Fixed the foreign key constraint issues (only 3 rows and I exactly knew why/what happened) in the corrupted database, shutdown HA, replaced the new/empty one by the fixed one, started up HA, and hello history, I missed you so much. Energy dashboard only calculates once a hour so it took until the next full hour to create the next hour graph.

To sum up:

3 and a 1/2 hours of unplanned and unnecessary restoration tasks and a few lessons learned later… I really hope not too many others will experience that “database malformed/corrupted, hell no what to do!” situation, won’t land here and won’t read this.

By the way, for me there are only two possible ways to go in terms of database (because as my HA grew I really really REALLY need to make sure my database is safe and backed up reliably - I did mention it, did I?):

Either https://github.com/home-assistant/core/pull/60874 will finally make its way into a productive release so I can reestablish trust in homeasisstant container backups including the SQLite database.
Or some experts at Migrating home assistant database from sqlite to mariadb - which is not yet a perfect but still the best migration guide out there! - will finally fix the last little bugs (energy dashboard related) for migrating from SQLite to MySQL so I can finally switch to another level of database management. Including a professional online editing (phpMyAdmin addon with logical backups/DB dumps instead of very simple SQLite web addon which doesn’t even provide a query history etc.), more control, more options, improved backup strategys etc…

Until that

I will do a ha core stop; cp /config/home-assistant_v2.db /backup/home-assistant_v2_BU_$(date +%F\_%H%M%S).db; ha core start and copy that “working cause created offline” backup away from the HA server, just in case. I plan to do that at least once a month, probably before every major (monthly) HA update
I will also have a look into these logical backups
- classical dump using sqlite3 /config/home-assistant_v2.db .dump > /backup/home-assistant_v2_BU_$(date +%F\_%H%M%S).sql)
- backup function sqlite3 /config/home-assistant_v2.db ".backup '/backup/home-assistant_v2_BU_$(date +%F\_%H%M%S).db'"

Maybe I can automate one of these backups to run it regularly to reduce HA downtime by running a physical backup. Will have a look at what I will get and if consistency is fine. But if it would be that easy I’m sure HA dev team would have implemented it as backup subtask for homeassistant container years ago. Sounds promising anyway:

It's different from regularly file copying, because it takes care of any users currently working on the database. There are proper locks set on the database, so the backup is done exclusively.

orange-assistant · February 11, 2022, 11:24pm

Now it got me.

Was away for 3 days an when I came back I releazied the history just ends yesterday (2022-02-10).

Checking the config folder I found the home-assistant_v2.db.corrupt.2022-02-10..... and did run a

sqlite3 ./home-assistant_v2.db ".recover" | sqlite3 ./home-assistant_v2.db.fix

on the corrupted db. The process it self looked successful and the db was shrinking from 4.4GB to 3.8GB.

When starting ha core with the new (fixed) db file it just got detected as corrupted again

So I bite the bitter apple and restored a db file from my last backup a week ago. First I was pleased as I could browse the energy statistics again while ha core was starting up (with the db from the backup) but as time ha was completely started (indicated on the web front down left with a message) the database from the backup was again detected as corrupted and the energy statistics I just browsed seconds ago are gone


2022-02-11 23:14:45 WARNING (Recorder) [homeassistant.components.recorder.migration] Database is about to upgrade. Schema version: 23
2022-02-11 23:14:51 ERROR (Recorder) [homeassistant.components.recorder.util] Error executing query: (sqlite3.DatabaseError) database disk image is malformed
[SQL: SELECT statistics_short_term.id AS statistics_short_term_id, statistics_short_term.created AS statistics_short_term_created, statistics_short_term.start AS statistics_short_term_start, statistics_short_term.mean AS statistics_short_term_mean, statistics_short_term.min AS statistics_short_term_min, statistics_short_term.max AS statistics_short_term_max, statistics_short_term.last_reset AS statistics_short_term_last_reset, statistics_short_term.state AS statistics_short_term_state, statistics_short_term.sum AS statistics_short_term_sum, statistics_short_term.metadata_id AS statistics_short_term_metadata_id 
FROM statistics_short_term LEFT OUTER JOIN (SELECT statistics_short_term.start AS start, statistics_short_term.metadata_id AS metadata_id, 1 AS is_duplicate 
FROM statistics_short_term GROUP BY statistics_short_term.metadata_id, statistics_short_term.start 
HAVING count(*) > ?) AS anon_1 ON anon_1.metadata_id = statistics_short_term.metadata_id AND anon_1.start = statistics_short_term.start 
WHERE anon_1.is_duplicate = ? ORDER BY statistics_short_term.metadata_id, statistics_short_term.start, statistics_short_term.id DESC
 LIMIT ? OFFSET ?]
[parameters: (1, 1, 998, 0)]
(Background on this error at: https://sqlalche.me/e/14/4xp6)

Any idea what’s happening here?

e-raser · February 11, 2022, 11:51pm

I feel you. We know that DB files from the „backup“ are useless. They are working on it, but it might take months they finally fix this - after years. Short: forget the database backups.

I don’t know what else you can try. If the recover process did not work for you there might be no other option.

Maybe… run some offline consistency checks after the .recover, (quick check, full check and foreign key check). Perhaps you’re lucky and output tells you what exactly is wrong so you can dig into it trying to fix it.

full integrity check (might take a while): Pragma statements supported by SQLite
see foreign key check on the same page

By the way: looks like your database is trying to upgrade. Did you update HA meanwhile? So are you trying to use a database from a former HA release with a newer HA? I would try to have HA Core and DB at the same date.

orange-assistant · February 20, 2022, 1:33am

I did this but I now also tried to restore backups (including full backups) with same ha & db versions. Unlucky me all database get recognized as corrupt. Latest log spitted out this:

2022-02-20 01:45:29 ERROR (Recorder) [homeassistant.components.recorder.util] The database at //config/home-assistant_v2.db is corrupt or malformed
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/components/recorder/util.py", line 223, in validate_sqlite_database
run_checks_on_open_db(dbpath, conn.cursor())
File "/usr/src/homeassistant/homeassistant/components/recorder/util.py", line 234, in run_checks_on_open_db
sanity_check_passed = basic_sanity_check(cursor)
File "/usr/src/homeassistant/homeassistant/components/recorder/util.py", line 212, in basic_sanity_check
cursor.execute(f"SELECT * FROM {table} LIMIT 1;") # nosec # not injection
sqlite3.OperationalError: Could not decode to UTF-8 column 'created' with text '2022-01-30!04:2:19.�0553'
2022-02-20 01:45:29 ERROR (Recorder) [homeassistant.components.recorder.util] The system will rename the corrupt database file //config/home-assistant_v2.db to //config/home-assistant_v2.db.corrupt.2022-02-20T01:45:29.591454+00:00 in order to allow startup to proceed

It’s a shame that the backups I took aren’t of any value because they are all with corrupt databases

orange-assistant · March 3, 2022, 1:45pm

Could you drop me 3 lines how exactly to do this? I’m on a linux host but not familiar with any database stuff. Only thing I did till no is to have the sqlite3 recover thing running for a couple of times.

I’m super disappointing that the ha database got silently corrupted and even the backups not working. I hope I have a chance to restore my data (even if that means I will have a 3-4 weeks loss now).

Actually I was just planing to get some offers about solar power based on the ha statistics (whole house + dedicated appliances) … and now they are just gone

e-raser · March 3, 2022, 8:05pm

Open with sqlite3 /config/home-assistant_v2.db, then:

PRAGMA quick_check;
PRAGMA integrity_check;
PRAGMA foreign_key_check;

Depending on if and what you get, the real work just begins. Note that the integrity_check can take a long time (as it does a full consistency check) depending on the size of your database.

orange-assistant · March 5, 2022, 10:07am

I’m executed this yesterday and it’s still running…

$ sqlite3 home-assistant_v2.db 
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
sqlite> PRAGMA quick_check
   ...>

The db is 4.4GB and the system isn’t to stressed at all:

Is this the normal behavior? Is it possible to get some verbose output by any chance?

e-raser · March 5, 2022, 11:28am

Without the ; at the end of the line it waits forever.

Istria1704 · March 5, 2022, 12:02pm

I had a similar experience. But without doing any db restoration.
Basically, I tried the crude way of simply making multiple backups and importing them into a new HAOS VM until one was not corrupted. For me, iteration #3 seemed to work. The history was there and went back months. All was working.

Yesterday (days later), I noticed the history only goes back a couple of days (the day of the migration) and there is a corrupted database file.

orange-assistant · March 5, 2022, 12:10pm

What difference a semicolon can make

Turns out the quick check is really quick actually:

sqlite> PRAGMA quick_check;
*** in database main ***
Page 729453: free space corruption

next: integrity_check;

actually very quick too:

sqlite> PRAGMA integrity_check;
*** in database main ***
Page 729453: free space corruption
row 733 missing from index ix_events_event_type_time_fired
row 733 missing from index ix_events_context_id
row 733 missing from index ix_events_time_fired
database disk image is malformed

and the last check:

PRAGMA foreign_key_check;
states|152065729|events|1

Any ideas what this all means? Is it totally broken or is their a chance to get this db back to a working state?

orange-assistant · March 5, 2022, 12:14pm

Are you sure the restore was complete while you were browsing the history? My history is always available while the restore takes place but once it’s finished it will mark the (just imported) db as corrupt and start with a fresh one.
You can look ad the suffix of the corrupt db to see the exact timestamp.