Home-assistant_v2.db corrupt - How to recover long Term Statistics

Dear Community,

I have had to restore my home-assistant from VM based to now a HA Supervised. This all worked really easy and well. I was impressed.

BUT after first start it showed the long term statistics and after a few minutes, I got an error and a new DB was created and the old one was called corrupted in file name.

I tried to do all recovery methods on this forum: DB image malformed. How to fix it? - #5 by eddriesen or Fix corrupted / malformed SQLite database (home-assistant_v2.db). But without any luck.

I then moved to mariadb to avoid this, the next time (hopefully) as Iā€™m just running the ha addon., maybe need to consider using my NAS.

Anyway now my ask, how can I recover only my longterm statistics from db file in sqlite3 format and import into mariadb? This is what I tried and could even import some data, but it is not shown in HA:

sqlite3 home-assistant_v2.db .dump \
| sed -re 's/^PRAGMA .+OFF/SET FOREIGN_KEY_CHECKS=0;SET UNIQUE_CHECKS=0/' \
        -e 's/^CREATE INDEX .+//' \
        -e 's/^BEGIN TRANSACTION;$/SET autocommit=0;BEGIN;/' \
        -e '/^CREATE TABLE .+ \($/,/^\);/ d' \
        -e 's/^INSERT INTO "([^"]+)"/INSERT INTO \1/' \
        -e 's/\\n/\n/g' \
| perl -pe 'use utf8; use open qw(:std :utf8); binmode STDOUT, ":utf8mb4";s/\\u([0-9A-Fa-f]{4})/pack"U*",hex($1)/ge' > output.sql

grep -v "INSERT INTO events" output.sql > import.sql

cat import.sql | mysql -h 127.0.0.1 homeassistant --default-character-set=utf8mb4 -u homeassistant -p 

I also had problems importing it with datetime and references, so used this:

set sql_mode='';
set FOREIGN_KEY_CHECKS=0;

I have Influxdb, but was really enjoying the new energy dashboard and now all that data for 2 month is gone.

Any hint, idea is appreciated.

Thank you very much

@ ha_frw
Same with me; I did not find any solution on this. For me, it it unclear, if the long-term statistics (for the energy dashboard) are stored in the database, that is configured with the recorder. The docs just say something about the history:

The recorder integration is responsible for storing details in a database, which then are handled by the history integration.

Are the statistics of your energy dashboard are now stored in your new maria db?

not good to hear, that Iā€™m not the only one with the problem. Yes the long term are in the DB, they should be in statistics, whereas short term are somewhere else.


this is my maridb

But for the history graph of an entity, you need the events data in the db, because when I removed all events from being logged to the recorder also my history was empty but not the the statistics. I had to limit the size of the DB, so I removed most of the events. just do not remove state_changed, which is uncommented in the config below.

recorder:
  purge_keep_days: 1
  commit_interval: 10
  db_url: !secret mysql
  exclude:
    entities:
      - sensor.time
    event_types:
      - component_loaded
      - core_config_updated
      - device_registry_updated
      - entity_registry_updated
      - hacs/config
      - hacs/repository
      - hacs/stage
      - hacs/status
      - homeassistant_close
      - homeassistant_final_write
      - homeassistant_start
      - homeassistant_started
      - homeassistant_stop
      - ios.action_fired
      - knx_event
      - lovelace_updated
      - nodered
      - panels_updated
      - persistent_notifications_updated
      - service_registered
      - service_removed
      - tag_scanned
      - themes_updated
      - user_removed
      - timer_out_of_sync
      - ios.became_active
      - hacs/reload
      - call_service
      - zha_event
      #- state_changed

I have also an Influx DB now, which is consolidating all data to 5min average and then after 2 years to 15min to be safe. Influxdb (1.8) setup - ONE continuous query for WHOLE database - #3 by ha_frw

The Maria DB looks much better than the file based. I can only recommend to switch to mariadb and use influx for longterm in addition. Iā€™m running a yearly overall stats graph of temperature to engergy consumed and heating power required, which Iā€™m also unable to build with long term stats.

1 Like

I have just had this issue - I upgraded my Home Assistant (docker version), and, after a while (20 mins or so?) it must have given up on the database upgrade and created a new database. I still have the old one as file home-assistant_v2.db.corrupt.2023-05-08T12:45:29.086483+00:00. I would like to import the old data into the new database. I have good SQL skills, but I am not entirely sure which data I need to enable my graphs to go back before the moment I upgraded.

Can anyone advise on which table, and which records in those tables, need to be copied?

In particular, I notice the old database has nothing in states_meta, so I worry that just copying records across wouldnā€™t work.

I was unable to recover the data and have now also switched to maria db, to avoid this in the future.

Hi everyone,

Iā€™ve the same issue and would realy like to recover the energy data from the db.corrupt fileā€¦

BR,

Hi,
I have the same issue. On Jun 18th I lost all my energy data, but now I know how to recover.
1. Important (I missed that and corrupted my db a couple times): STOP HA Core in ssh with ā€œha core stopā€ or via service call!
2. download the current home-assistant_v2.db and home-assistant_v2.db.corrupt.* databases and open the current db in SQLite Browser and attach also the corrupt database as corrupt
3. As some statistics_meta record could be missing, get those from corrupt db:

-- insert possibly missing statistics_meta records
insert into statistics_meta
select row_number() over () + (select max(id) from statistics_meta) as id, c.statistic_id, c.source, c.unit_of_measurement, c.has_mean, c.has_sum, c.name
from corrupt.statistics_meta c
left outer join statistics_meta m on m.statistic_id = c.statistic_id
where m.id is null;

4. Before inserting the records, update the id column of the existing data:

-- move existing ids, as all corrupt records are older
update statistics
set id = id + (select max(id) from corrupt.statistics);

5. Now, the statistics table is missing all those records from the corrupt database, so we import those with this sql:

-- import statistics from corrupt db
insert into statistics
select c.id, c.created, c.created_ts, m.new_id as metadata_id, c.start, c.start_ts, c.mean, c.min, c.max, c.last_reset, c.last_reset_ts, c.state, c.sum
from corrupt.statistics c
inner join (
	select o.id as old_id
	, n.id as new_id
	, case when row_number() over (partition by o.id) > 1 or row_number() over (partition by n.id) > 1 then 0 else 1 end as test
	from corrupt.statistics_meta o
	inner join statistics_meta n on n.statistic_id = o.statistic_id
) m on m.old_id = c.metadata_id and m.test = 1

6. As the sum is incorrect for new values in current db, we update those as well:

-- update sum values causing mega negative values for day of corruption in Energy Dashboard
update statistics
set sum = sum + a.add_sum
from (
	select s.id, a.statistic_id, sum(a.lag_sum) as add_sum
	from statistics s
	inner join (
		select *
		from (
			select m.statistic_id, s.*, lag(s.sum) over (partition by s.metadata_id order by s.start_ts) as lag_sum
			from statistics s
			inner join statistics_meta m on m.id = s.metadata_id
			where s.sum is not null
		) t
		where t.sum < t.lag_sum
	) a on a.metadata_id = s.metadata_id and s.start_ts >= a.start_ts
	group by s.id, a.statistic_id
) a 
where a.id = statistics.id

7. Rebuild statistics_short_term from todayā€™s statistics - it is needed so new statistic records have correct sum value

-- rebuild statistics_short_term
delete from statistics_short_term;
insert into statistics_short_term
select row_number() over () as id, c.created, c.created_ts, c.metadata_id, c.start, c.start_ts, c.mean, c.min, c.max, c.last_reset, c.last_reset_ts, c.state, c.sum
from statistics c
where date(c.start_ts, 'unixepoch') = DATE('now')
order by c.start_ts, c.id;

8. Start ha core again :+1:

All my history in Energy Dashboard is now back again - running smooth for many hours nowā€¦ :slight_smile:

5 Likes

Worked for me as well!

1 Like

Hmmm, I get a error during step 6 where you ā€˜select *ā€™ from the first inner join, it then offers all of the available commands and repeats that a couple of times:

sqlite> update statistics
from (
        select s.id, a.statistic_id, sum(a.lag_sum) as add_sum
        from statistics s
        inner join (
                select *
                from (
                        select m.statistic_id, s.*, lag(s.sum) over (partition by s.metadata_id order by s.start_ts) as lag_sum
   ...> set sum = sum + a.add_sum
   ...> from (
   ...> select s.id, a.statistic_id, sum(a.lag_sum) as add_sum
   ...> from statistics s
   ...> inner join (
   ...>
Display all 235 possibilities? (y or n)
ABORT                                           ROW
ACTION                                          ROWS
ADD                                             SAVEPOINT
AFTER                                           SELECT
ALL                                             SET

Any ideas please, I just lost over two years of statistics after the home assistant database got corrupted itself this morning!

Hi here,
sorry - couldnā€™t reply earlier.
Iā€™ve used SQLite Browser (Windows App) to attach both databases and the sql pane to run the commands.
It looks to me the statement got cut off probably?
Iā€™d try testing a select on the query to check whatā€™s going on:

select *
from (
	select s.id, a.statistic_id, sum(a.lag_sum) as add_sum
	from statistics s
	inner join (
		select *
		from (
			select m.statistic_id, s.*, lag(s.sum) over (partition by s.metadata_id order by s.start_ts) as lag_sum
			from statistics s
			inner join statistics_meta m on m.id = s.metadata_id
			where s.sum is not null
		) t
		where t.sum < t.lag_sum
	) a on a.metadata_id = s.metadata_id and s.start_ts >= a.start_ts
	group by s.id, a.statistic_id
) a 
where a.id = statistics.id

Cheers - cosote.

1 Like

I had the same problem. Every night at 04.12 the data disappeared. It took me a while (10 days) to find this solution. I hope that tonight it does not disappear.

I have, in several backups, most of the data of the past 10 days. I assume that copying that (energy) data back is possible, but I not a coder and not into database. Cosote, Thanks for posting the procedure and can you help me with a script?

PS: I did not do step 2, but opened the databases on my RPi. Before I could do that I had to make a copy of the corrupt database, because of the file name (home-assistant_v2.db.corrupt.2023-08-20T02:12:00.520929+00:00).

Great post! Worked like a charm for me!
Let me buy you a coffee :slight_smile:

worked like a charm on postgresql. Just check the log everything is working. I had to update the incremental ID from the table statistic.

Link for a coffee?

Im also a victim to thisā€¦ its crazy that it can happen so easliy and that we store the precoius long term data in it. I have it all in influx also but the energy dashboard is just to neatā€¦ I will move to maria and try to migrate the data from the corrupt dbā€¦

Hi,

this morning my DB got corrupted and I donā€™t know why.

However, when following your guide (thanks for this as I am a SQL rookie).
I get database disk image is malformed from DB Browser for SQLite when I want to attach the corrupt DB

Any suggestion?

EDIT:
I solved my issue using this guide:
Restore corrupt database after nightly purge - Installation / Home Assistant OS - Home Assistant Community (home-assistant.io)

1 Like

Hi Cosote,

Recently I ran into a database corruption as well, your procedure seems to have recoverd all my historical data, but, solar production data seems to have been imported incorrectly.

All recovered solar data has been divided by 1000, changing MWh values into kWh (and of course kWh into Wh). Additionally, the Energy dashboard indicators for ā€˜Self-consumed solar energyā€™ and ā€˜Self-sufficiencyā€™ are greyed out and display 0% for any period touching recovered historical data.

As Iā€™m not an SQL guru, any idea how to correct these?

I assume the statistics copy operation somehow does not check for, or includes the energy data quantity (Wh, kWh, MWh etc.).

For anyone trying to perform your procedure, some additional information about the procedure I performed:

When purely using the home assistant add-in Terminal & SSH, as soon as I stop ha core process, the add-on for ssh access is also no longer working using the webinterface and so is the file editor.

To circumvent this:

  • Update the Terminal & SSH add-in configuration to include a password (Options section) and entered the SSH port (Network section).
  • This allows for direct SSH access (I used Putty as Iā€™m running Windows on my PC)
  • Now Iā€™m able to access Home Assistant (running on a Raspberry Pi 4) using SSH and transfer the running DB file and the corrupt one. Iā€™ve used PSCP (part of Putty installation) to perform the file transfer. syntax:

pscp -P 22 root@:/config/home-assistant_v2.db c:\temp\home-assistant_v2.db`

Filetransfers using WinSCP for instance are way to slow and max out the Piā€™s CPU

important: be aware the corrupt database filename uses characters not supported by Windows, so when copying the corrupt database file, replace the ā€œ:ā€ characters in the target filename

now I was able to use SQLite browser on Windows to perform the procedure Cosote published.

Took me some time to figure the additional steps out, so shared them in case anyone else runs into the same challenges :wink:

It works great! Thanks a lot!!! :slight_smile:

Got a little more details in the solar energy issue,
the new database, created after the corruption, indeed uses Wh instead of kWh, as can be seen in the statistics_meta table of the database.

I solved my issues by first adding the missing data from the corrupt database copy to the active database. Then correction the value of the state column within the statistics table by multiplying the value with 1000 for each entry dating from the source database.

Additionally I corrected the value in the sum column, which seems the be the actually used value for the energy dashboard.

Not completely sure why, but the sum correction sql query did not work correctly for me, so Iā€™ve used a seperate query for the imported content and manually corrected the newer values to have the correct values throughout the total recorded timerange.

@cosote
Thank you for posting this, but I do have troubles.

I think my database version does not match yours? I am running Home Assistant Core 2022.6.6

I get a

Execution finished with errors.
Result: no such column: c.created_ts

error.

So after checking my statistics table:
image

I tried to modify the query (sorry I can only do very simple SQL, so your statements are wizard level to me). So I tried to remove the offending colloums and hoping it works (created_ts last_reset_ts etc)

-- import statistics from corrupt db
insert into statistics
select c.id, c.created, m.new_id as metadata_id, c.start, c.mean, c.min, c.max, c.last_reset, c.state, c.sum
from corrupt.statistics c
inner join (
	select o.id as old_id
	, n.id as new_id
	, case when row_number() over (partition by o.id) > 1 or row_number() over (partition by n.id) > 1 then 0 else 1 end as test
	from corrupt.statistics_meta o
	inner join statistics_meta n on n.statistic_id = o.statistic_id
) m on m.old_id = c.metadata_id and m.test = 1

I get

Execution finished with errors.
Result: UNIQUE constraint failed: statistics.metadata_id, statistics.start

So I am kind of lostā€¦ any advice?
Or better - do you see how to rewrite it to my version of HA? :slight_smile: