home-assistant_v2.db corrupt - How to recover long-term statistics

Dear Community,

I had to migrate my Home Assistant from a VM-based install to HA Supervised. This all worked really easily and well; I was impressed.

BUT after the first start it showed the long-term statistics, and after a few minutes I got an error: a new DB was created and the old one was renamed with "corrupt" in the file name.

I tried all the recovery methods from this forum (DB image malformed. How to fix it? - #5 by eddriesen and Fix corrupted / malformed SQLite database (home-assistant_v2.db)), but without any luck.

I then moved to MariaDB to (hopefully) avoid this next time. I'm just running the HA add-on; maybe I need to consider using my NAS instead.

Anyway, now my question: how can I recover only my long-term statistics from the SQLite db file and import them into MariaDB? This is what I tried; I could even import some data, but it is not shown in HA:

sqlite3 home-assistant_v2.db .dump \
| sed -re 's/^PRAGMA .+OFF/SET FOREIGN_KEY_CHECKS=0;SET UNIQUE_CHECKS=0/' \
        -e 's/^CREATE INDEX .+//' \
        -e 's/^BEGIN TRANSACTION;$/SET autocommit=0;BEGIN;/' \
        -e '/^CREATE TABLE .+ \($/,/^\);/ d' \
        -e 's/^INSERT INTO "([^"]+)"/INSERT INTO \1/' \
        -e 's/\\n/\n/g' \
| perl -pe 'use utf8; use open qw(:std :utf8); binmode STDOUT, ":utf8mb4";s/\\u([0-9A-Fa-f]{4})/pack"U*",hex($1)/ge' > output.sql

grep -v "INSERT INTO events" output.sql > import.sql

cat import.sql | mysql -h 127.0.0.1 homeassistant --default-character-set=utf8mb4 -u homeassistant -p 
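If the full `.dump` pipeline keeps tripping over the rest of the schema, an alternative is to extract only the two long-term statistics tables with a short script. This is a minimal Python sketch (it assumes the standard recorder tables `statistics_meta` and `statistics`; column names and order vary between schema versions, and a badly corrupted file may still fail on read):

```python
import sqlite3

def dump_statistics(src_path, out_path):
    """Write only the long-term statistics tables of a Home Assistant
    SQLite database as plain INSERT statements.

    Assumes the standard recorder tables statistics_meta and statistics
    exist and are readable; adjust the table tuple for your schema.
    """
    con = sqlite3.connect(src_path)
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            for table in ("statistics_meta", "statistics"):
                # PRAGMA table_info rows are (cid, name, type, notnull, dflt, pk)
                cols = [r[1] for r in con.execute(f"PRAGMA table_info({table})")]
                col_list = ", ".join(cols)
                for row in con.execute(f"SELECT {col_list} FROM {table}"):
                    vals = ", ".join(
                        "NULL" if v is None
                        else repr(v) if isinstance(v, (int, float))
                        else "'" + str(v).replace("'", "''") + "'"
                        for v in row
                    )
                    out.write(
                        f"INSERT INTO {table} ({col_list}) VALUES ({vals});\n"
                    )
    finally:
        con.close()
```

The resulting file can then be fed to `mysql` the same way as `import.sql` above, without any of the events/states tables getting in the way.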

I also had problems importing it because of datetime values and foreign-key references, so I used this:

set sql_mode='';
set FOREIGN_KEY_CHECKS=0;

I have InfluxDB, but I was really enjoying the new energy dashboard, and now all that data for two months is gone.

Any hint or idea is appreciated.

Thank you very much

@ha_frw
Same with me; I did not find any solution for this. For me it is unclear whether the long-term statistics (for the energy dashboard) are stored in the database that is configured with the recorder. The docs just say something about the history:

The recorder integration is responsible for storing details in a database, which then are handled by the history integration.

Are the statistics of your energy dashboard now stored in your new MariaDB?

Not good to hear that I'm not the only one with the problem. Yes, the long-term statistics are in the DB: they are in the statistics table, whereas the short-term ones live in statistics_short_term.
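To confirm what is actually stored, you can list the recorded statistic IDs straight from an SQLite file (for example the .corrupt copy). A minimal sketch, assuming the default `statistics_meta` table of the recorder schema:

```python
import sqlite3

def list_statistic_ids(db_path):
    """Return the statistic_ids the recorder keeps long-term statistics for."""
    con = sqlite3.connect(db_path)
    try:
        return [row[0] for row in con.execute(
            "SELECT statistic_id FROM statistics_meta ORDER BY statistic_id")]
    finally:
        con.close()
```

If your energy sensors show up in this list, the energy dashboard data is in that database.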


This is my MariaDB:

But for the history graph of an entity you need the events data in the DB: when I removed all events from being logged by the recorder, my history was empty too, but not the statistics. I had to limit the size of the DB, so I excluded most of the events. Just do not exclude state_changed, which is why it is left commented out in the config below.

recorder:
  purge_keep_days: 1
  commit_interval: 10
  db_url: !secret mysql
  exclude:
    entities:
      - sensor.time
    event_types:
      - component_loaded
      - core_config_updated
      - device_registry_updated
      - entity_registry_updated
      - hacs/config
      - hacs/repository
      - hacs/stage
      - hacs/status
      - homeassistant_close
      - homeassistant_final_write
      - homeassistant_start
      - homeassistant_started
      - homeassistant_stop
      - ios.action_fired
      - knx_event
      - lovelace_updated
      - nodered
      - panels_updated
      - persistent_notifications_updated
      - service_registered
      - service_removed
      - tag_scanned
      - themes_updated
      - user_removed
      - timer_out_of_sync
      - ios.became_active
      - hacs/reload
      - call_service
      - zha_event
      #- state_changed

I also have an InfluxDB now, which consolidates all data to 5-minute averages and then, after 2 years, to 15-minute averages to be safe. Influxdb (1.8) setup - ONE continuous query for WHOLE database - #3 by ha_frw

MariaDB looks much better than the file-based DB to me. I can only recommend switching to MariaDB and using Influx for long-term storage in addition. I'm running a yearly overall stats graph of temperature vs. energy consumed and heating power required, which I'm unable to build with long-term stats alone.


I have just had this issue - I upgraded my Home Assistant (Docker version), and after a while (20 mins or so?) it must have given up on the database upgrade and created a new database. I still have the old one as file home-assistant_v2.db.corrupt.2023-05-08T12:45:29.086483+00:00. I would like to import the old data into the new database. I have good SQL skills, but I am not entirely sure which data I need to enable my graphs to go back before the moment I upgraded.

Can anyone advise on which table, and which records in those tables, need to be copied?

In particular, I notice the old database has nothing in states_meta, so I worry that just copying records across wouldn’t work.

I was unable to recover the data and have now also switched to MariaDB to avoid this in the future.

Hi everyone,

I have the same issue and would really like to recover the energy data from the db.corrupt file…

BR,

Hi,
I have the same issue. On Jun 18th I lost all my energy data, but now I know how to recover.
1. Important (I missed that and corrupted my db a couple times): STOP HA Core in ssh with “ha core stop” or via service call!
2. Download the current home-assistant_v2.db and home-assistant_v2.db.corrupt.* databases, open the current db in SQLite Browser, and also attach the corrupt database as corrupt.
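If you prefer scripting over SQLite Browser, the same attach can be done with Python's built-in sqlite3 module; the recovery SQL in the following steps can then be run through this connection. A sketch (the exact .corrupt file name will differ on your system):

```python
import sqlite3

def open_with_corrupt(current_db, corrupt_db):
    """Open the current database and attach the corrupt one under the
    schema name 'corrupt', so queries can reference corrupt.statistics,
    corrupt.statistics_meta, etc."""
    con = sqlite3.connect(current_db)
    con.execute("ATTACH DATABASE ? AS corrupt", (corrupt_db,))
    return con
```

Usage: `con = open_with_corrupt("home-assistant_v2.db", "home-assistant_v2.db.corrupt.2023-...")`, then `con.execute(...)` for each statement and `con.commit()` at the end.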
3. As some statistics_meta records could be missing, copy those over from the corrupt db:

-- insert possibly missing statistics_meta records
insert into statistics_meta
select row_number() over () + (select max(id) from statistics_meta) as id, c.statistic_id, c.source, c.unit_of_measurement, c.has_mean, c.has_sum, c.name
from corrupt.statistics_meta c
left outer join statistics_meta m on m.statistic_id = c.statistic_id
where m.id is null;
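To verify step 3 worked, you can count how many statistic_ids from the corrupt database are still missing in the current one; after the insert this should be 0. A sketch, assuming the corrupt database is attached as `corrupt` as in step 2:

```python
import sqlite3

def missing_meta_count(con):
    """Count statistic_ids present in corrupt.statistics_meta but absent
    from the current statistics_meta (expected to be 0 after step 3).
    `con` must have the corrupt database attached as 'corrupt'."""
    (n,) = con.execute("""
        SELECT COUNT(*)
        FROM corrupt.statistics_meta c
        LEFT JOIN statistics_meta m ON m.statistic_id = c.statistic_id
        WHERE m.id IS NULL
    """).fetchone()
    return n
```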

4. Before inserting the records, update the id column of the existing data:

-- move existing ids, as all corrupt records are older
update statistics
set id = id + (select max(id) from corrupt.statistics);

5. Now, the statistics table is missing all those records from the corrupt database, so we import them with this SQL:

-- import statistics from corrupt db
insert into statistics
select c.id, c.created, c.created_ts, m.new_id as metadata_id, c.start, c.start_ts, c.mean, c.min, c.max, c.last_reset, c.last_reset_ts, c.state, c.sum
from corrupt.statistics c
inner join (
	select o.id as old_id
	, n.id as new_id
	, case when row_number() over (partition by o.id) > 1 or row_number() over (partition by n.id) > 1 then 0 else 1 end as test
	from corrupt.statistics_meta o
	inner join statistics_meta n on n.statistic_id = o.statistic_id
) m on m.old_id = c.metadata_id and m.test = 1;

6. As the sum is incorrect for new values in current db, we update those as well:

-- update sum values causing mega negative values for day of corruption in Energy Dashboard
update statistics
set sum = sum + a.add_sum
from (
	select s.id, a.statistic_id, sum(a.lag_sum) as add_sum
	from statistics s
	inner join (
		select *
		from (
			select m.statistic_id, s.*, lag(s.sum) over (partition by s.metadata_id order by s.start_ts) as lag_sum
			from statistics s
			inner join statistics_meta m on m.id = s.metadata_id
			where s.sum is not null
		) t
		where t.sum < t.lag_sum
	) a on a.metadata_id = s.metadata_id and s.start_ts >= a.start_ts
	group by s.id, a.statistic_id
) a 
where a.id = statistics.id

7. Rebuild statistics_short_term from today's statistics - this is needed so that new statistics records get the correct sum value:

-- rebuild statistics_short_term
delete from statistics_short_term;
insert into statistics_short_term
select row_number() over () as id, c.created, c.created_ts, c.metadata_id, c.start, c.start_ts, c.mean, c.min, c.max, c.last_reset, c.last_reset_ts, c.state, c.sum
from statistics c
where date(c.start_ts, 'unixepoch') = DATE('now')
order by c.start_ts, c.id;

8. Start HA Core again :+1:

All my history in the Energy Dashboard is now back again - running smoothly for many hours now… :slight_smile:


Worked for me as well!


Hmmm, I get an error during step 6, where you ‘select *’ from the first inner join; it then offers all of the available commands and repeats that a couple of times:

sqlite> update statistics
from (
        select s.id, a.statistic_id, sum(a.lag_sum) as add_sum
        from statistics s
        inner join (
                select *
                from (
                        select m.statistic_id, s.*, lag(s.sum) over (partition by s.metadata_id order by s.start_ts) as lag_sum
   ...> set sum = sum + a.add_sum
   ...> from (
   ...> select s.id, a.statistic_id, sum(a.lag_sum) as add_sum
   ...> from statistics s
   ...> inner join (
   ...>
Display all 235 possibilities? (y or n)
ABORT                                           ROW
ACTION                                          ROWS
ADD                                             SAVEPOINT
AFTER                                           SELECT
ALL                                             SET

Any ideas please, I just lost over two years of statistics after the home assistant database got corrupted itself this morning!

Hi here,
sorry - couldn’t reply earlier.
I’ve used SQLite Browser (Windows App) to attach both databases and the sql pane to run the commands.
It looks to me like the statement got cut off when pasting? Note that the sqlite3 CLI only executes a statement once it ends with a semicolon, so make sure the whole statement including the final “;” goes in as one piece.
I’d try running a plain select on the query to check what’s going on:

select *
from (
	select s.id, a.statistic_id, sum(a.lag_sum) as add_sum
	from statistics s
	inner join (
		select *
		from (
			select m.statistic_id, s.*, lag(s.sum) over (partition by s.metadata_id order by s.start_ts) as lag_sum
			from statistics s
			inner join statistics_meta m on m.id = s.metadata_id
			where s.sum is not null
		) t
		where t.sum < t.lag_sum
	) a on a.metadata_id = s.metadata_id and s.start_ts >= a.start_ts
	group by s.id, a.statistic_id
) a;

Cheers - cosote.


I had the same problem. Every night at 04:12 the data disappeared. It took me a while (10 days) to find this solution. I hope that tonight it does not disappear.

I have, in several backups, most of the data of the past 10 days. I assume that copying that (energy) data back is possible, but I'm not a coder and not into databases. Cosote, thanks for posting the procedure - can you help me with a script?

PS: I did not do step 2, but opened the databases on my RPi. Before I could do that I had to make a copy of the corrupt database, because of the file name (home-assistant_v2.db.corrupt.2023-08-20T02:12:00.520929+00:00).

Great post! Worked like a charm for me!
Let me buy you a coffee :slight_smile:

Worked like a charm on PostgreSQL. Just check the log to confirm everything is working. I had to update the auto-increment ID (sequence) for the statistics table.

Link for a coffee?

I'm also a victim of this… it's crazy that it can happen so easily and that we store the precious long-term data in it. I have it all in Influx too, but the energy dashboard is just too neat… I will move to MariaDB and try to migrate the data from the corrupt db…

Hi,

this morning my DB got corrupted and I don’t know why.

However, when following your guide (thanks for this, as I am a SQL rookie), I get “database disk image is malformed” from DB Browser for SQLite when I want to attach the corrupt DB.

Any suggestion?

EDIT:
I solved my issue using this guide:
Restore corrupt database after nightly purge - Installation / Home Assistant OS - Home Assistant Community (home-assistant.io)
