Recalculate statistics table sum to fix wrong energy data

I’ve been having some issues with the Energy data due to switching from Shelly integration to using MQTT. The sensors I configured over MQTT were sending the wrong data, causing wrong data in the statistics table for those days. My issue in particular was that the new sensors (which I gave the same name as the shelly sensors to keep the history) were sending data in watt-minutes instead of kWh, resulting in a very small ‘state’ number and increasing the sum value by a lot. This causes spikes in my energy readouts.

Important note:
The fix involves only reading data from the statistics table, not the states table which hold the actual sensor data. So you need at least some correct data in the statistics table to be able to use this fix.
The guide involves deleting the wrong data and then recaculating the ‘sum’ field based on the ‘state’ fields. No other tables are used.
I recommend making a copy of each table before executing any updates or deletes.

Prerequisites:

  • phpMyAdmin or any other SQL tool that allows you to execute queries and create / execute procedures
  • Some knowledge of SQL / databases so you at least understand what you’re doing

Steps to perform:

  1. Find and delete the wrong data from the statistics and statistics_short_term tables

First step is to identify which ID the sensors have, which you can find in the statistics_metadata table:
image
In my case, I have 3 energy sensors (Shelly 3EM), with ID’s 31, 37 & 43.
With this you can look at both the statistics and statistics_short_term table for those entity ID’s (one at a time!) and find the ID’s of the data entries that you want to delete. You’ll need to scroll through quite a few rows to find the start & end of the wrong data.
The query to delete them is as follows (for each table and entity you found above). Note that the ID’s in both tables differ so be sure you have 4 sets of ID’s for each entity:

DELETE FROM STATISTICS WHERE METADATA_ID = 'id found previously' AND ID >= <start ID of wrong data> AND ID <= <end ID of wrong data>
DELETE FROM STATISTICS_SHORT_TERM WHERE METADATA_ID = 'id found previously' AND ID >= <start ID of wrong data> AND ID <= <end ID of wrong data>
  1. Fix the statistics & statistics_short_term data

Now that the wrong data is cleared, I created 2 queries (procedures) to recalculate the sum based on the states in the statistics or statistics_short_term tables:
Each procedure takes 2 arguments: the ID of the sensor (metadata_id) and the date from which you want to recalculate. If you have a lot of entries in that table be sure to select a date as close as possible from the start of the problems.

Long term table:

DELIMITER $$
CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_long_term`(IN `metadata` INT, IN `from_date` DATE)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE cur1 CURSOR FOR SELECT id FROM statistics where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;

  read_loop: LOOP
    FETCH cur1 INTO curr_id;
	
    IF done THEN
      LEAVE read_loop;
    END IF;
	update statistics as s1 inner join (select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics as s1 inner join statistics as s2 on s2.id = (select id from statistics s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1) where s1.id = curr_id) as prev_table on s1.id = prev_table.id set s1.sum = round(prev_sum + state - prev_state,2);
  END LOOP;

  CLOSE cur1;
END$$
DELIMITER ;

Short term table:

DELIMITER $$
CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_short_term`(IN `metadata` INT, IN `from_date` DATE)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE cur1 CURSOR FOR SELECT id FROM statistics_short_term where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;

  read_loop: LOOP
    FETCH cur1 INTO curr_id;
	
    IF done THEN
      LEAVE read_loop;
    END IF;
	update statistics_short_term as s1 inner join (select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1 inner join statistics_short_term as s2 on s2.id = (select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1) where s1.id = curr_id) as prev_table on s1.id = prev_table.id set s1.sum = round(prev_sum + state - prev_state, 2);
  END LOOP;

  CLOSE cur1;
END$$
DELIMITER ;

You need to execute each procedure one time for each entity you want to fix.

I hope this helps someone, it took me some time to find a solution but it seems there was none apart from direct DB updates :slight_smile:

After this update my Energy data is correct, except of course the days where we deleted the data. However the total energy consumed is not lost, it just gets added to the first correct period (I can only post 1 picture as a newbie :frowning: )
So one day will have a lot of energy but the overall monthly & yearly stats should be correct (unless ofcourse the data you deleted happens to span a month-end or year-end).

7 Likes

I like what you did, but what did you use for SQL. usually you would need to install sqlite3.

Hi,

i have the same problem, i have do:

* cleaned stats table
* i find metadata_id (42) at statistics_meta table
* but i can’t find possible data at statistics - the same as i see at energy dash

at energy dash i see 823kWh for solar production at 1 hour
how can i fix this?

Why isn’t there an easy way to recreate the statistics after the database has been cleaned up.

found this https://community.home-assistant.io/t/energy-management-in-home-assistant/326854/1126

I found similar problem and took a similar path. I’m documenting here the result so you can use my store procedure.

  1. There is NO NEED to delete any record in the database. Only the sum field will be updated.
  2. It should be safe to execute the procedure multiple times as it will fix only the values that cause negative values. If the sum is correct, the procedure will not update anything.
  3. It is not needed to provide a date, the entire table will be considered.
  4. It is recommended (I would say, it is mandatory) to run the procedures with Home Assistant NOT RUNNING. Because I suspect HA maintains internal counters or caches over the database records and next inserts will be wrong (and will produce more negative numbers)

Now the code

CREATE DEFINER=`hass`@`%` PROCEDURE `fix_statistics_sum`(IN `entity_id` VARCHAR(255))
BEGIN
	DECLARE curr_id INT;
	DECLARE prev_sum FLOAT DEFAULT -1;
	DECLARE curr_sum FLOAT;
	DECLARE sum_diff FLOAT;
	DECLARE acum_error FLOAT DEFAULT 0;
	DECLARE done INT DEFAULT FALSE;
	
	DECLARE cur1 CURSOR FOR SELECT id, sum FROM statistics s WHERE s.metadata_id = (SELECT id FROM statistics_meta sm WHERE sm.statistic_id = entity_id) ORDER BY id;
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

	OPEN cur1;

	read_loop: LOOP
		FETCH cur1 INTO curr_id, curr_sum;
	
		IF done THEN
			LEAVE read_loop;
		END IF;
	
		IF prev_sum <> -1 THEN 
			SET sum_diff = curr_sum - prev_sum;
			IF sum_diff < 0.0 THEN
				SET acum_error = acum_error - sum_diff;
			END IF;
			UPDATE statistics s SET s.sum = (curr_sum + acum_error) WHERE s.id = curr_id;
		END IF;
		SET prev_sum = curr_sum;
	
	END LOOP;

	select acum_error, prev_sum, curr_sum ;

	CLOSE cur1;


END

CREATE DEFINER=`hass`@`%` PROCEDURE `fix_statistics_short_term_sum`(IN `entity_id` VARCHAR(255))
BEGIN
	DECLARE curr_id INT;
	DECLARE prev_sum FLOAT DEFAULT -1;
	DECLARE curr_sum FLOAT;
	DECLARE sum_diff FLOAT;
	DECLARE acum_error FLOAT DEFAULT 0;
	DECLARE done INT DEFAULT FALSE;
	
	DECLARE cur1 CURSOR FOR SELECT id, sum FROM statistics_short_term s WHERE s.metadata_id = (SELECT id FROM statistics_meta sm WHERE sm.statistic_id = entity_id) ORDER BY id;
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

	OPEN cur1;

	read_loop: LOOP
		FETCH cur1 INTO curr_id, curr_sum;
	
		IF done THEN
			LEAVE read_loop;
		END IF;
	
		IF prev_sum <> -1 THEN 
			SET sum_diff = curr_sum - prev_sum;
			IF sum_diff < 0.0 THEN
				SET acum_error = acum_error - sum_diff;
			END IF;
			UPDATE statistics_short_term s SET s.sum = (curr_sum + acum_error) WHERE s.id = curr_id;
		END IF;
		SET prev_sum = curr_sum;
	
	END LOOP;

	select acum_error, prev_sum, curr_sum ;

	CLOSE cur1;


END

Regards
Ignacio

This was a great help for me, thanks.
My cost calculation started to accelerate to ludicrous amounts after I migrated my data to MariaDB.

Just be cautious - the “energy cost” (not “energy”) statistics can have the “last_reset” column filled, and on the rows that this datetime changes from the previous row, the “prev_state” must not be subtracted from “state”. So have a look at the rows for your metadata_id, and see if it has “last_reset” set.

In this case the procedure must be altered to take care of this. I did this in the ugliest way possible because I didn’t want to spend more time on it, here it is in case you want to improve it:

CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_short_term`(IN `metadata` INT, IN `from_date` DATETIME)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE reset_date DATETIME;
  DECLARE prev_reset_date DATETIME;
  DECLARE cur1 CURSOR FOR SELECT id, last_reset FROM statistics_short_term where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;
  FETCH cur1 INTO curr_id, reset_date;
  SET prev_reset_date = reset_date;

  read_loop: LOOP
	
    IF done THEN
      LEAVE read_loop;
    END IF;

    IF reset_date is null OR reset_date = prev_reset_date THEN
        update statistics_short_term as s1 inner join (
            select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1
            inner join statistics_short_term as s2 on s2.id = (
                select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1)
            where s1.id = curr_id)
	    as prev_table on s1.id = prev_table.id
	    set s1.sum = round(prev_sum + state - prev_state, 3);
    ELSE
        update statistics_short_term as s1 inner join (
            select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1
            inner join statistics_short_term as s2 on s2.id = (
                select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1)
            where s1.id = curr_id)
        as prev_table on s1.id = prev_table.id
        set s1.sum = round(prev_sum + state, 3);
			
        SET prev_reset_date = reset_date;
    END IF;
    
    FETCH cur1 INTO curr_id, reset_date;

  END LOOP;

  CLOSE cur1;
END