Recalculate statistics table sum to fix wrong energy data

I’ve been having some issues with the Energy data due to switching from Shelly integration to using MQTT. The sensors I configured over MQTT were sending the wrong data, causing wrong data in the statistics table for those days. My issue in particular was that the new sensors (which I gave the same name as the shelly sensors to keep the history) were sending data in watt-minutes instead of kWh, resulting in a very small ‘state’ number and increasing the sum value by a lot. This causes spikes in my energy readouts.

Important note:
The fix involves only reading data from the statistics table, not the states table which hold the actual sensor data. So you need at least some correct data in the statistics table to be able to use this fix.
The guide involves deleting the wrong data and then recaculating the ‘sum’ field based on the ‘state’ fields. No other tables are used.
I recommend making a copy of each table before executing any updates or deletes.

Prerequisites:

  • phpMyAdmin or any other SQL tool that allows you to execute queries and create / execute procedures
  • Some knowledge of SQL / databases so you at least understand what you’re doing

Steps to perform:

  1. Find and delete the wrong data from the statistics and statistics_short_term tables

First step is to identify which ID the sensors have, which you can find in the statistics_metadata table:
image
In my case, I have 3 energy sensors (Shelly 3EM), with ID’s 31, 37 & 43.
With this you can look at both the statistics and statistics_short_term table for those entity ID’s (one at a time!) and find the ID’s of the data entries that you want to delete. You’ll need to scroll through quite a few rows to find the start & end of the wrong data.
The query to delete them is as follows (for each table and entity you found above). Note that the ID’s in both tables differ so be sure you have 4 sets of ID’s for each entity:

DELETE FROM STATISTICS WHERE METADATA_ID = 'id found previously' AND ID >= <start ID of wrong data> AND ID <= <end ID of wrong data>
DELETE FROM STATISTICS_SHORT_TERM WHERE METADATA_ID = 'id found previously' AND ID >= <start ID of wrong data> AND ID <= <end ID of wrong data>
  1. Fix the statistics & statistics_short_term data

Now that the wrong data is cleared, I created 2 queries (procedures) to recalculate the sum based on the states in the statistics or statistics_short_term tables:
Each procedure takes 2 arguments: the ID of the sensor (metadata_id) and the date from which you want to recalculate. If you have a lot of entries in that table be sure to select a date as close as possible from the start of the problems.

Long term table:

DELIMITER $$
CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_long_term`(IN `metadata` INT, IN `from_date` DATE)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE cur1 CURSOR FOR SELECT id FROM statistics where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;

  read_loop: LOOP
    FETCH cur1 INTO curr_id;
	
    IF done THEN
      LEAVE read_loop;
    END IF;
	update statistics as s1 inner join (select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics as s1 inner join statistics as s2 on s2.id = (select id from statistics s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1) where s1.id = curr_id) as prev_table on s1.id = prev_table.id set s1.sum = round(prev_sum + state - prev_state,2);
  END LOOP;

  CLOSE cur1;
END$$
DELIMITER ;

Short term table:

DELIMITER $$
CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_short_term`(IN `metadata` INT, IN `from_date` DATE)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE cur1 CURSOR FOR SELECT id FROM statistics_short_term where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;

  read_loop: LOOP
    FETCH cur1 INTO curr_id;
	
    IF done THEN
      LEAVE read_loop;
    END IF;
	update statistics_short_term as s1 inner join (select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1 inner join statistics_short_term as s2 on s2.id = (select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1) where s1.id = curr_id) as prev_table on s1.id = prev_table.id set s1.sum = round(prev_sum + state - prev_state, 2);
  END LOOP;

  CLOSE cur1;
END$$
DELIMITER ;

You need to execute each procedure one time for each entity you want to fix.

I hope this helps someone, it took me some time to find a solution but it seems there was none apart from direct DB updates :slight_smile:

After this update my Energy data is correct, except of course the days where we deleted the data. However the total energy consumed is not lost, it just gets added to the first correct period (I can only post 1 picture as a newbie :frowning: )
So one day will have a lot of energy but the overall monthly & yearly stats should be correct (unless ofcourse the data you deleted happens to span a month-end or year-end).

2 Likes