Recalculate statistics table sum to fix wrong energy data

I’ve been having some issues with the Energy data due to switching from Shelly integration to using MQTT. The sensors I configured over MQTT were sending the wrong data, causing wrong data in the statistics table for those days. My issue in particular was that the new sensors (which I gave the same name as the shelly sensors to keep the history) were sending data in watt-minutes instead of kWh, resulting in a very small ‘state’ number and increasing the sum value by a lot. This causes spikes in my energy readouts.

Important note:
The fix involves only reading data from the statistics table, not the states table which hold the actual sensor data. So you need at least some correct data in the statistics table to be able to use this fix.
The guide involves deleting the wrong data and then recaculating the ‘sum’ field based on the ‘state’ fields. No other tables are used.
I recommend making a copy of each table before executing any updates or deletes.

Prerequisites:

  • phpMyAdmin or any other SQL tool that allows you to execute queries and create / execute procedures
  • Some knowledge of SQL / databases so you at least understand what you’re doing

Steps to perform:

  1. Find and delete the wrong data from the statistics and statistics_short_term tables

First step is to identify which ID the sensors have, which you can find in the statistics_metadata table:
image
In my case, I have 3 energy sensors (Shelly 3EM), with ID’s 31, 37 & 43.
With this you can look at both the statistics and statistics_short_term table for those entity ID’s (one at a time!) and find the ID’s of the data entries that you want to delete. You’ll need to scroll through quite a few rows to find the start & end of the wrong data.
The query to delete them is as follows (for each table and entity you found above). Note that the ID’s in both tables differ so be sure you have 4 sets of ID’s for each entity:

DELETE FROM STATISTICS WHERE METADATA_ID = 'id found previously' AND ID >= <start ID of wrong data> AND ID <= <end ID of wrong data>
DELETE FROM STATISTICS_SHORT_TERM WHERE METADATA_ID = 'id found previously' AND ID >= <start ID of wrong data> AND ID <= <end ID of wrong data>
  1. Fix the statistics & statistics_short_term data

Now that the wrong data is cleared, I created 2 queries (procedures) to recalculate the sum based on the states in the statistics or statistics_short_term tables:
Each procedure takes 2 arguments: the ID of the sensor (metadata_id) and the date from which you want to recalculate. If you have a lot of entries in that table be sure to select a date as close as possible from the start of the problems.

Long term table:

DELIMITER $$
CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_long_term`(IN `metadata` INT, IN `from_date` DATE)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE cur1 CURSOR FOR SELECT id FROM statistics where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;

  read_loop: LOOP
    FETCH cur1 INTO curr_id;
	
    IF done THEN
      LEAVE read_loop;
    END IF;
	update statistics as s1 inner join (select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics as s1 inner join statistics as s2 on s2.id = (select id from statistics s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1) where s1.id = curr_id) as prev_table on s1.id = prev_table.id set s1.sum = round(prev_sum + state - prev_state,2);
  END LOOP;

  CLOSE cur1;
END$$
DELIMITER ;

Short term table:

DELIMITER $$
CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_short_term`(IN `metadata` INT, IN `from_date` DATE)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE cur1 CURSOR FOR SELECT id FROM statistics_short_term where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;

  read_loop: LOOP
    FETCH cur1 INTO curr_id;
	
    IF done THEN
      LEAVE read_loop;
    END IF;
	update statistics_short_term as s1 inner join (select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1 inner join statistics_short_term as s2 on s2.id = (select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1) where s1.id = curr_id) as prev_table on s1.id = prev_table.id set s1.sum = round(prev_sum + state - prev_state, 2);
  END LOOP;

  CLOSE cur1;
END$$
DELIMITER ;

You need to execute each procedure one time for each entity you want to fix.

I hope this helps someone, it took me some time to find a solution but it seems there was none apart from direct DB updates :slight_smile:

After this update my Energy data is correct, except of course the days where we deleted the data. However the total energy consumed is not lost, it just gets added to the first correct period (I can only post 1 picture as a newbie :frowning: )
So one day will have a lot of energy but the overall monthly & yearly stats should be correct (unless ofcourse the data you deleted happens to span a month-end or year-end).

13 Likes

I like what you did, but what did you use for SQL. usually you would need to install sqlite3.

Hi,

i have the same problem, i have do:

* cleaned stats table
* i find metadata_id (42) at statistics_meta table
* but i can’t find possible data at statistics - the same as i see at energy dash

at energy dash i see 823kWh for solar production at 1 hour
how can i fix this?

Why isn’t there an easy way to recreate the statistics after the database has been cleaned up.

found this https://community.home-assistant.io/t/energy-management-in-home-assistant/326854/1126

I found similar problem and took a similar path. I’m documenting here the result so you can use my store procedure.

  1. There is NO NEED to delete any record in the database. Only the sum field will be updated.
  2. It should be safe to execute the procedure multiple times as it will fix only the values that cause negative values. If the sum is correct, the procedure will not update anything.
  3. It is not needed to provide a date, the entire table will be considered.
  4. It is recommended (I would say, it is mandatory) to run the procedures with Home Assistant NOT RUNNING. Because I suspect HA maintains internal counters or caches over the database records and next inserts will be wrong (and will produce more negative numbers)

Now the code

CREATE DEFINER=`hass`@`%` PROCEDURE `fix_statistics_sum`(IN `entity_id` VARCHAR(255))
BEGIN
	DECLARE curr_id INT;
	DECLARE prev_sum FLOAT DEFAULT -1;
	DECLARE curr_sum FLOAT;
	DECLARE sum_diff FLOAT;
	DECLARE acum_error FLOAT DEFAULT 0;
	DECLARE done INT DEFAULT FALSE;
	
	DECLARE cur1 CURSOR FOR SELECT id, sum FROM statistics s WHERE s.metadata_id = (SELECT id FROM statistics_meta sm WHERE sm.statistic_id = entity_id) ORDER BY id;
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

	OPEN cur1;

	read_loop: LOOP
		FETCH cur1 INTO curr_id, curr_sum;
	
		IF done THEN
			LEAVE read_loop;
		END IF;
	
		IF prev_sum <> -1 THEN 
			SET sum_diff = curr_sum - prev_sum;
			IF sum_diff < 0.0 THEN
				SET acum_error = acum_error - sum_diff;
			END IF;
			UPDATE statistics s SET s.sum = (curr_sum + acum_error) WHERE s.id = curr_id;
		END IF;
		SET prev_sum = curr_sum;
	
	END LOOP;

	select acum_error, prev_sum, curr_sum ;

	CLOSE cur1;


END

CREATE DEFINER=`hass`@`%` PROCEDURE `fix_statistics_short_term_sum`(IN `entity_id` VARCHAR(255))
BEGIN
	DECLARE curr_id INT;
	DECLARE prev_sum FLOAT DEFAULT -1;
	DECLARE curr_sum FLOAT;
	DECLARE sum_diff FLOAT;
	DECLARE acum_error FLOAT DEFAULT 0;
	DECLARE done INT DEFAULT FALSE;
	
	DECLARE cur1 CURSOR FOR SELECT id, sum FROM statistics_short_term s WHERE s.metadata_id = (SELECT id FROM statistics_meta sm WHERE sm.statistic_id = entity_id) ORDER BY id;
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

	OPEN cur1;

	read_loop: LOOP
		FETCH cur1 INTO curr_id, curr_sum;
	
		IF done THEN
			LEAVE read_loop;
		END IF;
	
		IF prev_sum <> -1 THEN 
			SET sum_diff = curr_sum - prev_sum;
			IF sum_diff < 0.0 THEN
				SET acum_error = acum_error - sum_diff;
			END IF;
			UPDATE statistics_short_term s SET s.sum = (curr_sum + acum_error) WHERE s.id = curr_id;
		END IF;
		SET prev_sum = curr_sum;
	
	END LOOP;

	select acum_error, prev_sum, curr_sum ;

	CLOSE cur1;


END

Regards
Ignacio

This was a great help for me, thanks.
My cost calculation started to accelerate to ludicrous amounts after I migrated my data to MariaDB.

Just be cautious - the “energy cost” (not “energy”) statistics can have the “last_reset” column filled, and on the rows that this datetime changes from the previous row, the “prev_state” must not be subtracted from “state”. So have a look at the rows for your metadata_id, and see if it has “last_reset” set.

In this case the procedure must be altered to take care of this. I did this in the ugliest way possible because I didn’t want to spend more time on it, here it is in case you want to improve it:

CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_short_term`(IN `metadata` INT, IN `from_date` DATETIME)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE reset_date DATETIME;
  DECLARE prev_reset_date DATETIME;
  DECLARE cur1 CURSOR FOR SELECT id, last_reset FROM statistics_short_term where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;
  FETCH cur1 INTO curr_id, reset_date;
  SET prev_reset_date = reset_date;

  read_loop: LOOP
	
    IF done THEN
      LEAVE read_loop;
    END IF;

    IF reset_date is null OR reset_date = prev_reset_date THEN
        update statistics_short_term as s1 inner join (
            select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1
            inner join statistics_short_term as s2 on s2.id = (
                select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1)
            where s1.id = curr_id)
	    as prev_table on s1.id = prev_table.id
	    set s1.sum = round(prev_sum + state - prev_state, 3);
    ELSE
        update statistics_short_term as s1 inner join (
            select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1
            inner join statistics_short_term as s2 on s2.id = (
                select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1)
            where s1.id = curr_id)
        as prev_table on s1.id = prev_table.id
        set s1.sum = round(prev_sum + state, 3);
			
        SET prev_reset_date = reset_date;
    END IF;
    
    FETCH cur1 INTO curr_id, reset_date;

  END LOOP;

  CLOSE cur1;
END
1 Like

I’m not familiar with stored procedures. How do I create it in my DB?

When I try to execute the above code I get the error:

**SQL query:**

CREATE DEFINER=`homeassisant`@`%` PROCEDURE `fix_statistics_sum`(IN `entity_id` VARCHAR(255))
BEGIN
	DECLARE curr_id INT

**MySQL said:** [![Documentation](http://192.168.0.183/phpmyadmin/themes/dot.gif "Documentation")](http://192.168.0.183/phpmyadmin/url.php?url=https%3A%2F%2Fdev.mysql.com%2Fdoc%2Frefman%2F5.7%2Fen%2Ferror-messages-server.html)

`#1064 - You have an error in your SQL syntax; check the manual 
that corresponds to your MariaDB server version for the right syntax to 
use near '' at line 3`

I guess I can’t just copy paste it. How do I need to create it?

In my GUI (MySQL Workbench), there’s a section below Tables and Views called Stored Procedures. I right-click Stored Procedures and select Create Stored Procedure, and go from there. It should be possible to copy-paste it then.

Thanks.

I used Phpmyadmin, but I couldn’t figure it out there.
In MySQL Workbench it works as you describe indeed.

Hi, I stumpled across this guide whilst trying to fix an issue with my Energy dashboard where my Gas meter sensor wasn’t being picked up as a source, one suggestion was to change the unit_of_measurement: from “m3” to “m³” and that I should flush the statistics history to make sure it works properly. I knew nothing about the database used in HA and so began the learning curve.

My HA is a HA OS on Raspberry Pi which is nothing special, but just to set the context, and that it uses SQLite3 which again I’d never worked with before.
So to start with I connected to the Raspberry Pi using SSH and stopped the HA core
➜ ~ ha core stop

Next I downloaded the SQLite Linux binareis from SQLite Download Page
wget https://www.sqlite.org/2022/sqlite-tools-linux-x86-3390400.zip

Then I decompressed it using the unzip command
➜ ~ unzip https://www.sqlite.org/2022/sqlite-tools-linux-x86-3390400.zip

This then creates a directory called ./sqlite-tools-linux-x86-3390400 which isn’t great so I renamed it:
➜ ~ mv sqlite-tools-linux-x86-3390400 sqlite-tools

next I confirmed the location of my SQLite database as being in ./config and executed sqlite3 using the path to the DB
➜ ~ sqlite3 ../config/home-assistant_v2.db

I confirmed I was in the right DB and not a newly created on by checking for existing tables:

sqlite> .tables
event_data             state_attributes       statistics_runs
events                 states                 statistics_short_term
recorder_runs          statistics
schema_changes         statistics_meta

Then I needed to find the sensor ID I need to clear the stats for:

sqlite> SELECT * FROM "statistics_meta" WHERE "statistic_id" = "sensor.gas_meter";
37|sensor.gas_meter|recorder|m³|0|1|

With the ID of the sensor I could now clear the statistics data:

sqlite> DELETE FROM "statistics" WHERE "metadata_id" = "37";
sqlite> DELETE FROM "statistics_short_term" WHERE "metadata_id" = "37";
sqlite> .exit

Now with the stats data cleared I restarted the HA core:
➜ ~ ha core start

And lastly as there was a update to Home Assistant 2022.11.0 I installed the update and my sensor became visible in the energy dashboard with the correct unit of measurement and I was plotting my Gas consumption.

I hope this helps somebody else who needs to go through this process.

3 Likes

Any way to do this in PHPmyadmin? I’ve tried with MySQL workbench but it’s not compatible with mariadb and eventually ended up corrupting my DB

Can someone take a look at my post at Issue with Energy Database - Energy - Home Assistant Community (home-assistant.io)?
I have issue with Energy data … that maybe someone expert can suggest how to fix…
Thanks in advance

In phpmyadmin its called “routines” or “procedures” depending on your language settings.
In my german phpmcyadmin it looks like this


and can also be found from the top right menu

Would be really great if someone could further tweak these routines/procedures. I do feel we are very close to get something which could fix these stats issues which are really boring a lot.

My Power Smartmeter had one spike to 800.000 something kWh and now I am suffering on all stats /yearly /monthly /weekly and cannot get them display correctly again despite removing the high spike from all recorded statistics.

There needs to be a way to delete these spikes and re-run/fresh any dependant statistics which have been calculating on these spike values.

This is a severe issue to me as this might easily destroy your long running (yearly) statistics within a second.

I am sure that someone with proper sql knowledge could easily recalculate any sum based on the corrected data. anyone?

Hi, I want to use the procedure to correct wrong values in my db.
To you now how can I add the procedure in SqLite? I installed the Sqlite Addon but there seems to be no option to add a stored procedure.

BR
René

You need to install your database client (SqLite or any other) in your own PC and connect to the database using that tool. Remember to stop HA when you are doing database changes and backup your data!

I have regular full backups from HA to my NAS. Is this sufficient as a backup or do I need to backup the DB separately?

Is it ok to only stop the recorder service in HA? When I stop HA completely I think the DB will also be stopped so that I’m unable to connect to it from my pc.

I installed SQLite Studio but it seams that there is only a option to pick a file, not to estabilsh a connection.

Is it possible to connect the the HA DB from outside? Or do I have to copy the DB manually to my pc and later back to the host of HA?

So next question after downloading the DB from HA host and try to updating it with Sqlite Studio? Is it possible that alle the procedures in this thread belong to mysql DBs? The default is Sqlite, which, as i read, does not know the concept of stored procedures :frowning:

Frustrating that there is no easy way to fix wrong values in the statistics. :frowning:

Statistics fixing MIGHT be easy. Have a look at the /developer-tools/statistics page. Example:

That seems to fix tables statistics and statistics_short_term (but not table states).

1 Like

I had similar issues, a sensor going wild. So my watermeter data was flooded with liters not consumed many. I managed to remove via statistics the incorrect values for each watermeter sensor.
Now I am left with calculations which are not correct and still use the deleted data.

How do I get this corrected? I do not have any clue on what to do on the database as described above, way above my knowledge level.
Is there not an option for dummies? Goal of HA is to be simple, removing/adjusting incorrect values via statistics easy and straight forward, but then it stops to be simple and you need to be a DB engineer to understand what to do.
Hope someone has a simple and straight forward option to fix this.
Otherwise I will have to delete the watermeter and install it again :frowning:
Thanks for helping out.