Recalculate statistics table sum to fix wrong energy data

This was a great help for me, thanks.
My cost calculation started to accelerate to ludicrous amounts after I migrated my data to MariaDB.

Just be cautious - the “energy cost” (not “energy”) statistics can have the “last_reset” column filled, and on the rows that this datetime changes from the previous row, the “prev_state” must not be subtracted from “state”. So have a look at the rows for your metadata_id, and see if it has “last_reset” set.

In this case the procedure must be altered to take care of this. I did this in the ugliest way possible because I didn’t want to spend more time on it, here it is in case you want to improve it:

CREATE DEFINER=`homeassistant`@`%` PROCEDURE `fix_short_term`(IN `metadata` INT, IN `from_date` DATETIME)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE curr_id INT;
  DECLARE reset_date DATETIME;
  DECLARE prev_reset_date DATETIME;
  DECLARE cur1 CURSOR FOR SELECT id, last_reset FROM statistics_short_term where metadata_id = metadata and created >= from_date;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur1;
  FETCH cur1 INTO curr_id, reset_date;
  SET prev_reset_date = reset_date;

  read_loop: LOOP
	
    IF done THEN
      LEAVE read_loop;
    END IF;

    IF reset_date is null OR reset_date = prev_reset_date THEN
        update statistics_short_term as s1 inner join (
            select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1
            inner join statistics_short_term as s2 on s2.id = (
                select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1)
            where s1.id = curr_id)
	    as prev_table on s1.id = prev_table.id
	    set s1.sum = round(prev_sum + state - prev_state, 3);
    ELSE
        update statistics_short_term as s1 inner join (
            select s1.id, s2.sum as prev_sum, s2.state as prev_state from statistics_short_term as s1
            inner join statistics_short_term as s2 on s2.id = (
                select id from statistics_short_term s2 where s1.metadata_id = s2.metadata_id and s2.created >= from_date and s2.id < curr_id order by id desc limit 1)
            where s1.id = curr_id)
        as prev_table on s1.id = prev_table.id
        set s1.sum = round(prev_sum + state, 3);
			
        SET prev_reset_date = reset_date;
    END IF;
    
    FETCH cur1 INTO curr_id, reset_date;

  END LOOP;

  CLOSE cur1;
END
1 Like

I’m not familiar with stored procedures. How do I create it in my DB?

When I try to execute the above code I get the error:

**SQL query:**

CREATE DEFINER=`homeassisant`@`%` PROCEDURE `fix_statistics_sum`(IN `entity_id` VARCHAR(255))
BEGIN
	DECLARE curr_id INT

**MySQL said:** [![Documentation](http://192.168.0.183/phpmyadmin/themes/dot.gif "Documentation")](http://192.168.0.183/phpmyadmin/url.php?url=https%3A%2F%2Fdev.mysql.com%2Fdoc%2Frefman%2F5.7%2Fen%2Ferror-messages-server.html)

`#1064 - You have an error in your SQL syntax; check the manual 
that corresponds to your MariaDB server version for the right syntax to 
use near '' at line 3`

I guess I can’t just copy paste it. How do I need to create it?

In my GUI (MySQL Workbench), there’s a section below Tables and Views called Stored Procedures. I right-click Stored Procedures and select Create Stored Procedure, and go from there. It should be possible to copy-paste it then.

Thanks.

I used Phpmyadmin, but I couldn’t figure it out there.
In MySQL Workbench it works as you describe indeed.

Hi, I stumpled across this guide whilst trying to fix an issue with my Energy dashboard where my Gas meter sensor wasn’t being picked up as a source, one suggestion was to change the unit_of_measurement: from “m3” to “m³” and that I should flush the statistics history to make sure it works properly. I knew nothing about the database used in HA and so began the learning curve.

My HA is a HA OS on Raspberry Pi which is nothing special, but just to set the context, and that it uses SQLite3 which again I’d never worked with before.
So to start with I connected to the Raspberry Pi using SSH and stopped the HA core
➜ ~ ha core stop

Next I downloaded the SQLite Linux binareis from SQLite Download Page
wget https://www.sqlite.org/2022/sqlite-tools-linux-x86-3390400.zip

Then I decompressed it using the unzip command
➜ ~ unzip https://www.sqlite.org/2022/sqlite-tools-linux-x86-3390400.zip

This then creates a directory called ./sqlite-tools-linux-x86-3390400 which isn’t great so I renamed it:
➜ ~ mv sqlite-tools-linux-x86-3390400 sqlite-tools

next I confirmed the location of my SQLite database as being in ./config and executed sqlite3 using the path to the DB
➜ ~ sqlite3 ../config/home-assistant_v2.db

I confirmed I was in the right DB and not a newly created on by checking for existing tables:

sqlite> .tables
event_data             state_attributes       statistics_runs
events                 states                 statistics_short_term
recorder_runs          statistics
schema_changes         statistics_meta

Then I needed to find the sensor ID I need to clear the stats for:

sqlite> SELECT * FROM "statistics_meta" WHERE "statistic_id" = "sensor.gas_meter";
37|sensor.gas_meter|recorder|m³|0|1|

With the ID of the sensor I could now clear the statistics data:

sqlite> DELETE FROM "statistics" WHERE "metadata_id" = "37";
sqlite> DELETE FROM "statistics_short_term" WHERE "metadata_id" = "37";
sqlite> .exit

Now with the stats data cleared I restarted the HA core:
➜ ~ ha core start

And lastly as there was a update to Home Assistant 2022.11.0 I installed the update and my sensor became visible in the energy dashboard with the correct unit of measurement and I was plotting my Gas consumption.

I hope this helps somebody else who needs to go through this process.

3 Likes

Any way to do this in PHPmyadmin? I’ve tried with MySQL workbench but it’s not compatible with mariadb and eventually ended up corrupting my DB

Can someone take a look at my post at Issue with Energy Database - Energy - Home Assistant Community (home-assistant.io)?
I have issue with Energy data … that maybe someone expert can suggest how to fix…
Thanks in advance

In phpmyadmin its called “routines” or “procedures” depending on your language settings.
In my german phpmcyadmin it looks like this


and can also be found from the top right menu

Would be really great if someone could further tweak these routines/procedures. I do feel we are very close to get something which could fix these stats issues which are really boring a lot.

My Power Smartmeter had one spike to 800.000 something kWh and now I am suffering on all stats /yearly /monthly /weekly and cannot get them display correctly again despite removing the high spike from all recorded statistics.

There needs to be a way to delete these spikes and re-run/fresh any dependant statistics which have been calculating on these spike values.

This is a severe issue to me as this might easily destroy your long running (yearly) statistics within a second.

I am sure that someone with proper sql knowledge could easily recalculate any sum based on the corrected data. anyone?

Hi, I want to use the procedure to correct wrong values in my db.
To you now how can I add the procedure in SqLite? I installed the Sqlite Addon but there seems to be no option to add a stored procedure.

BR
René

You need to install your database client (SqLite or any other) in your own PC and connect to the database using that tool. Remember to stop HA when you are doing database changes and backup your data!

I have regular full backups from HA to my NAS. Is this sufficient as a backup or do I need to backup the DB separately?

Is it ok to only stop the recorder service in HA? When I stop HA completely I think the DB will also be stopped so that I’m unable to connect to it from my pc.

I installed SQLite Studio but it seams that there is only a option to pick a file, not to estabilsh a connection.

Is it possible to connect the the HA DB from outside? Or do I have to copy the DB manually to my pc and later back to the host of HA?

So next question after downloading the DB from HA host and try to updating it with Sqlite Studio? Is it possible that alle the procedures in this thread belong to mysql DBs? The default is Sqlite, which, as i read, does not know the concept of stored procedures :frowning:

Frustrating that there is no easy way to fix wrong values in the statistics. :frowning:

Statistics fixing MIGHT be easy. Have a look at the /developer-tools/statistics page. Example:

That seems to fix tables statistics and statistics_short_term (but not table states).

1 Like

I had similar issues, a sensor going wild. So my watermeter data was flooded with liters not consumed many. I managed to remove via statistics the incorrect values for each watermeter sensor.
Now I am left with calculations which are not correct and still use the deleted data.

How do I get this corrected? I do not have any clue on what to do on the database as described above, way above my knowledge level.
Is there not an option for dummies? Goal of HA is to be simple, removing/adjusting incorrect values via statistics easy and straight forward, but then it stops to be simple and you need to be a DB engineer to understand what to do.
Hope someone has a simple and straight forward option to fix this.
Otherwise I will have to delete the watermeter and install it again :frowning:
Thanks for helping out.

Did you solve this meanwhile?

I’ve tryed to change hte value but sum values after this value was not recalculated.

If you are on SQLite then it might be a bit complicated. Here are example SQLs that need manual adjustments - entity_id, date range and start value for recalculation in the last query:

-- database: c:\tmp\home-assistant_v2.db

--backup
CREATE TABLE IF NOT EXISTS statistics_backup AS 
SELECT * FROM statistics;

-- Restore from backup
DROP TABLE IF EXISTS statistics;
CREATE TABLE IF NOT EXISTS statistics AS
SELECT * FROM statistics_backup;

-- Use this query to see results
-- Select date range is 1 hour before the update range
-- In the first row you will see the initial 'sum' value and can use it instead of 1000 in the step 2.
SELECT id, state, sum, datetime(start_ts, 'unixepoch') as readable_time
FROM "statistics"
WHERE 
metadata_id = 172
AND datetime(start_ts, 'unixepoch') >= '2024-09-30 22:00:00'
AND datetime(start_ts, 'unixepoch') <= '2024-10-17 12:59:59'
ORDER BY start_ts;

UPDATE statistics
SET sum = 0
WHERE
metadata_id = 172
AND datetime(start_ts, 'unixepoch') >= '2024-10-01 00:00:00'
AND datetime(start_ts, 'unixepoch') <= '2024-10-17 12:59:59';


-- Step 1: Calculate sum deltas
WITH ordered_statistics AS (
    SELECT id, state, start_ts,
           LAG(state, 1, 0) OVER (ORDER BY id) AS prev_state,
           LAG(sum, 1, 0) OVER (ORDER BY id) AS prev_sum
    FROM statistics
    WHERE metadata_id = 172
    AND datetime(start_ts, 'unixepoch') >= '2024-10-01 00:00:00'
    AND datetime(start_ts, 'unixepoch') <= '2024-10-19 23:59:59'
)
UPDATE statistics
SET sum = ROUND(
    (SELECT CASE
        -- If current state is lower than previous (reset occurred)
        WHEN ordered_statistics.state < ordered_statistics.prev_state THEN ordered_statistics.prev_sum + ordered_statistics.state
        -- Normal cumulative sum calculation
        ELSE ordered_statistics.prev_sum + (ordered_statistics.state - ordered_statistics.prev_state)
    END
    FROM ordered_statistics
    WHERE ordered_statistics.id = statistics.id), 3)
WHERE metadata_id = 172
AND datetime(start_ts, 'unixepoch') >= '2024-10-01 00:00:00'
AND datetime(start_ts, 'unixepoch') <= '2024-10-19 23:59:59';


-- Step 2: Convert sum deltas into incremental value
WITH ordered_statistics AS (
    SELECT id, state, created_ts,
           LAG(state, 1, 2230.060000000319) OVER (ORDER BY id) AS prev_state,
           LAG(sum, 1, 2230.060000000319) OVER (ORDER BY id) AS prev_sum -- Start from the sum of the first row outside the range
    FROM statistics
    WHERE metadata_id = 172
    AND datetime(created_ts, 'unixepoch') >= '2024-10-01 00:00:00'
    AND datetime(created_ts, 'unixepoch') <= '2024-10-19 23:59:59'
),
deltas AS (
    SELECT id, 
           CASE
               -- If the current state is less than the previous state, it indicates a reset
               WHEN state < prev_state THEN state
               -- Otherwise, calculate the delta as the difference between the current and previous state
               ELSE state - prev_state
           END AS state_delta,
           prev_sum
    FROM ordered_statistics
),
cumulative AS (
    SELECT id, 
           -- Only add the hardcoded value to the first record in the range
           CASE 
               WHEN id = (SELECT MIN(id) FROM ordered_statistics) THEN state_delta + prev_sum
               ELSE SUM(state_delta) OVER (ORDER BY id) + 2230.060000000319
           END AS cumulative_sum
    FROM deltas
)
UPDATE statistics
SET sum = ROUND(cumulative.cumulative_sum, 3) -- Update the sum column with the calculated cumulative sum
FROM cumulative
WHERE statistics.id = cumulative.id
  AND metadata_id = 172
  AND datetime(created_ts, 'unixepoch') >= '2024-10-01 00:00:00'
  AND datetime(created_ts, 'unixepoch') <= '2024-10-19 23:59:59';

In his SQLs following values are hardcoded:

Replace this hardcoded valus in the SQLs:
2230.060000000319 - is sum value that should be used as a start
2024-10-01 00:00:00 - start date time
2024-10-19 23:59:59 - end date time, can be in future
172 - metadate_id of your sensor.

I know as I also have one incident of the earlier past where I experienced this.

I was hoping the SQL hacking „on the open heart“ wouldn’t be necessary anymore. Unfortunately it seems like it is though.

No idea what the dev tools section statistics fix does. It probably just ignores the sum column…for whatever reason -.-

1 Like

No did not manage to get the peak out. In the dbase I removed all but it kept on coming back. I just have to live with it.