Very simple , I used “sum/state” for a table “min/mean/max” because the “max” column was missing in the restored data. Some process should then carry out a plausibility check and delete the data again. I’m now also noticing problems on individual days, as they say, the “sum/state” is complicated. There are values where the “sum” value doesn’t fit and I suddenly have -2000 kWh consumption in one hour. I haven’t figured that out yet. Somehow the difference doesn’t match any values:
z.B.:
It’s clear how the value comes about, but I still have to research how I fix it.
Addendum:
I don’t know if I understand this correctly, but I’ll write it down. There are 2 variants, values that are getting higher and higher, such as PV production since construction. “Sum” is always the difference from the first “state”. But then there are also values that regularly go to 0, PV daily production for example. Since “sum” is the sum of all increases. So it’s actually always the sum of all increases. What I don’t understand yet, when do I start with 0? If I import a value now, then there are already current values, if the “sum” of my imported values is already high, what happens if the current value is lower?
Has anyone come further than me?
At least I found that you can repair the values in the developer tools - statistics. I’ll take a look at what the results look like in the database starting this evening.
I’ll continue writing now, maybe it will help someone who also has partially corrupt data.
I’ve now played with correcting the values in the developer tools statistics:
If I correct a value like “2259,659 0.0” above, then the entire “sum” column is recalculated up to the current value and all values after that are OK. This is actually a quick and practical method.
I did not really understand much of this, but what I did for importing counters is, I used a negative sum value. This worked perfectly.
So, lets say the already existing HA data for PV production start at 01.01.2024 00:00, with state = 1000 and sum = 0 (from what I understood, state is the value of the sensor, and sum will always start with 0 at the beginning).
When I have a daily PV production of 10, the values at 31.12.2023 are state = 990 and sum = -10, and so on. The difference between two time stamps should always be identical for state and sum. This is totally OK in your first table.
I did not understand how your error happens. What did you import, and what was the result? If you import your second table at Feb, 11th, then it is clear that you get a -2167.99. between 19:00 and 20:00. Or was this a wrong export of your “old” database you want to correct? If this is true, you should also be able to recalculate the sum column in e.g. excel and import it afterwards. But good to know that repair values works in this way.
Not sure if that helped
Maybe I find some time to explain the counter import with more details in the readme.
I’m wondering if it’s mandatory to include all hours. I have exports from my energy supplier from years back that I would like to import. However, I only have the values per day. Can I just leave the other values empty? Or do I need to specify all hours for every day and only do the growth at midnight for example?
You should enter the values every hour, otherwise you will always have lines and no connection between them. At least that’s how it is for me when data is missing in between.
For me the reason is that I had a defective SQLite twice and some of the data was corrupted, hence these strange “sum” values. At the time of import, I did not yet understand the logic of the values, and I have now simply recorded the understanding process here. In my case, correcting the values is probably better after the import, since I already had last week’s data in the database and had to adjust the values anyway, otherwise there would be large negative jumps when changing from old to new.
Sorry if it looked like it was a question, I just wanted to contribute my findings on the topic, unfortunately I can’t program.
That would not be a huge issue for me. I don’t have the data per hour, so there is also no need to be able to see it per hour. Seeing it per day is enough.
What would you suggest? just increase once per day and keep all hours equal?
It’s a matter of taste whether you want to see a staircase or a ramp. But if the original data is daily, an increase at midnight would be the most honest representation
I also had only daily data, and added hourly values, with only one change per day (the rest of the hourly values are identical). But you can also start with the other approach, if its fine for you, its fine. If not, do another import with the changed approach.
Unfortunately I have to correct myself, my statement that correcting an early “sum” value corrects all values afterwards is not true. It worked once, but I think there was just one mistake. Today I had sensors with several errors, so each one needs to be corrected. It is better to correct the data before or with a routine during the import.
So my situation is a bit similar as described above. I have 3 years of energy/gas data in HA and I have the remaining values in an excel sheet. So obviously the sum starts at 0 when the sensor was initialised in HA 3 years ago. I have created a sheet where I manually calculated a coupe of weeks before the 3 years of data. So the SUM gets negative. Similar to the example TSV provided. But the problem is that the oldest value than has a SUM that negative. And therefor the first value in the energy dashboard is also negative.
One solution I was thinking about is to extract all data from the specific sensor from the statistics table and remove all entries from the database. Then merge the HA data and the offline data in a new table and recalculate all the SUM values. Then use your script to re-import everything again. I guess it should work, but I’m not sure what this does with the costs that are also maintained somewhere. But I guess it should work as the costs seem to be in another sensor that is not connected directly to the statistics.
Maybe a workaround: Just import e.g. one more year without changing the values, so that you have zero every hour. Then the negative value is so much in the past that it does not hurt anymore, because you never look at it?
I’m currently working on recalculating the sum column using the SQL procedure, in pgAdmin4. I also have a script that does this for PostgreSQL, but I haven’t tested it in the productive environment yet. If I get around to it at the weekend and it works, I’ll be happy to share it here. But it is made specifically for pgAdmin and will not work with SQLite without adjustments.
Procedure for correcting the “sum” column in the “statistics” table
The procedure is ready and tested. It corrects the entire “sum” column of the “statistics” table for the specified entity. It could also be used for the “statistics_short_term” table, but you would need to replace all instances of “statistics” with “statistics_short_term”. However, I simply deleted this table because it will be rebuilt anyway.
Here is the procedure:
CREATE OR REPLACE FUNCTION fix_statistics_sum(entity_id VARCHAR(255))
RETURNS VOID
AS $$
DECLARE
curr_id INT;
prev_state FLOAT := 0;
curr_state FLOAT;
curr_sum FLOAT;
sum_diff FLOAT;
changed_rows INT := 0;
first_value FLOAT;
last_value FLOAT;
curr_row RECORD;
BEGIN
-- Hole die erste Summe
SELECT sum INTO curr_sum
FROM statistics
WHERE metadata_id = (SELECT id FROM statistics_meta WHERE statistic_id = entity_id)
ORDER BY start_ts
LIMIT 1;
-- Öffne einen Cursor, um die Daten Zeile für Zeile zu durchlaufen
FOR curr_row IN
SELECT id, state
FROM statistics
WHERE metadata_id = (SELECT id FROM statistics_meta WHERE statistic_id = entity_id)
ORDER BY start_ts
LOOP
curr_id := curr_row.id;
curr_state := curr_row.state;
-- Berechne die Differenz zum vorherigen Zustand (state)
sum_diff := curr_state - prev_state;
-- Aktualisiere die Summe nur, wenn der Zustand (state) größer oder gleich dem vorherigen Wert ist
IF sum_diff >= 0 THEN
-- Aktualisiere die Summe um die Differenz
curr_sum := curr_sum + sum_diff;
changed_rows := changed_rows + 1;
END IF;
-- Setze die Summe für die aktuelle Zeile
UPDATE statistics
SET sum = curr_sum
WHERE id = curr_id;
-- Setze den vorherigen Zustand für die nächste Iteration
prev_state := curr_state;
-- Speichere den ersten und letzten Wert
IF changed_rows = 1 THEN
first_value := curr_sum;
END IF;
last_value := curr_sum;
END LOOP;
-- Gib die Anzahl der geänderten Zeilen und die ersten und letzten Werte aus
RAISE NOTICE 'Anzahl der geänderten Zeilen: %', changed_rows;
RAISE NOTICE 'Erster Wert: %, Letzter Wert: %', first_value, last_value;
END;
$$ LANGUAGE plpgsql;
The procedure is written for Postgresql and tested with pgAdmin4. It will not work with Sqlite without modifications.
What to do:
Open the database in pgAdmin.
Open the “statistics” table with View Edit Data.
Copy the procedure into the “Query” window and execute it with “Execute Script”.
The procedure can now be found under “Functions”.
Delete the contents of Query and insert the start command(s), of course with the desired sensor.
At the latest now, exit Home Assistant, otherwise there will be a statistics error at the time of insertion.
Execute the procedure with “Execute Script”, multiple lines can also be executed at once.
Under “Messages” it is written out how many records were processed and the first and last “sum” value.
Optionally, delete the “statistics_short_term” table.
Restart HA.
Notes:
The procedure must be executed for each entity separately.
The start command must be adapted for each sensor.
The procedure can be used to correct errors in the “sum” column.
The procedure can also be used to update the “sum” column after adding new data.
After one hour, I had huge jumps in the consumption data, at the time when the imported data met the new data. I was not able to adjust the procedure so that it calculates the sum column back from the current value. However, I was able to easily fix this using the statistics correction in the developer tools.
Klaus thanks so much for this great integration, I was able to import all my old power usage. I ran into a roadblock for which I wanted to share my solution to hopefully save someone else a bit of head scratching. I was getting an “Invalid timestamp” error during importing. I checked my timestamp formatting, tried different formats, tried changing timezone, etc, all of which were fine. Ultimately I discovered that Home Assistant statistics will only accept timestamps at the top of the hour (eg, 01:00, 02:00, 03:00, etc), but my power company provides data every 15 minutes (eg 01:00, 01:15, 01:30, etc). Home Assistant core recorder statistics will throw an error if the minutes is set to anything but 0.
So all I had to do is modify my data to add up the total for each hour, then it imported perfectly!
Hello Klaus,
if my energy data, which i want to import, already contains the timestamp format as date, how can i use it, without converting it to date format ?
Hi @TTr0n
sorry, I do not understand your question. The timestamp must be available, it must be an exact hour (no minutes, no seconds), and it must be formatted in a ways that python/pandas understands it. When you use the timestamp as in the example files, you are on a safe path
Thx for the answer.
For using this, i need hour based data, but what to do, if i have some hours without any data? Do I really have to add every single hour from 0-23?
What about the missing hours? Do I have to fill them in manually?
And second question:
Your integration puts the data into statistics table and there i need a sum value, because its a counter value with state/sum.
Whats the best practice in m case? Delete the last 4 days in statistics table and generate this data new with the data from statistics_short_term combined with the old data from 2023 till 13.03. ? Or does it matter and my old data can also start by zero ?