How to fix statistics data (e.g. energy data)

Funnily during several “events” (accidents / incidents) I had to correct data in the database many times meanwhile - and I’ve ended up using almost the same steps.

But this topic is a good description of the necessary tasks.

I just want to add:

  1. Meanwhile DB schema has changed. Sometimes it might be necessary to take a look at the “new” table state_attributes too (usually not for fixing statistics data but likely for other purposes).
  2. To fix things 100 % I always had to edit (in that order):
  • states table
  • statistics_short_term table
  • statistics table
  1. Take caution when using date filters: the timestamps in the database are UTC, this might/will vary to your actual time (+/- X hours depending on your timezone).
  2. I learned to usually never delete rows from any table, as this often leads to FK constraint violation which immediately renders the database inconsistent ending in an automatically created new one with zero content. So while DELETE is very dangeorous to use, UPDATE statements with still knowing exactly what you’re doing seem to be more safe.
  3. For deleting (a lot of) orphaned statistics here’s a great how-to guide:

https://community.home-assistant.io/t/statistics-fix-issue-large-amount-of-data/419728

3 Likes

Hi,

Also have run into this problem with suddenly incorrect data in database, now its solved after hunting wrong entry in Developer Tools → STATISTIC. But why this happens and how can we avoid it?

BR
Patric

1 Like

Why are not all entities available in Developer Tools → STATISTIC? For example, I have an entity with outdoor temperature, there are some errors and I can’t edit it :frowning:

Does anybody know how to export hourly kWh usage? I extracted the sqlite tables but can’t get it to add up the way as the energy dashboard.

For example
Date,Usage
2022-12-01 00:00,0.29
2022-12-01 01:00,0.66

Due to a wrong cost of the energy price that I had I have manually deleted the cost data from statistics and statistics_short_term tables using the right metadata_id. Now the it started to populöate the tables with new data and correct cost. But I wonder if it is possible to re-generate the data for the past X number of days? How?

Anyone know why there are no created date in the data ?

select id, created, state, sum from statistics
where
metadata_id == 172

This topic is about fixing statistics data, not about discussing database structure observations. I’m very sure you’ll find another topic where this question is more suitable or even has been answered.

Well I have to fix my database and I am not able to use these fixes because of that empty data.

Hi, can you help me solve this?

Screenshot 2023-09-26 at 10-37-32 SQLite Web – Home Assistant

This post is really helpful to understand the way the statistics work. I see a new button in the correction dialog

I am using the german UI. in english, this could mean something like “outlier values”. This button seems to select values that are somehow unusual.

Does anybody know, if the function is explained in detail somewhere?

1 Like

I created some script to recalculate the sum values:

I don’t see a readme explaining on how to use it. Also no limitations documented - in the first lines I see you import MySQLdb, so is this script only working for MySQL databases? Etc. etc. Maybe you wand to add some more notes so this script can be more helpful to more people, understanding better on how to use it and for which setups.

I have done much of this to remove faulty data, but the data persists in any graph. How can I reset the data the graph has cached somewhere?

Data changed in statistics and statistics_short_term tables, but graph still shows incorrect data set.

Hello,
just as an info for others seeking solution for broken recorder statistics.

I did found dupondje homeassistant-fix-recordings script and improve it to fix recoder sum data (in the table statistics and statistic_short_term). It now supports SQLite as a default HomeAssistant database. I was able to use it to fix my broken statistics after modbus sensors went haywire.

1 Like

Thanks klacol,

even if I was not able to find other documentation for this functionality. This helped me fixing my energy statistic without messing around with the database myself!

Can you elaborate a bit more how it works and what kind of false data it fixes?

  1. I would propose you to read readme on the github repo.
  2. If there is something more specific you do not understand, ask specific question.
  3. There is a lot of topic on corrupted statistic data, read them as well.
  4. If you do not understand the topic at all do not use the script.

Great attitude, thanks :+1:

I have clean history for my power sensors with class measurement, but not for my energy sensors with class total_increasing.

Please correct me if I am wrong, but exporting history from HA give us one random state change per hour for long term statistics (LTS), not the mean of the power sensors per hour. If you have a need to rebuild energy consumption sensor history (kWh) from a power sensor’s (W) LTS, this won’t work. You need the hourly mean (W) from the power sensor so that you can then rebuild the (kWh) as a total_increasing sensor.

In order to get that min/max/mean data for the state class “measurement”, I believe it will need to be extracted with SQL directly from the DB.

Since my energy history is all over the place, I want to extract all the LTS for the power sensors, and rebuild the energy total_increasing statistics.

All of my power sensors that I would like to rebuild the corresponding energy sensor statistics for start with emporiavue2_ and end with _power, and their corresponding energy sensors have the same names, but end with _daily_energy instead of _power.

Any SQL experts in this thread that could suggest some SQL to extract the min/max/mean values for the _power sensors? I can then use pandas to convert that data into a csv/tsv file for use with GitHub - klausj1/homeassistant-statistics: This HA integration allows to import long term statistics from a file like csv or tsv which will rebuild the LTS for the energy sensors.

I have recently posted my own how-to guide to manipulating long term statistics in HA. Scenario 7 should contain the sort of SQL you are looking for as well as extensive guides as to how I have used the HA statistics integration to load LTS into HA from different sources. I used Google Sheets to reformat CSV data into the required formats