Funnily enough, I have had to correct data in the database many times since then, during several “events” (accidents / incidents), and I’ve ended up using almost the same steps each time.
But this topic is a good description of the necessary tasks.
I just want to add:
In the meantime the DB schema has changed. Sometimes it might be necessary to look at the “new” table state_attributes too (usually not for fixing statistics data, but likely for other purposes).
To fix things 100 % I always had to edit (in that order):
states table
statistics_short_term table
statistics table
Take care when using date filters: the timestamps in the database are in UTC, so they will differ from your local time by some number of hours (+/- X, depending on your timezone).
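To illustrate the UTC point, here is a minimal Python sketch that converts a local wall-clock time into a UTC epoch value you could use in a date filter. The function name `local_to_utc_epoch` is made up for this example, and it assumes the column you filter on stores epoch seconds (check your own schema). The fixed +01:00 offset is only a placeholder; with real data you would more likely pass something like `zoneinfo.ZoneInfo("Europe/Berlin")` so that daylight saving time is handled.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_epoch(local_str, tz):
    """Interpret 'YYYY-MM-DD HH:MM' as local time in tz, return UTC epoch seconds."""
    local = datetime.strptime(local_str, "%Y-%m-%d %H:%M").replace(tzinfo=tz)
    return local.astimezone(timezone.utc).timestamp()


# Placeholder timezone: a fixed UTC+1 offset (assumption for this example only).
my_tz = timezone(timedelta(hours=1))

# 12:00 local time is 11:00 UTC, which is what the database would hold.
epoch = local_to_utc_epoch("2024-01-15 12:00", my_tz)
print(epoch, datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat())
```

Filtering on the unconverted local time would select rows that are X hours off, which is easy to miss when the wrong value sits close to an hour boundary.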
I learned never to delete rows from any table, as this often leads to FK constraint violations, which immediately render the database inconsistent and end with an automatically created new database with zero content. So while DELETE is very dangerous to use, UPDATE statements seem safer, provided you still know exactly what you’re doing.
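As a rough sketch of the “UPDATE, but carefully” idea: the snippet below wraps a single UPDATE in a transaction and rolls back unless exactly one row was touched. It runs against a throwaway in-memory SQLite table, not a real Home Assistant database. The simplified table layout (`metadata_id`, `start_ts`, `state`, `sum`) and the helper name `update_sum_safely` are assumptions for illustration; verify the column names in your own schema, and work on a backup copy.

```python
import sqlite3


def update_sum_safely(conn, metadata_id, start_ts, new_sum):
    """Update one row's sum; roll back if the filter did not match exactly one row."""
    cur = conn.cursor()
    cur.execute(
        "UPDATE statistics SET sum = ? WHERE metadata_id = ? AND start_ts = ?",
        (new_sum, metadata_id, start_ts),
    )
    if cur.rowcount != 1:
        conn.rollback()  # nothing, or too much, matched: leave the data untouched
        return False
    conn.commit()
    return True


# Throwaway demo database with an assumed, simplified table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statistics ("
    "id INTEGER PRIMARY KEY, metadata_id INTEGER, start_ts REAL, state REAL, sum REAL)"
)
conn.executemany(
    "INSERT INTO statistics (metadata_id, start_ts, state, sum) VALUES (?, ?, ?, ?)",
    [(1, 0.0, 10.0, 10.0), (1, 3600.0, 9999.0, 9999.0)],
)
conn.commit()

fixed = update_sum_safely(conn, 1, 3600.0, 11.0)    # matches one row
missed = update_sum_safely(conn, 1, 7200.0, 12.0)   # matches nothing, rolled back
print(fixed, missed)
```

The row-count check is the point here: a filter that matches zero rows or many rows is usually a sign that the timestamp or metadata_id is wrong.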
For deleting (a lot of) orphaned statistics here’s a great how-to guide:
I have also run into this problem of suddenly incorrect data in the database. It is now solved after hunting down the wrong entry in Developer Tools → Statistics. But why does this happen, and how can we avoid it?
Why are not all entities available in Developer Tools → Statistics? For example, I have an outdoor temperature entity with some erroneous values, and I can’t edit it.
Because the energy price I had configured was wrong, I manually deleted the cost data from the statistics and statistics_short_term tables using the right metadata_id. Now the tables have started to populate with new data and the correct cost. But I wonder whether it is possible to regenerate the data for the past X days. How?
This topic is about fixing statistics data, not about discussing database structure observations. I’m very sure you’ll find another topic where this question is more suitable or even has been answered.
I don’t see a readme explaining how to use it. There are also no limitations documented: in the first lines I see you import MySQLdb, so does this script only work for MySQL databases? And so on. Maybe you want to add some more notes so this script can be helpful to more people, who would then better understand how to use it and for which setups.
Hello,
just as info for others seeking a solution for broken recorder statistics.
I found dupondje’s homeassistant-fix-recordings script and improved it to fix recorder sum data (in the tables statistics and statistics_short_term). It now supports SQLite, the default Home Assistant database. I was able to use it to fix my broken statistics after my modbus sensors went haywire.
even if I was not able to find other documentation for this functionality. This helped me fix my energy statistics without messing around with the database myself!
I have clean history for my power sensors with class measurement, but not for my energy sensors with class total_increasing.
Please correct me if I am wrong, but exporting history from HA gives us one random state change per hour for long term statistics (LTS), not the hourly mean of the power sensors. If you need to rebuild an energy consumption sensor’s history (kWh) from a power sensor’s (W) LTS, this won’t work. You need the hourly mean (W) from the power sensor so that you can then rebuild the energy (kWh) as a total_increasing sensor.
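To make the arithmetic concrete: a mean power of P watts over one hour corresponds to P / 1000 kWh, so a running sum over the hourly means gives an ever-increasing energy total. The sketch below is only that arithmetic; the function name `rebuild_energy_kwh` and the sample values are made up, and it assumes every entry covers exactly one full hour with no gaps.

```python
def rebuild_energy_kwh(hourly_mean_watts, start_kwh=0.0):
    """Turn hourly mean power (W) into a cumulative energy series (kWh)."""
    total = start_kwh
    cumulative = []
    for watts in hourly_mean_watts:
        # Mean W sustained for 1 h is W watt-hours; divide by 1000 for kWh.
        total += watts / 1000.0
        cumulative.append(round(total, 6))
    return cumulative


# Made-up hourly means for three consecutive hours.
series = rebuild_energy_kwh([500.0, 1500.0, 0.0])
print(series)  # [0.5, 2.0, 2.0]
```

A single random state per hour cannot replace the mean here: one reading of 0 W taken during an otherwise busy hour would contribute no energy at all.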
In order to get that min/max/mean data for the state class “measurement”, I believe it will need to be extracted with SQL directly from the DB.
Since my energy history is all over the place, I want to extract all the LTS for the power sensors, and rebuild the energy total_increasing statistics.
All of my power sensors that I would like to rebuild the corresponding energy sensor statistics for start with emporiavue2_ and end with _power, and their corresponding energy sensors have the same names, but end with _daily_energy instead of _power.
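A possible starting point for the extraction, as a sketch only: the query below joins the statistics rows to their metadata and keeps only IDs matching the emporiavue2_…_power pattern. It runs against a tiny in-memory SQLite database with a simplified layout that I am assuming here (`statistics_meta.statistic_id`, `statistics.metadata_id`, `start_ts`, `mean`, `min`, `max`). The `sensor.` prefix, the sample rows, and the helper `energy_id_for` are likewise illustrative, so check them against your actual schema and entity IDs.

```python
import sqlite3

# '!' escapes the underscores, which LIKE would otherwise treat as wildcards.
QUERY = """
SELECT m.statistic_id, s.start_ts, s.mean, s.min, s.max
FROM statistics AS s
JOIN statistics_meta AS m ON m.id = s.metadata_id
WHERE m.statistic_id LIKE 'sensor.emporiavue2!_%!_power' ESCAPE '!'
ORDER BY m.statistic_id, s.start_ts
"""


def energy_id_for(power_id):
    """Map '..._power' to the matching '..._daily_energy' ID."""
    suffix = "_power"
    if not power_id.endswith(suffix):
        raise ValueError(f"not a power sensor ID: {power_id}")
    return power_id[: -len(suffix)] + "_daily_energy"


# Throwaway demo data with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statistics_meta (id INTEGER PRIMARY KEY, statistic_id TEXT);
CREATE TABLE statistics (
    id INTEGER PRIMARY KEY, metadata_id INTEGER, start_ts REAL,
    mean REAL, min REAL, max REAL);
INSERT INTO statistics_meta VALUES
    (1, 'sensor.emporiavue2_kitchen_power'),
    (2, 'sensor.outdoor_temperature');
INSERT INTO statistics (metadata_id, start_ts, mean, min, max) VALUES
    (1, 0, 500, 100, 900),
    (1, 3600, 1500, 1200, 1800),
    (2, 0, 21.5, 20, 23);
""")

rows = conn.execute(QUERY).fetchall()
for statistic_id, start_ts, mean, low, high in rows:
    print(energy_id_for(statistic_id), start_ts, mean, low, high)
```

The hourly means returned this way are the input you would need for rebuilding the energy series; the temperature sensor in the demo data is there only to show that the filter excludes unrelated entities.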
I have recently posted my own how-to guide to manipulating long term statistics in HA. Scenario 7 should contain the sort of SQL you are looking for, as well as extensive guides on how I have used the HA statistics integration to load LTS into HA from different sources. I used Google Sheets to reformat CSV data into the required formats.