To clarify, these are entities that were removed with the device. Old esphome devices mostly, which I test, experiment with. Their sensors submitted values that are still within the purge period, but the device is no longer present, and never will be. I keep 1 year data. When home assistant removes a device and its entities, it is only deleted from the local recorder, not from the long term database.
If anyone needs it, I ended up marking the unused entites in states_meta in a new column and run these:
SET FOREIGN_KEY_CHECKS=0;
DELETE FROM states WHERE metadata_id IN (SELECT metadata_id FROM states_meta WHERE orphaned = 1);
DELETE FROM state_attributes WHERE attributes_id NOT IN (SELECT DISTINCT attributes_id FROM states);
DELETE FROM states_meta WHERE orphaned = 1;
SET FOREIGN_KEY_CHECKS=1;
There you will find that for each of your deleted entities you will have a FIX ISSUE button / link. When you click it there will be a message about state data no longer being available and do you want to delete the statistics.
That’s exaclty the problem! Deleted entities that are only present in the long term database are not there. I had over 100,000 in the database, but only about 2000 listed there.
Yes, it took about an hour to run those sql commands. Esphome, ble device tracker, libre hardware monitor, and the ping integration recording every roundtrip as a different attribute (this was fixed yesterday) were the main culprits.
I mosly need the large purge time to see the yearly power consumption.
I’m on the other end of the spectrum. I keep my purge_days to a minimum and exclude any entities I don’t absolutely need, keeping my Recorder database as lean as possible. And I don’t use the Recorder database for LTS at all. Every single one of those tables is wasted space to me.
I’d argue that I need the purge option for LTS even more than someone who is already managing a large database. At one point the LTS data were taking up over 80% of my database. No, that’s not a typo. 80% wasted space. Voted.
Based on what I understood from another topic, it could be possible that LTS-data occupy more space than History:
Purge interval is a few days.
Only a few entities are included to Recorder.
There were lots of sensors with LTS stored in DB; then these sensors were excluded from Recorder - but these LTS-data are still there; and there is an “issue” in “Dev tools - Statistics» saying “these entities were excluded from Recorder” with no option to delete old data. (Update: this PR made possible to delete LTS for not-recorded entities)
So, if you do not need LTS - you have to disable “LTS-ability” by setting “state_class: none” for recorded sensors; and for not recorded sensors - see a PR above, a possibility to manually delete an old LTS was added.
I have not tested changes added by that PR, to be honest. It was said in this PR “This just replaces one issue with another (entity no longer recorded → entity not recorded), but at least it cleans up the DB.” - i.e. you can delete old LTS but you will still see a related issue.
But here the OP said that many DELETED entities still have LTS-data in DB, so I wonder how it can happen.
Don’t worry, I also run InfluxDB. And I run it on a Windows Server (not ESP).
I think I am mixing up the LTS with the mariadb database, by long term I mean the latter, not the averaged values beyond the purge interval. Anyhow, no matter how short it is, those orphaned values will not go away.
Maybe I don’t understand how the frontend works, but in that PR I can’t see anything that would delete records from a database. Is it maybe _deletedStatistics? statistics_* tables are not where the bulk of the data is.
I thought I was done doing DBA work when I retired. I’m getting to the point where I’m considering just deleting the database monthly. The whole idea of keeping what I would call “archival” data in the same database as real-time data is a problem right from the start. And the “one size fits all” retention period for all data is just poor design.
I already keep the most critical data (to me) elsewhere. I’m thinking about doing the same for the less critical stuff that I like to retain for different intervals.
I know a lot of work has gone into tweaking the database, and I appreciate that. But at some point it’s going to take more than tweaks to achieve a stable, long-term solution.
Update: I was able to prevent my entities from recording LTS data, and clearing these data from the database.
I posted more info on another thread, here.