I disabled a lot of entities which produced many thousand rows in the database (including statistics and statistics_short_term tables) which I want to remove now.
Note that I disabled them, they are not deleted. For reasons (e. g. to know the integration offers/offered this entity or maybe I decide to just re-enable the entity later).
Additionally I ran the recorder service “purge_entities” for the affected entities, but that still does not allow to delete the statistics using “fix it” button in dev-tools statistics section.
So all the corresponding statistics rows stay in the database, which are roughly 4.743 in statistics and 4.166 in statistics_short_term - per entity.
Overview of current situation (scenarios and options as known to me after digging into this for six hours) when it comes to database hygiene and size reduction, where this feature request aims at scenario #4:
Delete the statistics of single deleted entities:
simply use the “fix” button in the dev-tools statistics section
Delete the state and events of single entities:
use purge_entities service of Recorder - Home Assistant
Delete the statistics of single or multiple disabled (not deleted!) entities:
not easily possible currently, see proposed solution for this below
(it is possible by doing all of this in the backend which is quite risky: Statistics - Fix Issue - large amount of data - #3 by vinzcenzo)
Proposed solution (depending on how easily this should be accessable to normal users):
Either add option in dev-tools statistics section similar to the existing one but enable it for ALL statistic entities, not only the deleted ones.
Or add a recorder service like the existing one Recorder - Home Assistant but for statistics tables instead of states and events, e. g. service recorder.purge_statistics.
I would be very happy with the service option in 2 already, even I would need to run that service for every single entity.
SELECT
CASE WHEN entity_id IS NULL
THEN 'Orphaned'
ELSE 'OK'
END Issue,
entity_id,
statistic_id
FROM (
SELECT
b.entity_id,
a.statistic_id
FROM
statistics_meta a
LEFT JOIN 'states' b
ON a.statistic_id = b.entity_id
GROUP BY
a.statistic_id -- we want to avoid duplicates
)
ORDER BY Issue DESC;
and this gives you the number of affected entities:
SELECT COUNT (*) AS "Orphaned statistic entities"
FROM (
SELECT
b.entity_id,
a.statistic_id
FROM
statistics_meta a
LEFT JOIN 'states' b
ON a.statistic_id = b.entity_id
GROUP BY
a.statistic_id -- we want to avoid duplicates
)
WHERE "entity_id" IS NULL;
I read that topic and get what you mean. Funny to read as the team put quite some effort into reducing database sizes this year e. g. by creating states attributes tables etc… See release notes of I think beginning with 2022.3.
This is some pretty low hanging fruit as far as I can see - something that with a small amount of work reaps great rewards. As a developer this is the kind of thing I implement to throw the client a bone when they are waiting on larger more complex features
Just one word of caution though: for most things throwing away LTS may not be a problem, and even be desirable. But this may also for instance be energy dashboard data if you ever changed which entities provide the data. So blindly removing all LTS from disabled/removed entities may not always be what you want. You could lose history there too.
People should know what they do. That‘s why a service (option 2 in the OP) would be a suitable solution, wouldn’t it. Pointing this to single entities gives a good amount of control in my opinion.
I think it’s much safer than people starting to clean things up in the backend directly, shooting with SQL DELETE statements at their databases.
Still is still a big issue, right?
Unfortunately the SQL query (posted in 2022) does not work anymore (db changes?) but I’m pretty sure I have many pollution in this table (which is 1.9GB in size)
I disabled many entities, like signal strength, linkquality, voltage, of many battery powered zigbee devices (50+) x 3. Also secondary temperature measurements of motion sensors, door sensors, etc that didn;t make sense. I disabled 100’s of entities. But they are all still there in the LTS database…
Some other value I stopped recording (in stead of disable), that will trigger a fix to remove old values.
Disabling an entity does NOT trigger a fix.
SELECT s.metadata_id, sm2.statistic_id, COUNT(*) AS `#`, strftime('%d-%m-%Y %H:%M:%f', MIN(created_ts), 'unixepoch') AS first_ts,
strftime('%d-%m-%Y %H:%M:%f', MAX(created_ts), 'unixepoch') AS last_ts
FROM "statistics" s
LEFT JOIN statistics_meta sm2 ON sm2.id = s.metadata_id
WHERE metadata_id IN (
SELECT id FROM statistics_meta sm1
WHERE sm1.statistic_id NOT IN (SELECT entity_id FROM states_meta)
)
GROUP BY s.metadata_id
ORDER BY MAX(created_ts) ASC
This lists all entities that have statistics, but have no entity history. This can be because they are not used anymore, or they have been excluded from recording.
How exactly? In Dev-Tools section you might be able to mass delete statistics. But only the ones from entities actually shown. Are disabled / orphaned ones also visible? I doubt it. In case those are actually made visible from the backend/database in the UI meanwhile, I might be wrong.