Remove statistics of disabled (not deleted!) entities from recorder

Would be great to have this. Scenario:

  • I disabled a lot of entities which produced many thousand rows in the database (including statistics and statistics_short_term tables) which I want to remove now.
  • Note that I disabled them, they are not deleted. For reasons (e. g. to know the integration offers/offered this entity or maybe I decide to just re-enable the entity later).
  • Additionally I ran the recorder service “purge_entities” for the affected entities, but that still does not allow to delete the statistics using “fix it” button in dev-tools statistics section.
  • So all the corresponding statistics rows stay in the database, which are roughly 4.743 in statistics and 4.166 in statistics_short_term - per entity.

Overview of current situation (scenarios and options as known to me after digging into this for six hours) when it comes to database hygiene and size reduction, where this feature request aims at scenario #4:

  1. :white_check_mark: Delete the statistics of single deleted entities:
    simply use the “fix” button in the dev-tools statistics section

  2. :x: Delete the statistics of multiple deleted entities:
    not easily possible currently, see this feature request Implement a "Fix All" button in "Developer Tools > Statistics" to remove orphaned IDs · Discussion #12644 · home-assistant/frontend · GitHub plus this feature request Removing statistics of deleted devices in one click
    (it is possible by doing all of this in the backend which is quite risky: Statistics - Fix Issue - large amount of data - #3 by vinzcenzo)

  3. :white_check_mark: Delete the state and events of single entities:
    use purge_entities service of Recorder - Home Assistant

  4. :x: Delete the statistics of single or multiple disabled (not deleted!) entities:
    not easily possible currently, see proposed solution for this below
    (it is possible by doing all of this in the backend which is quite risky: Statistics - Fix Issue - large amount of data - #3 by vinzcenzo)


Proposed solution (depending on how easily this should be accessable to normal users):

  1. Either add option in dev-tools statistics section similar to the existing one but enable it for ALL statistic entities, not only the deleted ones.

  2. Or add a recorder service like the existing one Recorder - Home Assistant but for statistics tables instead of states and events, e. g. service recorder.purge_statistics.

I would be very happy with the service option in 2 already, even I would need to run that service for every single entity.

Addendum:

Running this (from @vinzcenzo / Statistics - Fix Issue - large amount of data) in your database gives you an overview of affected entities.

    SELECT
    	CASE WHEN entity_id IS NULL
    		THEN 'Orphaned'
    		ELSE 'OK'
    	END Issue,
    	entity_id,
    	statistic_id
    FROM	(
    		SELECT
    			b.entity_id,
    			a.statistic_id				
    		FROM
    			statistics_meta a
    		LEFT JOIN 'states' b
    			ON a.statistic_id = b.entity_id
    		GROUP BY
    			a.statistic_id -- we want to avoid duplicates
    		)
    ORDER BY Issue DESC;

and this gives you the number of affected entities:

SELECT COUNT (*) AS "Orphaned statistic entities"
FROM	(
		SELECT 
			b.entity_id, 
			a.statistic_id				
		FROM 
			statistics_meta a
		LEFT JOIN 'states' b 
			ON a.statistic_id = b.entity_id
		GROUP BY 
			a.statistic_id -- we want to avoid duplicates
		)
WHERE "entity_id" IS NULL;

grafik

Also kind of related to Disable long term statistics?! - #13 by erik3

TL;DR someone thinks this is not a problem, as there is still space on the disk.

:joy:

I read that topic and get what you mean. Funny to read as the team put quite some effort into reducing database sizes this year e. g. by creating states attributes tables etc… See release notes of I think beginning with 2022.3.

This is some pretty low hanging fruit as far as I can see - something that with a small amount of work reaps great rewards. As a developer this is the kind of thing I implement to throw the client a bone when they are waiting on larger more complex features

2 Likes

…nevertheless unfortunately nobody seems to be willing or able to do that small amout of work.

Meanwhile databases fill up with trash data. Too bad.

2 Likes

+1 here.

It bugs me a lot to have all of those statistics errors in there and pressing the fix button doing nothing actually about fixing them.

Upvote, tell others/spread the word and let them upvote this FR too.

Just one word of caution though: for most things throwing away LTS may not be a problem, and even be desirable. But this may also for instance be energy dashboard data if you ever changed which entities provide the data. So blindly removing all LTS from disabled/removed entities may not always be what you want. You could lose history there too.

Good point. Thoughts:

  1. People should know what they do. That‘s why a service (option 2 in the OP) would be a suitable solution, wouldn’t it. Pointing this to single entities gives a good amount of control in my opinion.
  1. I think it’s much safer than people starting to clean things up in the backend directly, shooting with SQL DELETE statements at their databases.

Still is still a big issue, right?
Unfortunately the SQL query (posted in 2022) does not work anymore (db changes?) but I’m pretty sure I have many pollution in this table (which is 1.9GB in size)

I disabled many entities, like signal strength, linkquality, voltage, of many battery powered zigbee devices (50+) x 3. Also secondary temperature measurements of motion sensors, door sensors, etc that didn;t make sense. I disabled 100’s of entities. But they are all still there in the LTS database…

Some other value I stopped recording (in stead of disable), that will trigger a fix to remove old values.
Disabling an entity does NOT trigger a fix.

WHY?

You can use this query:

SELECT s.metadata_id, sm2.statistic_id, COUNT(*) AS `#`, strftime('%d-%m-%Y %H:%M:%f', MIN(created_ts), 'unixepoch') AS first_ts, 
strftime('%d-%m-%Y %H:%M:%f', MAX(created_ts), 'unixepoch') AS last_ts
FROM "statistics" s
LEFT JOIN statistics_meta sm2 ON sm2.id = s.metadata_id
WHERE metadata_id IN (
SELECT id FROM statistics_meta sm1
WHERE sm1.statistic_id NOT IN (SELECT entity_id FROM states_meta)
)
GROUP BY s.metadata_id
ORDER BY MAX(created_ts) ASC

This lists all entities that have statistics, but have no entity history. This can be because they are not used anymore, or they have been excluded from recording.

I believe this FR is implemented now, you’ve been able to bulk delete any statistics of any entity in developer-tools for a few months now.

So doing SQL queries on your DB should not be necessary.

How exactly? In Dev-Tools section you might be able to mass delete statistics. But only the ones from entities actually shown. Are disabled / orphaned ones also visible? I doubt it. In case those are actually made visible from the backend/database in the UI meanwhile, I might be wrong.

Are disabled / orphaned ones also visible? I doubt it

I expect that they are, but report back if you find otherwise.