Automation to purge recorder data

I saw the “Example automation to remove data rows for specific entities” documentation and started with this small example:

alias: Remove old CPU usage data
description: ""
triggers:
  - trigger: time
    at: "18:38:00"
conditions: []
actions:
  - action: recorder.purge_entities
    metadata: {}
    data:
      keep_days: 7
      entity_id:
        - sensor.processor_use_percent
mode: single

I let the automation fire at its trigger time and also started it manually a few minutes later. When I afterwards opened the history (German UI: “Verlauf”) for this entity, I still saw data older than 7 days:

What am I doing wrong?

“Mo” - Monday? (27.10)
“So” - Sunday? (02.11)
That is 7 days, not more. Also, the History page may show an “augmented” history based on statistical data, but here you specified a period of 7 days (the same as keep_days).

Yes, it is 7 days, but that data is older than 7 days and shouldn’t be stored in the database anymore.

Meanwhile the older data is gone, and only mean values (looks like one per hour) are shown in the history. So for others trying the same: be patient; it takes some minutes to clean up :wink:

Still, that isn’t so much data:
How would I remove/purge that older data, too?

I see, you mean “today is 08.11, and here I see old data”.
Here is my data for this year; I have a purge interval of 3 months.


The color for statistical data is slightly less vivid; can you select a span of 2 weeks and then compare the colors?

You mean statistics? You cannot remove them with native HA tools, but you can remove them with the corresponding SQL queries. Perhaps someone with SQL expertise can chime in…
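Not from the thread, but to sketch what such an SQL cleanup could look like: a minimal Python/sqlite3 helper that deletes the long-term statistics rows for one entity. The table and column names (`statistics_meta`, `statistics`, `statistics_short_term`, `metadata_id`, `statistic_id`) are assumptions based on recent recorder schemas; verify them against your own database, stop HA, and make a backup before running anything like this.

```python
import sqlite3


def purge_statistics(db_path: str, entity_id: str) -> int:
    """Delete long-term statistics rows for one entity; return rows removed.

    Assumes the recorder schema with statistics_meta / statistics /
    statistics_short_term tables (check your own database first).
    """
    con = sqlite3.connect(db_path)
    try:
        cur = con.cursor()
        # Look up the statistics metadata row for this entity.
        cur.execute(
            "SELECT id FROM statistics_meta WHERE statistic_id = ?", (entity_id,)
        )
        row = cur.fetchone()
        if row is None:
            return 0
        (meta_id,) = row
        removed = 0
        # Both the hourly and the 5-minute statistics tables reference it.
        for table in ("statistics", "statistics_short_term"):
            cur.execute(f"DELETE FROM {table} WHERE metadata_id = ?", (meta_id,))
            removed += cur.rowcount
        con.commit()
        return removed
    finally:
        con.close()
```

For example, `purge_statistics("home-assistant_v2.db", "sensor.processor_use_percent")` would remove that entity’s statistics (file name and entity are just examples from this thread); a `VACUUM` afterwards makes SQLite actually shrink the file.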

Maybe :wink:
If the older data displayed in the history view (see screenshot above) is taken from statistics, yes.


Another question in this area:
How can I see which sensors are producing the largest amount of data?
I’ve found the dbstats add-on, but when running HA in Docker on Linux on a Raspberry Pi, it does not work because there is no image for this environment. The only alternative is a native installation on the Pi itself, which isn’t a good choice when running all applications in Docker.

I used DB Browser for SQLite and opened the *.db file of the HA database (accessible via SMB; I am also running HA in Docker). Then I used SQL queries found here. I am not a DB expert, so it was trial and error. Also, the mentioned thread has several different queries posted; maybe you need to use the latest one, no idea.

If you don’t want to retain long-term statistics for certain entities, you can set them to state_class: none in customize.yaml. Long-term statistics are only kept for a few select state_classes.

Thanks for the hint, but I think I will stick with this approach and concentrate on the biggest sensors.

I followed this hint and used DBeaver with this query:

SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
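If you’d rather script this than use DB Browser or DBeaver, the same counting query can be run with Python’s built-in sqlite3 module. This is just a sketch; `home-assistant_v2.db` is the usual default database file name (an assumption, adjust to your setup), and you should work on a copy rather than on the live database.

```python
import sqlite3

# Per-entity row counts in `states`, largest first. The schema names
# (states, states_meta, metadata_id) follow recent recorder versions.
QUERY = """
SELECT COUNT(*) AS cnt,
       COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
       states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
LIMIT 15
"""


def top_state_producers(db_path: str) -> list[tuple[int, int, str]]:
    """Return (count, percent, entity_id) rows, biggest producers first."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

Usage would be along the lines of `for cnt, pct, entity_id in top_state_producers("copy-of-home-assistant_v2.db"): print(cnt, pct, entity_id)`.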

Then I purged the first 15 sensors - mostly Docker or system CPU/memory usage - which reduced the number of states by nearly 50% (from more than 6 million to about 3.2 million).

Meanwhile, state_attributes is similar in size to states.

Does someone have a query to show biggest state_attributes?

Here’s one:

SELECT
  COUNT(state_id) AS cnt,
  SUM(LENGTH(state_attributes.shared_attrs)) AS bytes,
  states_meta.entity_id,
  states.attributes_id,
  state_attributes.shared_attrs
FROM states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY states.metadata_id
ORDER BY bytes DESC
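Once that query points you at a suspicious entity, it can help to see which attribute inside the JSON payload is the heavy one. This is a hypothetical helper (not from the thread) that sizes each top-level key of a single shared_attrs value:

```python
import json


def attribute_sizes(shared_attrs: str) -> list[tuple[str, int]]:
    """Size each top-level attribute of a shared_attrs JSON payload.

    Returns (key, serialized length) pairs, largest first.
    """
    attrs = json.loads(shared_attrs)
    sizes = [(key, len(json.dumps(value))) for key, value in attrs.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Paste one shared_attrs value from the query above into it; a time-series attribute such as a forecast list will typically dominate the output.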

Wow, thanks!
I ran it and, as assumed, weather data is the biggest. In my case it is a sensor for the UV index, sensor.koeln_botan_gart_sonneneinstrahlung, which seems to have a time series as an attribute. The 4 biggest sensors use more than half of the state_attributes size.

I think there was an effort recently to take dynamic data out of the “attributes,” at least for some entities. Not sure of the details, but you’ve identified the problem. Attributes (IMHO) should be static data. Things like friendly name, unit of measure, etc. Dynamic values don’t belong there.

The thing to do is ask yourself: Do you really need to keep long-term statistics on the offending entities? If not, change the state_class to “none” and be done with it.

The HA recorder isn’t all that customizable. You can’t set keep_days per entity, and you can’t decide which entities to exclude from long-term statistics without changing their state class. I’ve found sometimes I’m better off just sending the data I want to track to a text file using the files integration and storing/analyzing it outside of HA.

For sensors containing many dynamically changed attributes, I usually do the following:

  1. Create template sensors for those attributes (and maybe the state) which I want to have in the history.
  2. Exclude the “master” sensor from the Recorder.