Is it normal entity purging (and repacking) does not affect tables "statistics_short_term" (and related)? Also how to reduce size of "events" table?

ncd · June 29, 2024, 5:54pm

Answer from @tom_l

“Short term statistics (5 minute data) are down-sampled to hourly LTS (long term statistics) after your purge_keep_days interval setting.” [emphasis mine]

I know that the latest recommendation with the hist stats integrated into sensor history is to just use the default of 10 days and not bother but call me a sucker for punishment :).

I have a somewhat sophisticated set of per-entity deletion automation on different schedules (super high-frequency 3-days, 10days 30days 90 days and 180days). The default recorder is set to 365 days.

So while my states table is doing ok and is impacted by my purge schedule, I noticed the overall HA DB is still quite large and it seems to be due to two factors:

(1) statistics_short_term, ix_statistics_short_term_statistic_id_start_ts,`ix_statistics_short_term_start_ts1 do NOT seem to be impacted by purging at all. That seems super odd to me and a potential bug…?

(2) events table is also (more predictably) unaffected by purging.

So my two questions are: is (1) some kind of bug or expected? And is there any service that allows you to prune (2)?

Thanks.

After forcing all purges to happen and repacking.

tom_l · June 29, 2024, 11:24pm

Statistics are not purged.

Short term statistics (5 minute data) are down-sampled to hourly LTS (long term statistics) after your purge_keep_days interval setting.

So after down-sampling that’s 24 x 3 (max, min, avg) LTS data points per entity per day. Not a lot.

.

There is no need for your convoluted purge settings. Leave it at 10 days. The graph above is for this many entities (over 700 sensors):

alarm_control_panel: 1
automation: 303
binary_sensor: 233
button: 163
calendar: 3
camera: 13
climate: 3
conversation: 1
device_tracker: 2
event: 1
fan: 3
group: 1
image: 5
input_boolean: 21
input_datetime: 11
input_number: 34
input_select: 19
light: 58
lock: 11
media_player: 15
number: 100
person: 1
remote: 3
schedule: 7
script: 251
select: 89
sensor: 743  #### <----
stt: 1
sun: 1
switch: 84
tag: 7
todo: 4
tts: 1
update: 93
vacuum: 2
weather: 3
zone: 6

ncd · June 30, 2024, 1:54pm

Thanks Tom. ~~So why doesn’t the short term statistics data change after a purge? As you can see from my screen shots it didn’t budge at all (while the states table was reduced).~~

I’m not 100% convinced I don’t need any customization at all. There are two reason to have at least some form of it (though I admit I’ve gone overboard).

(1) long term statistics do NOT keep track of attributes and also only work for “measurement”. If I want to keep long-term track of binary sensors (eg how many things something happened) or attributes, I’m out of luck
This is a reason to keep the purge policy LONGER than 10 days.

(2) certain sensors are INCREDIBLY chatty and fixing them can be accomplished either by adding them as exclusions and then duplicating them as a “throttled” sensor (there is a platform for that) or pleading with the integration owner to change.
This is a reason to have MUCH shorter collecting intervals (if you want to avoid duplicating with throttling), e.g. 1 day. The reason is I do not want my DB to go over 1.5Gb or so just to keep backups/restores more reasonable (granted I’ve only had to restore once so far).

I can probably live with (2) and but (1) can be a real problem and currently HA doesn’t give us any tools to force a collection frequency at the sensor level like that. The only option is to set the global purge to a larger number of days and then have explicit settings for other settings to be purged earlier.

~~But back to my question, do you know why statistics_short_term wasn’t reduced?~~
EDIT
DOH, I just re-read your answer. You do explain it but I misunderstood the setting you were referring to. This is the global configuration.yaml setting. I see, so nothing will start getting purged until I hit 300 days (which is my global setting).
Thank you.

ncd · June 30, 2024, 1:59pm

Apologies @tom_l , another actual question this time :). What is the integration you are using to display your DB size over time?

tom_l · June 30, 2024, 8:51pm

System monitor for the sensor. InfluxDB for the long term data (2 years). Grafana for the graph.

You should not keep more than 30 days or so of state data in the Recorder database or you may suffer performance issues. For long term data use a time series database like InfluxDB or VictoriaMetrics.

ncd · June 30, 2024, 11:22pm

Do you need an external API key / cloud connection for graphana to plot/graph sensor data?

tom_l · June 30, 2024, 11:26pm

No just a time series database (i.e. not the recorder database). See https://youtu.be/m9qIqq104as?si=zNCcA3MpozFLs-rW