My recorder database has quickly grown to a huge size, on the order of 1 GB/day. I have finally taken the bull by the horns and tried to understand what is happening.
First of all, let me share a snippet. This SQL query shows the top 100 entities that generate the most state data in the recorder database:
-- Share of all rows in `states` contributed by each entity
WITH total_rows AS (
    SELECT COUNT(*) AS total FROM states
)
SELECT sm.entity_id,
       s.metadata_id,
       COUNT(*) AS count,
       COUNT(*) * 100.0 / total_rows.total AS percentage
FROM states s
JOIN states_meta sm ON s.metadata_id = sm.metadata_id
CROSS JOIN total_rows
GROUP BY s.metadata_id, sm.entity_id, total_rows.total
ORDER BY count DESC
LIMIT 100;
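On a default installation you can run this with sqlite3 against home-assistant_v2.db in the config directory; it should work unchanged on MariaDB/PostgreSQL too, since it only touches the standard states/states_meta schema.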
That helped me figure out which entities were generating a disproportionate amount of data. Some of them I could simply block with recorder excludes.
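For reference, this is roughly what my excludes look like in configuration.yaml (the entity IDs are made-up placeholders, not my actual entities):

recorder:
  exclude:
    entities:
      - sensor.some_noisy_sensor       # placeholder: a single entity I never chart
    entity_globs:
      - sensor.*_linkquality           # placeholder: a whole class of diagnostic entities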
However, other entities produce data that I do want to store, so I can't just exclude them. For instance, I have a dozen or so Danfoss thermostats. I really want to keep their data (temperature and so on), but they report in roughly every 10-15 seconds, and the recorder stores a data point each time even if the value has not changed.
So I get tons of rows in the database for these entities where the state is the same and the last_changed_ts is the same, but the last_updated_ts and last_reported_ts are new for each row.
I am looking for a solution that lets me keep the data points where these entities actually change, without storing a row every time they report in with unchanged data. (I am actually surprised that this isn't the default behavior.)
Right now I feel I have to choose between two evils: excluding all the data, even though I want to save it, or letting my database grow huge just because the devices report their (unchanged) values every 10 seconds.
Is there some middle ground?
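The closest thing I have found so far is the Filter integration: record a throttled copy of each sensor and exclude the chatty raw one. A minimal sketch of what I mean, assuming a made-up source entity sensor.danfoss_livingroom_temperature:

sensor:
  - platform: filter
    name: "Living room temperature (recorded)"        # placeholder name
    entity_id: sensor.danfoss_livingroom_temperature  # placeholder source entity
    filters:
      # emit at most one value per 5-minute window
      - filter: time_throttle
        window_size: "00:05:00"

recorder:
  exclude:
    entities:
      - sensor.danfoss_livingroom_temperature  # stop recording the raw sensor

The drawback is that time_throttle also drops real changes inside the window, and it means maintaining a parallel entity per thermostat, so it doesn't really answer "only store a row when the state actually changes." I'm still hoping there is something cleaner.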