DB cleanup

Sensors that store historical data in the statistics table completely unnecessarily take up a lot of space.
The following data is useless and yet it is stored in the database.



It would be great if new rows were not added when identical ones are already saved
It would also be useful to add to /developer-tools/statistics the ability to clean as there is already the ability to manually edit individual entries.

A list of useless entries from the last 3 months:

SELECT
    m.statistic_id
  , c.id
  , DATETIME(p.start) AS start
  , DATETIME(n.start) AS end
  , p.mean, p.min, p.max, p.last_reset, p.state, p.sum
FROM statistics_meta AS m
INNER JOIN statistics AS c ON c.metadata_id = m.id
INNER JOIN statistics AS p ON p.id = (SELECT MAX(t.id) FROM statistics AS t WHERE t.metadata_id = m.id AND t.id < c.id)
INNER JOIN statistics AS n ON n.id = (SELECT MIN(t.id) FROM statistics AS t WHERE t.metadata_id = m.id AND t.id > c.id)
WHERE IFNULL(p.mean, 0) = IFNULL(n.mean, 0)
  AND IFNULL(p.min, 0) = IFNULL(n.min, 0)
  AND IFNULL(p.max, 0) = IFNULL(n.max, 0)
  AND IFNULL(p.last_reset, 0) = IFNULL(n.last_reset, 0)
  AND IFNULL(p.state, 0) = IFNULL(n.state, 0)
  AND IFNULL(p.sum, 0) = IFNULL(n.sum, 0)
  AND date(c.start) < DATE('now', 'start of day', '-3 months')

That is what a time-series DB is supposed to handle well. And IOT sensor data is one of the primary use cases normally suggested for time-series DB. --So if repetitive redundant data is an issue for you, it sounds like changing the underlying HA DB to Influx could help a lot, according to other forum postings here.

I tried to use InfluxDB unfortunately on Rpi3 it was taking so much resources that it was not even possible to switch between tabs :confused: