Methods of historical data reduction?

I had an idea that could be either neat or dumb, and I wanted to hear your input and/or ideas for implementing it. I wasn't sure where to post this, because it isn't related to the sensors themselves or to data gathering. It is more about processing historical data.

Basically, the idea is that the further back you go in your data, the more heavily it is averaged and the fewer datapoints there are. For instance, you might update a sensor once every 10 seconds. Once data is (for example) an hour old, you average every 2 of these old values together and replace the 2 values with this 1 result. When it's 2 hours old, you average every 2 values again. And then again at 4 hours, 8 hours, 16 hours, etc.
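A minimal Python sketch of that schedule, assuming evenly spaced raw samples. For clarity it recomputes every bucket from the raw data; a real store would merge pairs in place as they cross each threshold. The names (`level_for_age`, `downsample` and the two constants) are hypothetical.

```python
from collections import defaultdict

BASE_INTERVAL_S = 10.0      # raw sample spacing, from the example above
FIRST_THRESHOLD_S = 3600.0  # age of the first halving (1 hour); then 2 h, 4 h, 8 h, ...


def level_for_age(age_s: float) -> int:
    """How many halvings a sample of this age has been through:
    0 under 1 h, 1 for 1-2 h, 2 for 2-4 h, 3 for 4-8 h, and so on."""
    level, threshold = 0, FIRST_THRESHOLD_S
    while age_s >= threshold:
        level += 1
        threshold *= 2
    return level


def downsample(samples, now):
    """samples: (timestamp_s, value) pairs taken every BASE_INTERVAL_S.

    Returns (bucket_start_s, bucket_width_s, mean) tuples, oldest first.
    A sample at level k is averaged into a bucket 2**k raw intervals wide;
    for a full bucket this equals averaging neighboring pairs k times.
    """
    buckets = defaultdict(list)
    for t, value in samples:
        width = BASE_INTERVAL_S * 2 ** level_for_age(now - t)
        start = (t // width) * width  # align buckets to fixed time boundaries
        buckets[(start, width)].append(value)
    return sorted((start, width, sum(vals) / len(vals))
                  for (start, width), vals in buckets.items())
```

For a day of 10-second samples (8,640 points), this leaves roughly 1,200 points: full resolution for the newest hour and about 180 points for each full doubling of age after that.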

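So the further back you look, the fewer points there are, which lets you retain extreme timespans for comparatively little data. With the doubling schedule above, each doubling of age (1-2 hours, 2-4 hours, 4-8 hours, ...) ends up holding the same number of points, so storage grows only logarithmically with the timespan. If you instead halved the resolution at evenly spaced ages (every hour, say), the points per interval would shrink like 1/2^n, and since that sum converges you could in principle store unlimited time in a fixed database size. The trick/catch is that the further back you go, the less resolution you get.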
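A quick way to check the storage claim is a hypothetical Python helper that counts how many points each schedule needs to cover a given span. It assumes the 10-second base rate from the example and ignores timestamp and metadata overhead.

```python
import math


def stored_points(span_h: float, doubling: bool,
                  base_s: float = 10.0, first_h: float = 1.0) -> float:
    """Approximate number of points needed to cover span_h hours of history.

    doubling=True:  resolution halves at ages 1, 2, 4, 8, ... hours
    doubling=False: resolution halves every first_h hours of age
    """
    total, start, end, halvings = 0.0, 0.0, first_h, 0
    while start < span_h:
        window_s = (min(end, span_h) - start) * 3600
        # points in this window = window / (base interval * 2**halvings)
        total += math.ldexp(window_s / base_s, -halvings)
        start = end
        end = end * 2 if doubling else end + first_h
        halvings += 1
    return total


for label, hours in [("1 day", 24), ("1 year", 8760), ("10 years", 87600)]:
    print(f"{label:>8}: doubling ~{stored_points(hours, True):.0f} points, "
          f"evenly spaced ~{stored_points(hours, False):.0f} points")
```

Under the doubling schedule each doubling of age adds about 180 points, so one year needs roughly 2,700 points and ten years roughly 3,300. The evenly spaced schedule levels off at 720 points however long the span, but only because old data thins out very fast: from the tenth hour of age onward, a single point already covers more than an hour.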

I don't know if there's a term for this. "Lossy Exponential Temporal Compression" maybe. And it might be dumb because you are essentially reprocessing your whole database periodically (though the further back you go, the less data there is to process, and the less often it gets processed).
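On the reprocessing concern: one way to avoid rescanning the whole database is a cascade of fixed-size levels. The hypothetical `CascadingStore` class here is a swapped-in structure, not something from this thread. Level k holds at most `capacity` points, each the average of 2^k raw samples; when a level overflows, its two oldest points are averaged and pushed one level up. Because every merge turns two points into one, the total number of merges can never exceed the number of samples added, so the work per new sample is constant on average.

```python
from collections import deque


class CascadingStore:
    """Hypothetical multi-level store: levels[k] holds (timestamp, value)
    points, oldest first, each averaging 2**k raw samples."""

    def __init__(self, capacity: int = 180):
        self.capacity = capacity  # must be at least 1
        self.levels: list[deque] = []
        self.merges = 0  # number of pair-averaging operations so far

    def add(self, timestamp: float, value: float) -> None:
        self._push(0, (timestamp, value))

    def _push(self, k: int, point: tuple) -> None:
        if k == len(self.levels):
            self.levels.append(deque())
        level = self.levels[k]
        level.append(point)
        if len(level) > self.capacity:
            # Average the two oldest points (equal weights, same level) and
            # hand the result to the next, coarser level.
            (t0, v0), (_, v1) = level.popleft(), level.popleft()
            self.merges += 1
            self._push(k + 1, (t0, (v0 + v1) / 2))

    def total_points(self) -> int:
        return sum(len(level) for level in self.levels)
```

With `capacity=180` and 10-second samples, level k spans 180 points of 10·2^k seconds each, i.e. 30 minutes, 1 hour, 2 hours, 4 hours and so on: the same doubling pattern as the schedule above, with at most 180 points per level.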

Playing around with the rates and averaging of something like this might be interesting, as a way to tune how much resolution you retain over time. Or maybe even a Fourier-series version that might retain frequency data based on the strength of trends and cycles. I had that particular idea a while back when thinking about predictive algorithms for HVAC.
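One possible reading of the Fourier idea, as a hypothetical Python sketch: take the discrete Fourier transform of a block of history, keep only the few strongest frequency components (the mean level plus the dominant cycles), and store those instead of the raw samples. The plain O(n^2) DFT keeps it dependency-free; `numpy.fft.rfft` would be the usual choice for real data. Note that a slow linear trend is not captured by a single component.

```python
import cmath
import math


def dft(xs):
    """Plain discrete Fourier transform, O(n^2)."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]


def strongest_components(xs, keep):
    """Keep the `keep` largest-magnitude coefficients among frequencies
    0..n//2 (for real input the rest are mirror images of these)."""
    coeffs = dft(xs)
    candidates = range(len(xs) // 2 + 1)
    chosen = sorted(candidates, key=lambda k: abs(coeffs[k]), reverse=True)[:keep]
    return {k: coeffs[k] for k in chosen}


def reconstruct(n, kept):
    """Inverse DFT using only the kept components and their conjugate mirrors."""
    out = []
    for t in range(n):
        total = 0j
        for k, c in kept.items():
            total += c * cmath.exp(2j * cmath.pi * k * t / n)
            mirror = (n - k) % n
            if mirror != k:  # frequency n-k carries the conjugate for real data
                total += c.conjugate() * cmath.exp(2j * cmath.pi * mirror * t / n)
        out.append(total.real / n)
    return out


# Two days of hourly readings: a 20-degree mean with a daily swing of 3 degrees.
hourly = [20 + 3 * math.sin(2 * math.pi * h / 24) for h in range(48)]
kept = strongest_components(hourly, keep=2)  # 2 complex numbers instead of 48 values
approx = reconstruct(len(hourly), kept)
```

Real sensor data is noisier, so how many components to keep becomes the knob that trades storage against fidelity, much like the averaging rate in the halving scheme.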

Like I said, this isn't related to sensors or Home Assistant specifically, but it is relevant to how historical data is stored.

Changed my search terms around when trying to research this. It looks like some services like Influx can do stuff like this.

Just wanted to say that InfluxDB is what you are looking for. I use InfluxDB for the long-term data I want to keep, and I keep only 7 days of history in the HA database.

Nice! I’m figuring it out myself. 7 days on HA and long-term on Influx is a good idea!