History consolidation for older statistics?

Before I go and create a feature request, I’d like to hear if anyone has done anything like this or has ideas — or if there’s already an off-the-shelf solution I’ve missed.

There is an old-school systems administration/monitoring tool called RRDtool. It’s designed to collect and graph time-series data. It has some interesting, unique properties, the most interesting being that the database is a fixed size and does not grow over time. How does it do that? By discarding information, of course.

However, the neat thing is that it can do more than just throw out older entries once newer ones come in past the limit. It can also consolidate. And you can set it up so, for example, you’d have:

  1. Average every minute for the past week (10,080 entries)
  2. Average every five minutes for the past month (8,640 entries)
  3. Average every hour for the past year (8,760 entries)
  4. Average every day for ten years or whatever…

Basically, rather than throwing away old data, you keep it with decreasing granularity.
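To make the idea concrete, here is a rough Python sketch of that kind of tiered consolidation. It is only loosely modeled on how RRDtool behaves, not its actual implementation: the `Tier` class, the tier names, and the assumption of exactly one sample per minute are all made up for illustration.

```python
from collections import deque
from statistics import fmean


class Tier:
    """One fixed-size archive (hypothetical): averages `step` samples into one row."""

    def __init__(self, step, rows):
        self.step = step                # raw samples per stored row
        self.pending = []               # samples waiting to be averaged
        self.rows = deque(maxlen=rows)  # oldest row is dropped once full

    def add(self, value):
        self.pending.append(value)
        if len(self.pending) == self.step:
            self.rows.append(fmean(self.pending))
            self.pending.clear()


def make_tiers():
    # Assumes one sample per minute; sizes match the list above.
    return {
        "1min_week": Tier(step=1, rows=7 * 24 * 60),     # 10,080 rows
        "5min_30days": Tier(step=5, rows=30 * 24 * 12),  # 8,640 rows
        "1h_year": Tier(step=60, rows=365 * 24),         # 8,760 rows
    }


def record(tiers, value):
    """Feed one raw sample to every tier; each tier consolidates on its own."""
    for tier in tiers.values():
        tier.add(value)
```

Because every tier is a bounded `deque`, total storage is fixed up front no matter how long the sensor keeps reporting.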

Is there anything like this in the Home Assistant extended universe?

Google issues?


Huh. I didn’t consider that someone would make this with literally RRDtool!

That’s definitely interesting and I see people have decent results, but it’s really complicated to configure.

I think it’d be nice to have something like that, but integrated more closely. Plus, it looks like it involves actually having an rrdtool binary inside your HA container? Not particularly ideal!

Not really sure what you had in mind.
RRDtool is a pure local DB, isn’t it? There is no client/server architecture afaict, so it’s very similar to the sqlite tools used by HA itself.

The only alternative would be support by HA itself. Although unlikely to happen imo, that would be a feature request.

It is a pure-local db, yes, but:

  • it requires the binary rrdtool, which is awkward in a number of ways (arch-specific, for one thing)
  • rrdtool is really meant for network metrics, and the complex configuration reflects bending it to do something different

Looking at this has led me to be aware of the InfluxDB integration, though. I’ll look into that!

I think a more closely integrated feature would be nice, though. I think I’d want it to be opt-in per sensor rather than opt-out (maybe with some way to apply it in bulk, though).

And, for binary sensors, there should be configuration for whether to store 1) the percentage of time “on” per period, 2) the count of transitions to on or off, 3) “on” if ever on / “off” if ever off, or 4) the prevalent state for the period.
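A minimal sketch of what those four options could compute for a single period, in Python. The function name and the result keys are hypothetical, and it assumes the period is represented as a non-empty list of evenly spaced boolean samples, which is a simplification (HA records state changes with timestamps, not fixed-interval samples).

```python
def consolidate_binary(samples):
    """Summarize one period of evenly spaced on/off samples (True = on)."""
    on_count = sum(samples)
    # A transition is any adjacent pair of samples that differ.
    transitions = sum(a != b for a, b in zip(samples, samples[1:]))
    return {
        "percent_on": 100.0 * on_count / len(samples),
        "transitions": transitions,
        "ever_on": any(samples),
        "ever_off": not all(samples),
        # Ties count as "on" here; that choice is arbitrary.
        "prevalent_on": on_count * 2 >= len(samples),
    }
```

Which of these to keep would be the per-sensor setting; a door sensor probably wants "ever on", while a heater relay is more useful as a percentage.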

You might notice the “exclude” and “include” options in the InfluxDB HA doc. Each can filter by domain, glob, or entity, which gives you quite fine-grained control.

That’s typically stuff you calculate from the raw data, and InfluxDB is definitely flexible enough for that.

Basically, RRDtool is to InfluxDB what sqlite is to an Oracle database :wink:
