Long Term History and Storage w/ Aggregated Statistics

I’m trying to use long term home assistant history to answer questions like these:

  • How long was my heater on in the February 2022?
  • What was the average outdoor temperature in February 2022?
  • How did the indoor humidity change through the seasons in 2022?

These are just examples. I don’t have long term storage set up yet, so these aren’t going to be answerable for the past, but I want to ask them about data I’m collecting now.

Questions

Influxdb Question(s)

I have influxdb, with an autogen database that stores everything for 31d, like a lot of tutorials recommend.

I added retention policies for yearly and forever. I would like to store hourly data for a year and daily data forever (until the heat death of the universe, or I have to format my HA instance).

  1. It looks like I can’t do that very automatically. I found out how to add continuous queries for each unit type. I’ve been typing in things like this:
CREATE CONTINUOUS QUERY "cg_24h_hPa" ON "homeassistant" BEGIN SELECT mean(value) as value into "forever"."hPa" from "autogen"."hPa" group by time(24h), entity_id FILL(previous) END
CREATE CONTINUOUS QUERY "cg_1h_hPa" ON "homeassistant" BEGIN SELECT mean(value) as value into "year"."hPa" from "autogen"."hPa" group by time(1h), entity_id FILL(previous) END

It’s painful. Is there a better way to make everything be averaged and recorded like that?

  1. I got to the state database in influxdb. There are 450+ entities. There are boolean things, like heater on/off. There are strings, like weather data. There are integers and chars that don’t have units. Averaging these sometimes doesn’t make any sense. How can I store these things and have usable results long term?

Database Questions

  1. Is influxdb the right choice?

  2. Should I just give up and have the default retention policy collect everything forever? I don’t want to make it read GBs every time it makes an annual graph, so I think the separate retention policies make sense.

Statistics

In the hourly or daily statistics for things like booleans, it would be more useful to have the % of time it was on, and the duration of it being on, etc. For example, knowing that the heater was on for 20mins from 3-4pm is useful. Knowing it was on for 33% of the time in that hour is useful. Knowing that the heater was on for 18% of the day on Feb 21st and 25% on Feb 25th is very useful.

  1. How do I get that kind of aggregated statistics into the history/database? Should I compute those things in homeassistant? Is there a way to do that with queries in influxdb? Is this a case for templates?

Old Entities

I’m also seeing a lot of old entities in the db. Stuff that is no longer used in home assistant. Devices that have been renamed, or taken offline, or whatever.

  1. Do you regularly reformat your HA instance to remove this kind of thing? I have a lot of work into configuring HA, but I would probably not mind moving everything from one instance to another, just to get rid of old stuff I don’t need anymore. Is there a better way?

Please answer anything you can. Sorry for the logjam of questions. I have been working on this, loading up on stuff, and I’m ready to ask for help, so there is a backlog of stuff.

1 Like

I tried a query like this for the binary states. The average was as expected for most of the sensors, 50% over the time period (because it averaged the 100% and the 0% numbers, not the amount of time in each state):

SELECT mean("value") FROM "homeassistant"."autogen"."state" WHERE "domain"='binary_sensor' GROUP BY time(24h), entity_id FILL(previous)

I’m going to go through some of the important ones (like my heater being on) and set a home assistant statistic for them to get average_step and then collect that through the way I’ve already figured out.