Handling long term cumulative historical data (finalizing days)

I have a number of sensors defined as below:

  - platform: history_stats
    name: AC Run Time Today
    entity_id: sensor.hvac_operating_state
    state: 'cooling'
    type: time
    start: '{{ now().replace(hour=0).replace(minute=0).replace(second=0) }}'
    end: '{{ now() }}'

These work great for showing today’s cumulative time spent in a particular state.

To use this for long term data though, you have to retain all the detail state data over whatever the period of interest, so if I wanted a year of data to flow to influxdb (or worse stay in HA) I might have dozens of daily data instead of one rolled up number.

Letting recorder record the history_stats derived sensor is worse, as it is calculated continually all day.

What I want is the final number only, essentially on day N, I want to run the calculation one more time for day N-1 and take that (and only that) to influxdb.

Are there any “normal” ways to do this in HA?

The alternative seems to be to schedule some job, either on the HA side to create a rollup entry by direct writes from SQL, or maybe a manual transfer into influxdb of the same sort. But I was hoping for a more “normal” way.

How does one handle long term rollup of data from HA? Not just long term storage, but rolling it up at period end?

Linwood

I have several sensors that report frequently (less than 10sec) and influxdb is still just 400MB after 4 months. Mariadb gets purged daily with a 1 week retention. So it stays around 300MB. At that rate there is no need to split up the database or otherwise worry about long term storage. My influxdb retention is set to infinity, and if need be I might reduce that 20years from now.

Not sure how influx does it, but it seems to retain all of the granular data so when you zoom in to view like 6hrs that occured 1year ago, you get expected results. It is also fast when you graph long term data (like view entire year). Not sure how (think it has to do with storing time series vs full db entries), but I like it.

I’ll look at it, but the somewhat more fundamental problem is that the sensors I want to track are cumulative for a day – e.g. run time today SO FAR. So historically only the last one of those readings is relevant.

Is there a better approach for doing that?

Or maybe the question I should ask is how, within influxdb, do I take an ordinary state sensor (say one that is idle, cooling, heating, fan-only) and ask “how much time, aggregated by day, was it cooling”? Can that be done directly in influxdb?

I am not really looking to save space per se; it’s running on a good size server in a closet. But having 100 data point all during the day, for months, of how much to that point in time of that day, is not terribly useful.

So here’s my current solution, which I’m hoping others may improve upon, but this seems to work. I define two different sensors, one for the current day only, and one for cumulative days:

  - platform: history_stats
    name: AC Run Time Today
    entity_id: sensor.hvac_operating_state
    state: 'cooling'
    type: time
    start: '{{ now().replace(hour=0).replace(minute=0).replace(second=0) }}'
    end: '{{ now() }}'
  - platform: history_stats
    name: AC Run Time Daily
    entity_id: sensor.hvac_operating_state
    state: 'cooling'
    type: time
    start: '{{ (as_timestamp(now().replace(hour=0).replace(minute=0).replace(second=0)) - 24*3600) | timestamp_custom("%Y-%m-%d %H:%M:%S%z", true)   }}'
    duration: 24:00:00

Then in recorder I exclude ac_run_time_today. If not excluded, it just puts in a new cumulative value every 30 seconds or so, when all I want is the current state to display on a web page.

The second sensor is calculated only for yesterday, but is saved, so over time it builds a daily history of the total by day. It gets only one entry most days, but if you restart home assistant it gets another (because the sensor goes to “unknown” then calculations, and it saves both of those).

This “daily” sensor can be used for historical plots of cumulative-by-day data. Unfortunately you cannot aggregate them directly for by-week or similar due to these potential duplicates, but if you wanted something like that you could create a third and fourth for weekly or monthly in a similar fashion.

What I do not like about the above (other than this duplication on restart) is the use in timestamp_custom of a hard coded date format. I’m not sure if that format will work for comparison all the time, everywhere. If you use just local or just utc though, you get an error in the log about not having a time zone for comparison.

And yes, this won’t quite work on the day you change to/from DST due to the 24 hour on a 23 or 25 hour day. :sleeping:

I feel like there’s got to be a better way, but haven’t found it.

For those looking at similar issues here are a couple of pointers that did not solve it for me, but provided more context and possible paths and also lead to other topics:

https://community.home-assistant.io/t/historical-heating-data-daily-weekly-monthly/79096/3

https://community.home-assistant.io/t/statistics-sensor-question-stats-per-day/128728/6

https://community.home-assistant.io/t/history-statistics-component/10194/5

I’d welcome a better approach, but hopefully mine above my help someone doing similar.

OK gotcha, you can create binary templates for each mode in HA, then use the integral() and cumulative_sum() functions in grafana to do that. Here is an example from my instance where I calculate energy usage over a time period using only power state:

The math() part converts from watt-seconds to kw-hr (time base is in seconds, so integral results in units of y*seconds). In your case, since you don’t have binary entities, you can create off, idle, heat, and cool templates in HA that converts the string state from thermostat mode to an on/off boolean. This would create 4 entities (one for each mode), which will toggle between on/off. Since grafana will return sec units for your binary entities, you would use math(/3600) to convert from seconds to hours. Then those can be used in grafana to graph hours of active time for any given time period.

There are some advantages to doing it in grafana this way… the biggest one being the time range selector. The image above is showing the last 6hrs… that can easily be changed to yesterday, same day last year, the entire past year, the past week, the week before that, or any random time period I choose… you get the point. Of course, you can use a dial guage instead if you just want the value at the end of the period; the graph is nicer for energy useage of course. Here’s an example of a dial guage I made to count the number of times doors were cycled opened/closed during a time period (note I’m not integrating in this last example, but I could if I wanted to find out how long the door was open throughout the day).

@truglodite will give it a try, thanks.

    start: '{{ (as_timestamp(now().replace(hour=0).replace(minute=0).replace(second=0)) - 24*3600) | timestamp_custom("%Y-%m-%d %H:%M:%S%z", true)   }}'
    duration: 24:00:00

So this sort of works, but as I went to use it in Grafana from InfluxDB I realized the data point has a time stamp of just after midnight, which is of course, when aggregated, the next day. I don’t think it is possible to specify the time stamp used for a sensor so this doesn’t look fixable (though it could be fixed downstream of course).

Hi !
What about an automation running every day at 23:59 which sets an input_number to the value of your history_stats sensor ?

That way, you can exclude your history_stats sensor from the history, and have an input number which is updated only once a day ?

1 Like

I’m using the LTSS Component for long time data storage.
It uses the Timescale database, which is a Postgres database clustered by time slices for faster queries (The chunk size used by the HA component is 1 month).

Since it is basically a normal Postgres database, background jobs can also be used to cluster the data regularly on the database side. For example, I always have the average temperature of my temperature sensors regularly summarized on hourly slices, because I don’t need a more granular view. Here are some examples for continious aggregations:

https://docs.timescale.com/latest/tutorials/continuous-aggs-tutorial

Also, since it is a normal database, you can then display the data again via the SQL sensor in the HA frontend if required: https://www.home-assistant.io/integrations/sql/

@bokub, thanks, I had not considered that, it should work.

@CM000n I ended up using InfluxDB which is a guess similar (though I may look back at this as I like PostgresSQL and use it a lot in other things). InfluxDB though is directly driven by IoTaWatt which I also recently got for power monitoring, so I may just stay there for now. But thanks for this pointer, I had not seen it before.

Incidentally, in InfluxDB rather than continuous queries (which have some real limitations) I used Node Red to automate aggregation, which works nicely, and since all these live in the Hass.io world they are easy to set up and upgrade.

Not really, since Influx DB is a Time Series DBMS and Poststregs is still a Relational DBMS :wink:
So it just supports some queries and logics I’m used to at work, where I’m mainly working with Relational DBMS.
But Influx is certainly also a good and valid possibility. I just wanted to give you the hint. Greetings :slight_smile: