Proper implementation of long-term sensor data

Hi Guys,

The problem:
I’m struggling with what I think is a big shortcoming of Home Assistant.
I have a lot of sensors that all record data into the Home Assistant database: temperature sensors, energy meters, water and gas meters. All of these data points are stored in the Home Assistant database, and it grows rapidly!

You can set the retention period in the recorder to keep the database at a reasonable size, but this will destroy your data.

Yet I’m very interested in keeping the data for these sensors for a long period of time to spot trends. For example: the temperature of the refrigerator slowly decreasing over a month, or an appliance now using 15% more power than 6 months ago.

The current solution:
Many of the suggestions on the forum point to the InfluxDB add-on: shoot your data into that database and just use Grafana to do all the “magic”.

But i feel like this is the wrong approach.

First of all, now that we have Lovelace, our dashboards and graphs are getting smoother and nicer every day. Even though this is just starting to develop, everything looks amazing so far. I want all the graphs and data to have one source.

Second, you can’t do anything with the old data. There is no good way for Home Assistant to access that data in an automation or anything like that. For example: list the biggest energy consumer of the last month, or compute an average over a period.
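To make the wish concrete, here is a minimal sketch of the kind of query this would enable, against a simplified, made-up table (the real HA `states` schema is different and stores attributes as JSON; all table, sensor and column names here are invented for illustration):

```python
import sqlite3

# Toy schema standing in for "long-term energy data per entity per day".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE energy (entity_id TEXT, kwh REAL, day TEXT)")
conn.executemany(
    "INSERT INTO energy VALUES (?, ?, ?)",
    [
        ("sensor.fridge", 1.2, "2020-01-01"),
        ("sensor.dryer", 3.4, "2020-01-01"),
        ("sensor.fridge", 1.1, "2020-01-02"),
        ("sensor.dryer", 3.9, "2020-01-02"),
    ],
)

# "Biggest energy consumer of the last month": sum per entity, take the top one.
row = conn.execute(
    "SELECT entity_id, SUM(kwh) AS total FROM energy "
    "GROUP BY entity_id ORDER BY total DESC LIMIT 1"
).fetchone()
print(row)  # the dryer, with its summed consumption
```

If downsampled long-term data lived in a table like this, an automation or template sensor could run exactly this kind of aggregate.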

Third, it takes a lot of effort to set up. I have around 20 temperature sensors and 15 smart plugs, which takes a significant amount of setting up. And most of it needs to be hard-coded, so any adjustment or change has to be made in many places.

What should it be:

Modify the Recorder component
There are a couple of options. The first one is to modify the recorder component to have separate short-term and long-term policies. The Domoticz platform has this built in:

You can set the short-term storage period for the sensor data. After this period, the software automatically starts to downsample. For example, temperature is saved as a min, max and avg value each day. This way all the data can be stored for years (or any other time to be defined).

Create a component for long time storage
A component can be created that does all the desired work. You can specify which sensors should be recorded, and what should be recorded:

  • Time intervals: hourly, daily, monthly
  • Calculations: Min, Max, Avg, Difference
  • How long to keep
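The downsampling step itself is simple. A minimal sketch in plain Python, assuming made-up readings and collapsing them to one min/max/avg record per day (a real component would work on the recorder database instead of an in-memory list):

```python
from collections import defaultdict
from datetime import datetime

# Invented raw readings: (timestamp, temperature) pairs for one sensor.
readings = [
    ("2020-03-01 08:00", 4.1),
    ("2020-03-01 14:00", 6.3),
    ("2020-03-01 22:00", 3.8),
    ("2020-03-02 09:00", 4.0),
]

# Bucket the raw values by calendar day.
by_day = defaultdict(list)
for ts, value in readings:
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M").date()
    by_day[day].append(value)

# Replace each day's raw values with a single min/max/avg summary.
daily = {
    day: {"min": min(v), "max": max(v), "avg": sum(v) / len(v)}
    for day, v in by_day.items()
}
# Each day now occupies one summary record instead of hundreds of raw rows.
```

Hourly or monthly intervals would just change the bucketing key, and “Difference” (useful for meter readings) would be `max(v) - min(v)` per bucket.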

Maybe there are other options around. I’m more of a scripter than a programmer, so I’m probably not capable enough to get this going on my own. So does anyone have similar thoughts on this and want to help get it off the ground?

There are some other people who were looking for something similar:

I agree with this. I already have Influx, Chronograf, Grafana, Kapacitor set up, but it’d be so much easier if HA could just downsample the data after a while. It doesn’t have to replace Influx, just the periodic downsampling would be enough for me to abandon Influx and I think that’d satisfy the majority of users.

I haven’t tried it yet, but I believe you can query InfluxDB from HA; see the documentation (there are a couple of examples):

I think the issue is that InfluxDB is a much better database type for long-term storage of time-dependent data, whereas the HA default database, SQLite, is better suited for the shorter-term type of queries that HA needs to make. If you just downsample, it might work, but you lose a lot of data, whereas with InfluxDB you don’t have to eliminate anything and it maintains good performance. However, I am no database expert…

I do that to obtain some averages, but it’s way too clunky to recommend.

Yes, Influx is better for time series, but just by downsampling, the HA SQL database could be trimmed a lot. HA could even use a time series database for whatever historic logs it wanted, rather than SQLite, but that’s way more work than just downsampling.

Yes, the InfluxDB sensor is the best current workaround. But it is just not a truly integrated solution.

You will get better performance with, say, MySQL as your database engine. I am not sure how long it is feasible to keep data and still have HA usable, but a heck of a lot longer than with the default SQLite.

I have the same requirement as the OP, except also to deal with very short term data.

Having had my SD card fill up due to database size, I limited the purge_keep_days and now it’s manageable, but like the OP, I lose long term data.

Possible solution is to define two levels of purge, first an ‘archive’, then a full purge. For the archive, it could be configurable whether to store the average/min/max for each value, and the period e.g. hour/day/month. Once a full period older than the ‘archive’ threshold is collected, it could be processed, then replaced with a single value in the database. I’m confident this could be done with some extra code in the recorder purge feature.
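The archive-then-purge idea can be sketched in SQL against a toy table (not the real recorder schema; the entity name, values and the '2020-02-01' archive threshold are all made up):

```python
import sqlite3

# Toy stand-in for the recorder's states table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity TEXT, value REAL, day TEXT)")
conn.executemany("INSERT INTO states VALUES (?, ?, ?)", [
    ("sensor.temp", 20.0, "2020-01-01"),
    ("sensor.temp", 24.0, "2020-01-01"),
    ("sensor.temp", 22.0, "2020-01-01"),
    ("sensor.temp", 21.0, "2020-02-01"),  # newer than the threshold, kept raw
])

# Level 1 "archive": summarize everything older than the threshold
# into one min/max/avg row per entity per day...
conn.execute("""CREATE TABLE archive AS
    SELECT entity, day,
           MIN(value) AS vmin, MAX(value) AS vmax, AVG(value) AS vavg
    FROM states WHERE day < '2020-02-01' GROUP BY entity, day""")

# ...then drop the raw rows that were archived (the level 2 full purge
# would later delete archive rows past a second, longer threshold).
conn.execute("DELETE FROM states WHERE day < '2020-02-01'")

archived = conn.execute(
    "SELECT entity, day, vmin, vmax, vavg FROM archive").fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM states").fetchone()[0]
```

Three raw rows collapse into one archive row while the recent row stays untouched, which is the whole point: the existing purge job could do this aggregation pass before deleting.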

I am happy to look at implementing this if there is broad agreement.

Very short term data:
I think this is a slightly different requirement but I’ll state it here anyway because it could be solved in a similar way.

For certain sensors, the live value is very interesting - e.g. what is the current power consumption of a wall outlet. But some of mine are not configurable and measure to 0.1W precision. This means they are updating & recording a new value every few seconds from noise.

In these cases I am interested in the live data, and in some cases want to trigger an automation from it (so don’t want several minutes lag), but actually I’m only interested in recording one value every 10mins or so.

Current solution:
Create a new sensor that duplicates data from the ‘live’ sensor with a scan_interval of 600, then exclude the ‘live’ sensor from the recorder component.
Or round off the precision of the second sensor so that it doesn’t change as often.
But both of these are messy workarounds: either way I have two sensors for every power meter, and it’s all manual, time-consuming config.

So perhaps a different (but related) feature would be to be able to set a record_interval for a sensor, meaning any sensor could be configured to record just one value in the specified period. HA Core state would be the live value; UI and automations would update quickly but the database would fill up slowly.
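As a sketch of what that would mean (note `record_interval` is the proposed option, not an existing HA setting; the class and values below are invented for illustration):

```python
import time

class ThrottledRecorder:
    """Keep every update as the live state, but record at most one
    value per `record_interval` seconds."""

    def __init__(self, record_interval=600):
        self.record_interval = record_interval
        self.live = None          # what the UI and automations would see
        self.recorded = []        # what would reach the database
        self._last = float("-inf")

    def update(self, value, now=None):
        now = time.monotonic() if now is None else now
        self.live = value                      # always updated, no lag
        if now - self._last >= self.record_interval:
            self.recorded.append(value)        # at most one row per interval
            self._last = now

# Noisy 0.1 W power readings arriving at t=0s, t=5s and t=650s.
r = ThrottledRecorder(record_interval=600)
for t, v in [(0, 100.1), (5, 100.2), (650, 98.7)]:
    r.update(v, now=t)
# r.live holds the latest value; r.recorded holds only two of the three.
```

Automations would still trigger on every `update`, but the database would grow at one row per ten minutes per sensor instead of one row per few seconds.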

I think most users would benefit from this as everyone has finite storage and a hard cutoff to delete all old data is probably losing data that people would like to keep.

I am also looking forward to it. I currently implemented transaction replication to an Azure SQL DB, but it would still be great to have such an option in HA.

If Home Assistant used the database as a relational database instead of a JSON dump, long term storage of data wouldn’t be an issue. All the bloat and slowness currently with the database is because data is stored JSON formatted instead of as native data types. There’s basically no way to optimize storage and query for this data.
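A small illustration of the complaint, with made-up rows: when the state lives inside a JSON blob, every query has to scan, parse and cast each row in application code, whereas a native numeric column could be filtered and indexed by the database engine directly.

```python
import json

# Invented rows mimicking "state stored as a JSON string".
rows = [
    '{"entity_id": "sensor.power", "state": "153.2"}',
    '{"entity_id": "sensor.power", "state": "148.9"}',
    '{"entity_id": "sensor.temp",  "state": "21.5"}',
]

# With JSON storage: full scan, JSON parse and string-to-float
# conversion per row, all outside the database engine.
powers = [
    float(d["state"])
    for d in map(json.loads, rows)
    if d["entity_id"] == "sensor.power"
]
avg = sum(powers) / len(powers)
```

With a schema like `(entity_id TEXT, state REAL)` the same average would be one indexed `SELECT AVG(state) ...` that never leaves the database.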

Hi all, bumping this thread to check if there are any plans for a proper, fully integrated solution, as per the OP’s question?

I would say this is a fundamental shortcoming of HA, and it would be interesting to hear the project maintainers’ view on the matter.

By that I mean if there is any initiative to solve this? If not, can we get one going?

Either way, I’ll personally be more than happy to put development hours into this, if needed.


Home Assistant should keep statistical data for some sensors.

I have an energy board capturing data for 15 sensors every second. It is nice to see data flowing like that, but it is too much data to store and really unnecessary.

I had to change the settings on the board to send every 15 seconds, because it was impossible to show history for any of those sensors.

The way I imagined this was to have an optional setting on the sensor that defines how long to keep detailed data and how long to keep statistical data.