I’ve been using Home Assistant for some months now, and I’m looking into good ways to store sensor data long-term. I think it would be interesting to track how temperature varies from month to month and year to year, and to see how it correlates with power consumption, indoor temperature, etc. I’m currently using a MariaDB database for storing historical data with the recorder, with the default ‘purge_keep_days’ setting of 10 days. But I’m reading that the database will grow very large very quickly if I just increase that setting (e.g. if I want to keep data indefinitely). So it seems that the recorder is meant more for storing historical data short-term than long-term.
I’m therefore curious how people store and use their historical data long-term, and whether there’s an easy way to do this. Ideally it would let me define which entities to store, possibly with automatic pruning of the data resolution over time.
Exactly, I use MariaDB for 7 days and InfluxDB for one year to indefinitely. Create a retention policy in InfluxDB and also a continuous query. The retention policy determines how long you keep the data (like 1 year, or indefinitely), and the continuous query determines how frequently sensor data gets rolled up into the long-term series. It is not advisable to make the continuous query too frequent, like 1 minute for temperature; you can go for an hourly rollup for temperatures. Otherwise the InfluxDB database will get really large…
I’ll be looking into InfluxDB and Grafana one of these days. Meanwhile…
I just save the values I want to keep to a flat file, which I can open in any spreadsheet or database. For example, I keep a log file of how long my heating system’s oil burner runs each day.
After my HA database grew to be untenable, I did some soul-searching. What data do I really want to keep?
My first conclusion was that HA is extremely weak in this area. There’s no way to individually set retention periods for each event and state change. I want to see minute-by-minute what the outdoor temperature is. And I may even want to see the rate of change over the course of a day. Beyond that I don’t want to keep it. But there are other HA data which I may want to keep for weeks, even years.
So my solution was to ruthlessly exclude everything I don’t need from the recorder, set the purge days to 7, and off-load what I really want to keep elsewhere (in my case, a few flat files, though an external database would be even better).
For long-term storage I can recommend the Home Assistant add-on VictoriaMetrics.
It works perfectly with Grafana and Home Assistant and is set up in a few minutes.
Long-term sensor data is stored using the least possible amount of disk space, thanks to the VictoriaMetrics time-series database, which outperforms InfluxDB in every way. So you can store all your data forever and you don’t have to fight with retention policies: just use it and have fun with your data.
I certainly like the initiative. How does this add-on handle backups? Ideally, the DB should be frozen while the backup is being created, to prevent any corruption. Also, is it available for amd64?
Hi, since yesterday amd64 is also supported, as of version 1.8.13 of the victoria-metrics add-on, thanks to @oetken! Backups can be made by just copying the victoria-metrics-data folder, without data corruption. There is also a way to make snapshots of the data in a separate folder. More info here: https://docs.victoriametrics.com/vmbackup.html
I have two years of data in MariaDB. It’s running on a NAS with a ton of disk; I give it 1 GB of RAM, but most of the time it only uses 256 MB. The DB is 17.4 GB, with 32 million rows in states and 23 million rows in events.
I see the value of separating long-term and short-term data but haven’t found a reason to do it myself.
The split is normally done to have a DB better suited for big queries and/or to improve loading times in HA.
For example, when you open the HA dashboard, does it take longer than 0.5 s to load the entity data? If so, that’s because of the big DB. Likewise, when showing a specific entity’s historical data, HA will also drag if it needs to retrieve a lot of history.
No, it’s actually very fast, mostly due to the 1GB of RAM that I gave to MariaDB and the fast RAID array.
What would be nice is having three levels of storage: in memory (last 12 hours, i.e. a quick trend), MariaDB for the last week, and InfluxDB for older data, AND the ability of the history chart to seamlessly query the right source. Does it do that today when some data is in MariaDB and some is in InfluxDB? Or do you have to manually configure different queries?
I believe there’s currently no DB tiering implemented, although some sort of in-memory handling does seem to happen, since several ‘last state’ values get reset on a restart; that seems to signal that ongoing events are held in memory for fast processing too.
However, I would also be interested in such a feature. I guess something like that is what was attempted with the switch to the ‘statistics’ sensors.
Also, I do know that trying to make a seamless setup of such an architecture is a technical challenge, even for big companies (where normally there’s a single DB method, or multiple reports, one per DB).
Hi,
Can you please explain for a beginner what difference adding VictoriaMetrics makes to the picture? From the VictoriaMetrics documentation I understand I will still need both InfluxDB and Grafana. But InfluxDB + Grafana already store the metrics and display them. What is the purpose of VictoriaMetrics?
Hi,
I’m looking for a way to store recorder data (or at least history data) on external storage like Dropbox, without using any further database.
That means I just want to save data from selected entities to this storage just before it gets deleted from HA, using the HA REST API or a dump of HA’s default database, then copy that stuff over and merge it with the existing data, possibly using some scripts. The last step doesn’t necessarily have to be done from HA…
It would be nice if someone has figured out a method for an approach like that and could give me some hints, or, worst case, explain why it is not possible or too complicated.