Offload history/stats to a secondary database

Hi.
I’ve seen people report issues with the standard home-assistant_v2.db getting too large which caused problems with performance and in some cases reliability.

The only solution I’ve seen so far is to use recorder to limit the number of days and/or remove some of the sensor tracking data in order to keep the database
size managed which also keeps backups a reasonable size.

I was thinking that, rather than removing sensors and limiting history, that this could instead be improved by setting it so that anything Day-1 (or -2) and older gets offloaded in an overnight job to a secondary database for longer term archiving.

This way we get the best of both worlds.
A lean performant system as well as archival stats which we could call upon if required.

If you’re using HA supervised you could use the InfluxDB addon to record all sensor info and exclude it from the main database.

You don’t have to use the default SQLite database, recorder also supports MariaDB, MySQL and PostgreSQL.

Also the performance of the database has been improved considerably lately.

I have often thought an RRD database would be the way to keep history. There is an addon or custom component for them. RRDtool - Wikipedia

Thanks all for the replies
@Pippyn + @robertklep This suggestion would ideally be part of core so it works across all releases and allows for long term data.

I don’t know the % split between people using core and those with Supervisor versions, but the suggestion was that the core release would manage it as standard so it’s available to all.

For example I didn’t know about some historical issues as well as recent improvements (as well as what recorder did) until after I’d spotted my backups getting much larger… and then subsequently researched the subject.
I only spotted the size increase due to constant tweaking as I slowly expand my setup and then, once I did identify the cause, I had to manually intervene to resolve by cutting down what I record and for how long.

@tom_l Yes, I found as apart of my research the info on the big improvements that came in recent releases that made the DB far more responsive as well as shrinking its overall size. I was thinking more of long term data storage where someone may also wish/need to monitor historical data over months/years.
I don’t know if the current setup would stand up to that, but it would make for biiig nightly backups :slight_smile:

The idea being to have the big historical database available to back up separately so it could be done on a separate schedule compared to the real-time database. Weekly or monthly or manually.

I’m still new to HA, but I’ve seen that there’s been a lot of recent improvements to make it more user friendly and robust which hopefully gives HA a wider appeal.
This suggestion is made in the same vein.

There is a pretty good discussion about this issue here.

Ah - thanks. I missed that when searching if it’d been requested before.
Makes for a good read too.

Voted +1 there

Just to clear…

using an external database (MariaDB or InfluxDB) isn’t in any way limited to be used only by a Supervised/OS install. Both of those can be used by any installation method.

You just have to set up your config to use them just as you would when running a Supervised version of HA as well.

I use both of those (but with very limited use of Influx simply because I haven’t taken the time to learn it) and I run a HA Container install.

It would be great if this was the standard method of data storage without having to intervene in any way.
As it was put in the other post

I get why the default location for the DB is on the host machine and inside HA itself. The “Target Audience™” has become less tech savvy and most of those users could care less about the DB and don’t necessarily have another location available for the data.

But I don’t get why the long term data isn’t provided the ability to be handled like the short term recorder db config.

At this point (as far as I know) you have zero control over what, where or for how long the long term data is stored.