Long-term statistics on a separate database

In my opinion, the long-term statistics tables should live in a separate database from the main one.

If you have to delete the DB because of corruption or other problems, you’ll lose all your long-term statistics, with months of precious data!

The database is written and rewritten very frequently, so corruption is not an infrequent event (at least when using the default SQLite DB).
If there were two databases, however, the long-term statistics DB would only be written once an hour when statistics are updated, greatly reducing the probability of corruption.

Furthermore, by splitting them, it would be possible to delete the normal DB while leaving the long-term statistics intact.
It would also be possible to back up the long-term statistics and restore them with ease in case of disaster recovery.

Obviously, when using a DB other than the default (MariaDB, …), two connections should be configured. This way you can optimize the DB parameters for each one and even use two separate servers.
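Just to illustrate the idea, here is a purely hypothetical sketch of what such a split configuration could look like. The statistics_db_url key does not exist in Home Assistant today; it is only there to show the two separate connections (possibly on two servers) this proposal would enable:

```yaml
# Purely hypothetical sketch of the proposed split.
# statistics_db_url is NOT a real recorder option today; it only
# illustrates having two connections, possibly on two servers.
recorder:
  # Short-term states and events on a local MariaDB instance
  db_url: mysql://hass:password@core-mariadb/homeassistant?charset=utf8mb4
  # Long-term statistics on a different database (or even another server)
  statistics_db_url: mysql://hass:password@stats-server/ha_statistics?charset=utf8mb4
```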

I believe that splitting the database would be the best option we have for data integrity.

Couldn’t agree more. I always take my snapshots/backups without the db files to reduce size. The downside is that I will lose all long-term statistics when I have to use a backup. Splitting the db in two would be the perfect solution.

1 Like

I’ve been using influxdb for long-term statistics for quite some time now and it works great. Maybe it is an option for you as well: InfluxDB - Home Assistant
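In case it helps, a minimal influxdb configuration (InfluxDB 1.x; the host, database and include filter are placeholders for your own setup) looks roughly like this:

```yaml
# Minimal InfluxDB 1.x example; host, credentials and the include
# filter are placeholders for your own setup.
influxdb:
  host: 192.168.1.10
  port: 8086
  database: home_assistant
  username: homeassistant
  password: !secret influxdb_password
  max_retries: 3
  include:
    entity_globs:
      - sensor.energy_*
```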

1 Like

I am using Influx as well. But the new HA long-term statistics integrate so nicely into Lovelace that IMHO Influx cannot compete. After only 3 months I am actively using long-term stats everywhere while I hardly used Influx at all in over 2 years.

3 Likes

I added a FR for the same thing 4 days ago here:

I’ll vote for yours if you vote for mine. :wink:

JK, I already voted for yours too. :laughing:

Absolutely. I already do this: I’ve been using MySQL to create the long-term stats and store them in a new database. Then I can use them as SQL sensors, or view the data as a whole through Grafana.

Surely it is easy to just copy two tables to another database once an hour? Then you always have something to restore from.
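Something along these lines might do it. This is just a rough sketch that assumes the default SQLite database at /config/home-assistant_v2.db, the standard statistics and statistics_meta table names, and that the sqlite3 CLI is available (dumping a live DB can occasionally hit a lock):

```yaml
# Rough sketch: export the two long-term statistics tables to a separate
# SQLite file once an hour. Assumes the default DB path, the standard
# statistics / statistics_meta tables and an available sqlite3 CLI.
shell_command:
  copy_long_term_stats: >-
    bash -c 'rm -f /config/statistics_backup.db &&
    sqlite3 /config/home-assistant_v2.db ".dump statistics_meta statistics" |
    sqlite3 /config/statistics_backup.db'

automation:
  - alias: Copy long-term statistics hourly
    trigger:
      - platform: time_pattern
        minutes: "0"          # top of every hour
    action:
      - service: shell_command.copy_long_term_stats
```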

Does a separation into two databases really solve the root of the problem?

Yes, I think so.

My “short-term” DB for an interval of 14 days contains all measurements for a gazillion entities. It is over 5 GB in size. I don’t care to back it up. If it’s gone, it’s gone. A new one will be complete within another 14 days.

My long-term stats are really precious. These are only a handful, and AFAIK they only create a min/max/avg value triple each hour. So that DB should be small and would always be included in my backups.

4 Likes

Yes, that’s on my todo list, and I know that InfluxDB will accomplish the same result. But my proposal is for people that want to use Home Assistant as it is now: it already contains the long-term statistics platform, which is not bad. But data integrity is at risk, to say the least. If people use the statistics, they expect their data to be safe.

Of course I’ll do it as soon as I post this reply :slight_smile:

That’s interesting! How do you do that? Triggers that insert data into the other DB? Or is there another way to do it in MySQL?

It’s a core feature. It just works out of the box. I’m able to manage tables, select, insert, and so on. But not everyone is an expert in database management, and their data integrity is a priority too.

No, it doesn’t resolve the root of the problems, but it is a very good architecture. Even in business software, the DB holding data used for data mining and business intelligence is normally a standalone database, not some tables inside another DB…

Exactly! You got the point! :+1:t2:

2 Likes

I agree that it is a good pattern. It would minimize the risk of a database issue, since only errors in one part of the code (the part that maintains the min/max/avg value triple each hour) could destroy the long-term database. But that part can also have errors, which would then affect the long-term database.

I think the basic assumption here is that most of the errors come from the management of the short-term database, and therefore the risk would decrease.

I am a bit sceptical as to why we have to clear the database so often. The database is normally the most stable part of the whole system, not the other way around.

The problem is that the default DB is SQLite, which uses a normal file on the filesystem. If you don’t change the recorder commit_interval parameter, the DB is updated (= a file write) every second. If you consider that most users (>57%) run Home Assistant on a Raspberry Pi, and most of them store data on an SD card or on an SSD without TRIM activated, you can understand that DB corruption is just around the corner.
Furthermore, some updates or snapshot restores create problems and require a DB wipeout.

So database splitting would be a much better strategy for every user. You could then back up the long-term DB but not the short-term one, keeping your backup (aka snapshot) much more compact.

In the meantime, if you want a more stable DB, you can change commit_interval to something longer (but not too long) and use an external RDBMS like MariaDB or MySQL.
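For example (the db_url below is a placeholder for your own server and credentials):

```yaml
# Recorder pointed at an external MariaDB with a longer commit interval.
# Replace the db_url with your own server and credentials.
recorder:
  db_url: mysql://hass:password@192.168.1.10:3306/homeassistant?charset=utf8mb4
  commit_interval: 30      # seconds between writes, instead of the default 1
  purge_keep_days: 14      # how long to keep the short-term history
```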

1 Like

Yeah, I’ve created events that run at a particular time, mostly at midnight. They compute various things like the min/max/avg house temperature, mains min/max voltage, internet speedtest results, and the aircon running time for the day, and then insert the results into a new database. I just use MySQL Workbench to do it; it’s easier than working from the CLI.

I then pull this back into HA with a SQL sensor, and also view the long-term data through Grafana.
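For anyone curious, a SQL sensor reading such a hand-made aggregate table back into HA would look roughly like this; the daily_stats database, the house_climate table and its columns are only examples of what the nightly events could produce:

```yaml
# Sketch of a SQL sensor reading a custom aggregate table back into HA.
# The daily_stats database, house_climate table and temp_max column are
# illustrative only.
sensor:
  - platform: sql
    db_url: mysql://hass:password@192.168.1.10:3306/daily_stats?charset=utf8mb4
    queries:
      - name: House temperature max yesterday
        query: "SELECT temp_max FROM house_climate ORDER BY day DESC LIMIT 1;"
        column: temp_max
        unit_of_measurement: "°C"
```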

1 Like

TRIM is new to me. It seems it is not activated by default in Home Assistant OS. I found this issue about the topic.

I am wondering how to activate it manually.

Ah, that issue was written by you @henrik_sozzi :slight_smile:

Ahah, no, the issue was not written by me, but the last comment (energywave) is mine :slight_smile: You can read my blog post with detailed instructions on how to activate TRIM; it totally works. Just be aware that not all controllers/SSDs support it. You can use Google Translate to translate it from Italian :wink:

But better not going further OT with that here :wink:

1 Like

Voted yes.

And a separate long-term database should support a completely different DSN so it can run on a remote instance. It should also store state and attribute data with proper DB-native data types instead of everything as varchar holding an enormous string of unparsable, space-wasting JSON. Separate tables for states, entities, attributes, etc., all linked together via linking tables. In other words, a proper and efficient relational database that offloads query work to the database engine, not to program code.

And hell, maybe the same thing can be applied to the recorder database as well.

2 Likes

I fully agree with addressing the database problems, but this could also be done in other ways than splitting it; in short, its integrity must be made safe. A serious home automation system cannot afford to lose even one bit, especially of that database.

I really would like a way to separate data into separate databases.
There is lots of long-term data I do want, but also a lot of fluff data that I do not want getting backed up in the daily backups.

If we could have multiple recorders set up, that would be fantastic.

But currently only InfluxDB and one other type of database can be used at the same time, unless you use Node-RED.

1 Like

InfluxDB requires more resources; I think a Raspberry Pi alone isn’t sufficient to keep the whole system working fine!