MariaDB database grows faster than SQLite

I “upgraded” to MariaDB a while back, and is it just my imagination, or is my database growing faster than it did with SQLite? I upgraded with the expectation that I would be able to keep more days of history, but it seems I can actually keep LESS than before. My database is growing by nearly 2 million records per day. Just two days after I wiped the database (that’s a different subject) I have 3.8 million records.

I didn’t pay any attention to how many records I had before I switched to MariaDB, I just knew that performance had gotten bad.

I have multiple questions about MariaDB but I’ll create separate posts.

Yeah, since major improvements were made to the built-in db, using MariaDB is lately considered more of a downgrade than an upgrade. A year or more ago it was a good choice, but not anymore, I guess.

There’s a guide somewhere in this forum on how to revert back to the built-in db.

Reverting to SQLite seems like just a matter of removing the line from the configuration.yaml file. The old database is still there. I assume the old data would get purged. I’d like to understand better WHY MariaDB is no longer an upgrade. What happened?

MariaDB is better with a large DB, because it can be accessed by multiple clients at once.
SQLite is just a file, with the limitations that implies.

Whatever your DB is, you should make sure it is configured to store the correct data.
Some sensors produce data that can make the database grow fast.

I had made a Node-RED sensor for my network switches, and I had stored all the calculated data in the attributes and then just used the state to store a timestamp, so I could check whether it updated regularly.
The issue here is that a timestamp is never the same twice, so it stored my entire data from 5 switches, at a 5-second interval, every time.

How do I configure it to store some data and not other data? My database definitely qualifies as large. Around 2 million records a day. Things like grid voltage and battery power change every 5 seconds.

The official documentation.

And a guide that should still be usable.
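For anyone landing here, the basic shape of a recorder filter in configuration.yaml is something like this (a sketch only; the entity IDs and retention value are placeholders, not from the poster’s setup):

```yaml
# configuration.yaml - recorder filter sketch; entity IDs below are examples
recorder:
  purge_keep_days: 7            # default retention for recorded data
  exclude:
    domains:
      - automation              # skip whole domains you never chart
    entity_globs:
      - sensor.*_linkquality    # pattern-match families of noisy sensors
    entities:
      - sensor.grid_voltage     # a single chatty 5-second sensor
```

Domains, globs, and single entities can be mixed; anything matched by the exclude filter is simply never written to the states table.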

No, it’s not that simple. OK, the old (MariaDB) database indeed remains, but HA starts a new blank db and all history is lost. If you want to keep it you must import it into SQLite. Of course, you can always browse the old db manually.
I agree, MariaDB is better regarding (multiple) access if you want to “mess with the db” manually. If you manage to connect to SQLite (between two HA writes, I guess), then HA immediately starts a new, blank db, as the current one is no longer accessible to HA (been there, done that).

How long ago did you move to MariaDB? If it is a recent move, then this is normal: in the old database you had already reached a stable state, with new records being added while old ones were purged and only statistics records kept long-term. In the new one you might not have reached that point yet, so new records are only being added without any purging.
Regarding the advantages of keeping the recorder DB separate from HA (I run HA as a VM on an ESX server and MariaDB on a Synology NAS): the whole-system backup does not include the DB, so it is faster to back up/restore, and in case of a catastrophic failure requiring a restore, I do not lose history, since it is kept (and backed up) separately.

I had this happen in the past a couple of times without me trying to access it another way, probably due to corruption. It is the main reason I switched to MariaDB and have not switched back. Is it more reliable now? I’d rather miss a couple of days of data while fixing errors than lose all LTS and not know until it is too late.

Exactly the reason I went to MariaDB back then - SQLite tended to crash out of the blue. Then I read that HA had since made some big improvements to the SQLite db, so I took the plunge and went back. All has been OK for quite some months now (I’d say nearly a year or so). As a backup I installed InfluxDB at the time, and it’s still running on my Synology, but I don’t use it since, as said, the original db works.

I’m going to stay with MariaDB until something forces me to abandon it. My system adds 2 million records a day, and that will grow by quite a bit if I replace the Bluetooth BMS integration, which updates only every 30 seconds, with Modbus, which updates every 5 seconds.

In the meantime I did some checks on my MariaDB, and it seems to me that what you observe, with 2M records being written daily, might be quite normal. I have 3.5k entities in my config, and I configured the recorder to exclude all of them by default and include only carefully selected ones (~500) for which I want to keep statistics or draw history charts in a dashboard. I keep 1 month of history. Checking db statistics, I see ~1.1M records being written to the DB daily, and the total size of my DB looks pretty stable, between 9.5 GB just after purge and 10.9 GB max just before cleanup.
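As a sketch of that “include only a curated list” approach (the entity IDs here are examples, not my actual ~500): when a recorder include block is present, everything not listed is excluded by default, so no explicit exclude-all is needed:

```yaml
# configuration.yaml - include-only recorder sketch; entity IDs are examples
recorder:
  purge_keep_days: 31           # one month of history, as described above
  include:
    domains:
      - light                   # record whole domains you always want
    entities:
      - sensor.battery_power    # plus hand-picked sensors for charts
      - sensor.grid_voltage
```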

You have around 30 million records in the database?

I put a list of a few dozen sensors in an automation that purges them back to 1 day. I’m not outright excluding anything right now. I tried that, and it didn’t seem to slow down the growth of my database, so I disabled some sensors, and I’ll purge others back to a day or however many days I think I want to keep. I might have to go through and identify sensors that I just don’t need history for at all, but ugh, that’s going to be tedious.
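For reference, an automation like that can be built around the recorder.purge_entities service; this is a sketch with placeholder entity IDs and trigger time:

```yaml
# automations.yaml - nightly per-entity purge sketch; IDs and time are examples
- alias: "Purge short-retention sensors to 1 day"
  trigger:
    - platform: time
      at: "03:00:00"            # run while the system is quiet
  action:
    - service: recorder.purge_entities
      data:
        keep_days: 1            # drop state history older than 1 day
      target:
        entity_id:
          - sensor.grid_voltage
          - sensor.battery_power
```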

It is actually ~20M records in my DB. Do not forget that the states of entities without recorded history are also updated; their record is overwritten at every update…

I’d be seriously looking at extensive use of the Recorder “exclude” (or better yet, “include”) functionality. I’d be willing to bet the vast majority of those records will never be looked at.

You’re right that the vast majority will never be looked at. The trick is figuring out which ones. Until a few weeks ago I’d have thought grid voltage was something I had no interest in. Now I have an automation that runs in the evening to import power when the rate of consumption exceeds my battery capacity. It monitors grid voltage so as not to pull it low and cause our lights to dim, which my wife absolutely hates and blames me personally for. Cell voltages are not something I normally look at, but maybe I have some issue with my batteries and need to look at them. It’s hard to know what I am definitely never going to want available to understand something that happened.

My strategy is more focused on purging back to 1 day the history I don’t think I need beyond one day. Having NO history for a sensor could be something I regret.

Exactly. It’s hard to know what to exclude or include, and having a one-size-fits-all keep_days setting doesn’t help. As you point out, some data you only want to keep for a day, others for a week, a month, a year or forever.

This is why I submitted a Feature Request to set retention periods by entity.

Feel free to up-vote if you agree.

The other problem is that nobody tells new users that managing their Recorder excludes is going to be an issue. When a new entity is created, we’re prompted to select an area. Wouldn’t it be great if we were also prompted to choose whether (or better yet, for how long) to retain it in Recorder?

I’m finding that very few entities need detailed state change records for more than a few days. And HA already keeps long-term statistics for most of those. For the few entities about which I need “forever” data, I use automations which store the data I want to keep in text files. With this strategy, I’ve been able to exclude most entities and set my keep_days to three.
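One way to implement the “forever data in text files” idea is a file notifier plus a scheduled automation. A sketch, assuming the classic YAML file notify platform; the notifier name, path, sensor ID, and trigger time are all placeholders:

```yaml
# configuration.yaml - file notifier sketch; name and path are examples
notify:
  - platform: file
    name: energy_log
    filename: /config/energy_log.csv
    timestamp: true             # prefix each appended line with a timestamp

# automations.yaml - append one reading per day, so keep_days can stay low
- alias: "Log daily energy to file"
  trigger:
    - platform: time
      at: "23:59:00"
  action:
    - service: notify.energy_log
      data:
        message: "{{ states('sensor.daily_energy') }} kWh"
```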