How to keep your recorder database size under control

Thanks very much for this great writeup. I’ve been using HA for a while now and have noticed the DB grows in to the order of several Gb but have never known why. It slows down my PI 4 to a crawl and I have things like the lights come on after I’ve entered and left the room :slight_smile:
Now I can fix it, so thanks very much

Thanks for the good guide, I followed that except digging through the database with SQL.
But I still have problems to understand a few things.

This is my recorder.yaml, followed by an exclude part:

purge_keep_days: 3
commit_interval: 10
exclude:
.......

According to the docs, by default there is an auto-purge each night.
When I look at the history of the database, it is growing only each night by a couple of MBs, no change at all during the day:
grafik

This does not make sense to me, as there is hardly anything happening each night between 2 and 4 AM and also not much in the log. On the other side, during the day of course a lot is happening, which is also saved in the history.
Does the auto-purge only delete content, while keeping the same data base size, so that during the day just the empty part is being filled up, and just in the night shortly before the next purge by coincidence it is slightly growing?
The only way I have ever seen the database shrinking so far was by manually executing the purge service with the repack option, but nowhere did I read that generally only with this the database would shrink.

1 Like

I’ve been seeing this growth every night in the wee hours, too. This is a new behavior in the past release or two.

I tried running a manual purge and re-pack, but that doesn’t change the size or stop it from growing a bit each night. It hasn’t been unmanageable yet, but it’s got my attention.

If that’s the case, sounds like what’s described in the reply #11 from this thread (for the context, also read the #10 and #12). I admit I haven’t paid attention to the size of my database, so what @CaptTom is describing is likely correct as well.

1 Like

Thanks for pointing that out, makes sense and seems to explain the observation that obviously several people have made.

I also do not care about a few MBs, just want to understand how much of an effect it really has to put effort in excluding devices, and the behaviour is confusing without this explanation here in the recorder documentation.

It started with having a big database, and shortening the purge_keep_days plus excluding more and more devices seemed to have no effect at all, so I thought this would not work. Only a repack shrunk the database, but according to the description I rather thought that this compressed it instead of removing the meanwhile unused table rows.

I’ve done a little more poking around.

The purge with repack enabled does indeed reduce the database size. I think previously I was checking the box but not enabling the slider switch next to it. Reading the screen is always a good idea.

BUT… It doesn’t prevent the database from continually growing. At least one component of this growth is the statistics table. Depending on the number of entities you have, this can grow by hundreds of records daily. It creates one new record each hour for EACH entity for which statistics are saved. There is also a statistics_short_term table which saves a lot of data but I can’t tell yet if it’s ever purged, and if so, what the “short term” might be.

Here’s a graph showing one series of tests. I repacked the database and it did shrink. It then grew again for a while until it stabilized. I did another repack and it shrunk and started growing again. Shortly after, I shut down HA, copied the database off and deleted all entries in the statistics table older than 4 days. I copied the database back and re-started HA. The drop in size was significant. I’d deleted almost 70,000 records, dated as far back as July.

I have yet to find way to disable keeping statistics, or to exclude entities which don’t need statistics kept forever. I think you can change the class of the entities so they’re not recognized as something to keep statistics on, but that might have other negative effects elsewhere.

I should point out that I don’t happen to use statistics for anything. If you do, don’t go deleting them. For one thing, the energy monitoring functions use them.

3 Likes

By statistics you mean… This? Statistics - Home Assistant

So, if I understood correctly, the size of the statistics table(s) is independent of the other recorder tables. And this only affects installations where people explicitly added the statistics integration (about 2.6% of the users). Am I right? Do you have that integration on yours? And why do you have it, since you said you don’t use it for anything?

(Sidenote: since I had written this guide, I haven’t had much time to investigate or tweak my Home Assistant installation. I don’t even know if I have those tables on mine.)

I’m just a dumb user, so I can only say what I’ve experienced, not how the system is intended to work.

Yes, the statistics tables (four of them) appear to be independent of the other tables which have been in the Recorder database all along.

I don’t recall ever adding the Statistics integration. I’m curious about your 2.6% figure. I’d most certainly like to join the 97.4%. I searched my configuration.yaml, integrations page and add-ons page. No sign of anything called Statistics. I even searched to see if I could add an integration or add-on by that name. If anyone knows how to un-install it, I’d appreciate hearing it!

For the record, I don’t advocate getting rid of Statistics altogether. If I started with HA today, I’d probably use it a lot. And I may in the future. Just that right now, the way it’s currently implemented, it’s not the right tool for me.

Screenshot from the official page: Statistics - Home Assistant

image

Hey guys, new HA user here and already configured my recorder to MariaDB, like:

# recorder configuration
recorder:
  db_url: mysql://USER:PW@core-mariadb/homeassistant?charset=utf8mb4
  commit_interval: 60

So my question is: Do i have to configure every part of the recorder, like “purge_keep_days” or when not configured it’s using it’s default value.

Sorry for the noob question, but just want to be sure to not get into DB issues from the beginning :slightly_smiling_face:

thx

So, any idea how I can un-install it?

I think this is the statistics sensor integration. So only 2.6% of people have added a sensor that reports statistics.

It’s separate from the long-term statistics that are generated and logged by HA. Also, I haven’t found any posts on (intentionally) not logging long-term stats.

1 Like

Thanks, that makes sense. I’ll just add to my routine an occasional manual purge of the statistics tables. Hopefully folks who are experiencing the inevitable database bloat will eventually land on this thread.

No, you don’t need to explicitly add all configuration options. Only the ones marked as REQUIRED in the documentation are actually, well, required. All others will assume some default value when missing.

Also, you just shared your MySQL credentials (including the password) to everyone in the world. Congrats on choosing a random password, but it’s wise to change it now. And also consider using secrets.yaml to store the password, like described in: Storing secrets - Home Assistant

1 Like

Does anyone know any end-user documentation about those long-term statistics? I’m having trouble finding in the official docs. Even if it is a blog post it would already be helpful. So far I only found these:

1 Like

Thanks for the info.
And thanks for the hint regarding the PW!
Man i am dumb, that’s the result when you have no time and do all in a hurry :rofl:
Will change the pw.

same boat here:

  • found out statistics is the db size hog
  • found out it all about device class, i am surprised there is no global disable-statistics-switch
  • too lazy to change device classes
  • if it gets too large i do
    sqlite3 home-assistant_v2.db
    delete from statistics_short_term;
    delete from statistics;
    next day the sqlite-db is small again, as the nightly job rebuilds the db file

and for the record:
i just use the docker image and was never asked if i want statistics

2 Likes

@gon, you’re exactly where I’m at. Just manually deleting the statistics data occasionally is the easiest option right now.

I have yet to find a way to put an SQL delete statement into an automation. I looked at a couple of SQL-related add-ons, but nothing jumped out at me. Maybe someone’s figured it out and is willing to share.

I’ve noticed an odd pattern lately. Basically, the database starts growing sharply at around midnight, and stops at around 4AM:

I understand that the database is purged at around 4AM, and isn’t repacked. So that’s why it stops growing then. Obviously it can start growing again when new entries fill up the space freed by the purge.

But why does it only start growing again, and so sharply, at almost exactly midnight? Is it a coincidence that this is the time the free space is used up? Is something else happening at midnight? And if so, why does it continue for four hours at the same rate? If it’s just calculating daily history_stats or whatever, shouldn’t that be a spike, not a sloped line?

At this point it’s just curiosity. I can live with this, but I’d feel better if I understood it.

since the documentation somewhere (or someone) says only sql-select is possible:
i suspect the only sensible solution is to use a cronjob in the underlying linux to execute the delete’s.

Since there is the SQLite Web add-on which has access to the database, someone with enough knowledge and time could create an add-on to run SQL queries periodically on the HA database.

Alternatively, someone could try using the SQL integration to build a pseudo-sensor that runs the DELETE query (instead of the usual SELECT query for a sensor). The scan_interval configuration is relevant too. I have no idea if this idea would work, but worth trying.

Even if the DELETE pseudo-sensor doesn’t work, it’s possible to create sensors for the size of each table in the recorder database.