How to Shrink the Database

This seems to be a bit of a recurring theme. I’ve read multiple posts by many people, over several years, with the same issue. My overriding thought is: surely there has got to be a better answer than repeatedly deleting the database. These are my findings. I haven’t got 100% to the bottom of it, but I am a lot happier, and I hope others find this useful.

I have HA 0.96.5. It’s running on an old but very competent laptop. I rarely see CPU usage above about 5%, never see memory usage over 50%, and the hard drive is almost empty.

When the home-assistant_v2.db file reaches about 2 GB, updates to my tabs get very slow. It can take 10 seconds to refresh. I have only about 20 sensors, on 5-minute updates.

I can fix the problem by shutting down HA, deleting the DB and running it again. It runs sweetly for a couple of weeks, then starts to grind again.

I would like to maintain 14 days’ history, so I have:

recorder:
  purge_interval: 2
  purge_keep_days: 14
  exclude:
    domains:
      - automation
      - weblink
      - updater
      - light
      - script
    entities:
      - sun.sun # Don't record sun data

This morning, I ran recorder.purge with {"keep_days": "14", "repack": "true"}. My understanding is that rather than trying to remove entries, repack creates a fresh file, then deletes the old database, replacing it with the new. Sure enough, I spotted home-assistant_v2.db-wal ramping up in size to about 4 GB, then going back to zero.
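
For reference, the service data boils down to just those two fields. Strictly speaking, keep_days takes a number and repack a boolean, so the quotes shouldn’t be needed:

keep_days: 14
repack: true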

So, I considered that 4 GB must simply be the size of 14 days’ worth of data for my sensors. That irked me, because it’s just too slow to refresh the display. So, I figured I had to bite the bullet and set

recorder:
  purge_interval: 1
  purge_keep_days: 7

and I re-tried recorder.purge with {"keep_days": "7", "repack": "true"}.

Well, the results surprised me. The DB file is now 3.5 MB, and I’ve only lost one week’s worth of readings. Better than deleting the database!

So, conclusions?

  • I’m still a bit confused as to how my original purge yielded a 4 GB data file with two weeks’ worth of data, when two weeks of data appears to occupy only 3.5 MB. (Back of the envelope: 20 sensors at 5-minute updates is only around 80,000 state rows in 14 days, which should be nowhere near 4 GB.)
  • Is it possible that purge_keep_days in the configuration overrides keep_days when executing the purge?
  • Is it possible that the initial purge to 14 days did actually work, but I wasn’t patient enough to wait for the file size to drop?

I’ll keep an eye on my file size and report back, but for now I’m putting the purge_keep_days back to 14.

Keep up the good work, folks.


How truly bizarre. Having categorically stated that the CPU rarely goes above 5%, I now find that since shrinking the database it’s gone mad, with a brief 10% peak every 15 minutes, all day! I don’t think it’s affected anything - it’s just new behaviour. Of course, the server’s running Linux Mint, so it might not be HA activity, but it’s still a bit suspicious!

[screenshot: CPU usage graph with 15-minute spikes]

This is almost certainly not HA, unless you’ve configured something to run or update every 15 minutes. I also run on Linux Mint, and I don’t see this.

Can you correlate the spikes with anything in your logbook or HA logs?


No, and it stopped bloody doing it that evening! We’ll never know!

Out of curiosity, I did another experiment. I dunno why I didn’t think of it sooner. The laptop I’m using as the broker/server has a hard drive light. When I refresh one of my temperature graphs, the hard drive light goes mad. Then it goes off a few seconds later, when the graph appears.

So, without a shadow of a doubt, the slow update rate is due to DB accesses. The good news is that the DB has maxed out at 1.2 GB; it’s slow to access, but the point is that it isn’t getting slower.

I went looking for an answer when my database reached over 3 GB. I learned about purge interval and keep days, and I played around with deleting the database and repacking.

It turned out, for me, that HA had auto-discovered my router, and was happily recording every little detail it sent: bytes in, bytes out, every connection, etc. Disabling that integration (and deleting the database) got me back under 1 GB. Changing keep_days to 10 got it to about 450 MB. Then I deleted the integration for my Roku TV, which was also pretty active, did a repack, and got it down to around 300 MB. Obviously you can exclude things from the recorder if you don’t want to remove them from HA altogether.
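
If you want to keep an integration but stop it filling the database, something like this under recorder should do it (these entity IDs are made up for illustration; check the real ones on the States page in Developer Tools):

recorder:
  exclude:
    entities:
      - sensor.router_bytes_received # hypothetical entity ID
      - sensor.router_bytes_sent # hypothetical entity ID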

I think the moral of the story is, look at the most active devices in your system and decide if they’re really something you need. I have other ways to monitor my network, and it’s usually pretty easy to see what’s on TV without bringing up HA.

A side benefit may be reduced “wear and tear” on the SD card in my Raspberry Pi. I’ve heard the horror stories of these things crashing and figure the less I thrash the database the better.


That’s a good point. HA never auto-discovered my old router, but it found my new one. I do like a graph of internet I/O over a day or so. Any ideas how I can disable (or not log) part of the integration?

I went to an old laptop as a server because I was burning out SD cards at one a month on the RPi.

Wouldn’t just adding it to the exclude list work? I really haven’t done much with that, just added one entity.

This thread got me thinking I could re-enable some of the integrations I disabled, but then just exclude them from the recorder. My problem is there are dozens of entities I’d have to exclude, but they’re in domains I don’t want to exclude, like “sensor,” so I’d have to list them all individually.

I wish the exclude list worked at other levels besides just entity and domain.


It looks like stuff can be individually enabled / disabled here:
http://nnn.nnn.nnn.nnn:8123/config/entity_registry

Right, but I think that’ll completely disable them, not just prevent their data from being recorded to the database. I’m thinking maybe someday I’d want to see the current information, even if I don’t want to record historical data.


This is sending me to the fair again. My database is limited to one week’s capture. Since Christmas, the DB size hasn’t exceeded 2 GB. The web pages served within HA take a good minute to refresh. There’s got to be a faster way to recover the data to generate history graphs.

I’ve been thinking about moving my HA from the SD to a USB drive (either SSD or HD), but I know that’s a one-way trip for the RPi. You haven’t taken that jump, have you? I’d be curious to hear from anyone who has and how it went. I too fear the SD card crash.

Yeah, wouldn’t it be nice to exclude by node_id!

How bizarre that I have earned the “Popular Link” badge due to this link being clicked 50 times. It must have been bots. None of YOU would click this link, which evidently is not a link!!


Quick question here: I am fairly new to HA and had been playing around with changing the recorder’s history period to a year, but I realize that’s not good, as the backups are now over 500 MB. So I changed it to 90 days, but it didn’t seem to change the backup size.

This tells me that there’s historical data (>90 days) that will need to be purged. After some research, folks say to delete the DB and start over, but I’d rather just purge entries older than 90 days.

Is there some simple SQL command I can run to purge this data? What do folks use as an interface? I’m happy just dropping into an SQL client/console and purging that way, but I’m unclear how to get there. I don’t even know which DB I’m running now:

$ ps aux | grep sql from the terminal yields nothing running.

@dmcentire
Go to /developer-tools/service, choose Recorder: Purge, and remember to tick Repack and enable the toggle to the right of it.

Thanks for the info – just to clarify, there’s also a setting for “Days to keep” that I’m tempted to turn on and set to 90 days.

Will that also be needed as I don’t want to lose the last 90 days of data if possible?

Yes, I assumed you’d noticed that :wink:

Cool, thanks for the info and help!

I don’t know exactly how to tell when it’s completed, though; is there somewhere I can check?

Currently, I’ve got a terminal into the system (Rpi4) and check uptime every now and then to see how the load averages are.
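
I suppose I could also just watch the file size directly; something like this from the terminal should work, assuming the DB lives in the usual place:

watch -n 60 ls -lh /config/home-assistant_v2.db*

(The * also catches the -wal file, which is where I’d expect the churn during a repack.)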

OK, I don’t know how large your DB was before you purged/repacked, but if it was 500MB as you mention above, it should be done by now; look at the DB size in /config.
PS: In the Core log you’ll see the service called, but for some reason they don’t find it relevant to note/log whether it was successful :slight_smile:
EDIT2: It’s an SQLite DB, so it’s a fairly easy task to remove everything older than, say, 90 days; “repack” then basically just rewrites the file at its reduced size.
EDIT3: And “Recorder” is the service that writes to the DB file.
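
EDIT4: If you really do want the raw SQL route, here’s a rough sketch with the sqlite3 CLI (stop HA and back the file up first; this assumes the default schema, where states has a last_updated column and events has time_fired, and VACUUM is essentially what the repack option runs):

sqlite3 /config/home-assistant_v2.db "DELETE FROM states WHERE last_updated < datetime('now', '-90 days');"
sqlite3 /config/home-assistant_v2.db "DELETE FROM events WHERE time_fired < datetime('now', '-90 days');"
sqlite3 /config/home-assistant_v2.db "VACUUM;"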