How to keep your recorder database size under control

Same boat here. DB is 75GB and growing. Running Recorder: Purge (7 days) doesn’t seem to work. I’ve excluded several entities/domains that were heavy hitters, but still can’t bring down the DB size. Thoughts?

Exclude more entities and events.

I think you have to restart HA for the excludes to get picked up.
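
As a rough sketch of what such excludes could look like in configuration.yaml (the entity and domain names below are made-up placeholders, not recommendations; swap in whatever turns out to be noisy in your own database):

```yaml
recorder:
  purge_keep_days: 7        # assumption: matches the 7 days mentioned above
  exclude:
    domains:
      - media_player        # hypothetical example domain
    entity_globs:
      - sensor.*_signal_strength   # hypothetical glob pattern
    entities:
      - sensor.example_uptime      # hypothetical entity
    event_types:
      - call_service        # example event type
```

Excluding an entity only stops new rows from being written. The rows already in the database stay there until they are purged.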

The best option is to actually look at the data to see which entities are spamming the database. There are some SQL statements above which will help with that, although you could also just eyeball it in SQLite DB Browser or whatever.

Although not for everyone, another way to minimize the DB size is to limit what’s saved in the long-term statistics tables. Once you’ve identified entities you don’t want to save LTS for, you can exclude them by changing their state_class in customize.yaml:

sensor.1st_floor_temperature_2:
  state_class: none

Read more about state_class for sensor entities here:

I have, but it’s the purge action that doesn’t seem to work.

Is there a TL;DR on this? Or possibly a quick fix I can apply until I have time to read through 4 years of posts?

TL;DR: The database can grow a lot due to the recorder integration, which is enabled by default and which saves a history of the states of each entity. You certainly want a history for certain entities (e.g. temperature), and you probably shouldn’t bother deleting the history of entities that change rarely (e.g. an on/off switch or a doorbell). Your focus should then be to find which entities contribute most to the large database size, and to exclude those from the recorder integration.

There is no one-size-fits-all solution; it depends on your installation of Home Assistant and which entities you have. Thus, the first post explains the problem, shows some tools and queries to get enough data, and then explains what you have to configure yourself to exclude them.

TL;DR 2.0: Start with the first post. It is not “short”, but it tries to take you on a journey to help you identify the root cause of your large database and keep it under control. If things don’t work, then you should seek help in the replies.

The problem is not that many of us don’t know how to start, but how to control it… Many of us already have a large DB, so how do you clean it up and keep it contained?

I totally agree that this whole issue is not well explained to new users. Most people don’t know anything about Recorder until their system crashes or their backups take forever.

By default every single state change is recorded. This is poor design. When creating new entities, users are prompted to decide which area to put them in. They should also be prompted whether they want Recorder to include or exclude them.

Next is the problem that purge_keep_days is a one-size-fits-all setting. Ideally, each entity should have its own retention period. Some things we might want to keep for just a few days, others for much longer.
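
Until something like that exists, one possible workaround is a scheduled automation that calls Recorder: Purge entities with a shorter keep_days for selected entities. A minimal sketch, assuming the newer triggers/actions automation syntax and made-up entity IDs:

```yaml
# Entry for automations.yaml; alias, time and entity IDs are hypothetical.
- alias: "Nightly purge of short-lived history"
  triggers:
    - trigger: time
      at: "03:30:00"
  actions:
    - action: recorder.purge_entities
      target:
        entity_id:
          - sensor.example_power_draw
          - sensor.example_wifi_rssi
      data:
        keep_days: 2   # keep only the last 2 days for these entities
```

Everything else still follows the global purge_keep_days. On older releases the same automation would be written with trigger/platform and service keys instead.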

And on the topic of retention, large-scale enterprise metric collection systems have retention rules: how long to keep each metric, and at what resolution.

For instance, a web server might generate metrics such as how long it takes to send the full response to the client. It could generate those metrics every minute, or every half minute, together with a dozen other relevant metrics. They can be very helpful when diagnosing outages live. However, their usefulness degrades over time. So such large-scale systems will have rules such as:

  • Keep the metrics at half-minute resolution for 1 day.
  • Keep the metrics at one-minute resolution for 1 week.
  • Keep the metrics at five-minute resolution for 1 month.
  • Keep the metrics at fifteen-minute resolution for 6 months.

And those rules will change based on what metric we are collecting.

The same logic can be (and should be) applied to home automation. You probably care if the temperature changed in the past minute, but you don’t care if it changed last month at the exact same minute.

Anyway, this is a useful and interesting discussion, but I believe it is off-topic on this thread.

I partly agree. The default recorder retention is 10 days, which will get you very, very far without issues. The problems start when people change that: they take the time to change the setting, but not to understand its implications.

Perhaps I’d say the design hasn’t evolved. To add, don’t forget about integrations creating entities too. Often you’re only interested in a couple of them, yet people don’t disable the ones they don’t use (disabling is even better than excluding them from the recorder). At least diagnostic sensors aren’t enabled by default, which is a good design decision.

A default, but with the ability to override individually, yes. Just remember that some sensors can be a function of others, and if you don’t have the history, those derived sensors may not function correctly. It creates a new set of issues.

Just want to mention that systems like statsd and Grafana already do that, and there are ways to feed data to them from HA. I think one must weigh the overhead for the HA team of building functionality like that when it already exists elsewhere.

HA’s LTS is a simplified version already by having two levels only.

There are other ways to perform everything HA does. That doesn’t justify ignoring core functionality which every HA user has to contend with. If there is time to add new features, there is time to “keep the lights on” by bringing old components like Recorder up to date.

HA has grown a lot since the architecture for Recorder was established. It might not bring a whole lot of glory, but fixing some glaring flaws which routinely hurt new users wouldn’t be a waste of anyone’s time.

There are two actions (service calls) to perform here.
Recorder: Purge entities

  • Use this for entities you want to remove from the database completely, or whose history you want to delete.
  • I always set keep days to 0 for these.
  • I am either not interested in their data or have removed the entity completely.
  • The database size will not shrink until you have run Recorder: Purge with repack enabled.
  • Select Perform Action and wait a good few minutes; nothing indicates when the action has completed.

Recorder: Purge
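
For reference, a sketch of what this call might look like in YAML mode under Developer Tools > Actions; the keep_days value is just the 7 days mentioned earlier in the thread:

```yaml
action: recorder.purge
data:
  keep_days: 7        # delete recorder rows older than 7 days
  repack: true        # rewrite the database file so freed space is returned to disk
  apply_filter: true  # also remove rows for entities/events excluded in the recorder config
```

Without repack the file usually stays the same size, since the freed pages are kept for reuse. A repack on a large database can take a while and, as far as I know, temporarily needs extra free disk space.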

Having a graph that monitors the database size is a good way to see if the size has shrunk.


I recently added a link to “Tracking down sudden increase in database size 2025” to the original post in this thread. It shows visually how to work through the steps outlined at the top of this thread.

Great job, by the way, @denilsonsa. You helped me immensely in shrinking my 3GB database down to what it is now.

Could you post the code for how you monitor the database size via graph form?

EDIT: Never mind, found the filesize integration information.

I’m using a Docker installation here, so I can’t use add-ons (SQL). Is there any way to use an external DB client?

I’m using sqlite-web in a Docker container with the following Docker Compose file:

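services:
  sqlite-web:
    # Keep an interactive TTY attached to the container
    stdin_open: true
    tty: true
    ports:
      - "8080:8080"  # web UI on host port 8080
    volumes:
      # Home Assistant config directory, which contains the database file
      - /home/odroid/docker-volumes/homeassistant/config:/data
    environment:
      - SQLITE_DATABASE=home-assistant_v2.db
    image: vaalacat/sqlite-web:latest
networks: {}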

Hi! I’ve developed a Python script that could help with this; it’s here: https://github.com/naevtamarkus/homeassistant-statistics-cli
If you issue the ‘list’ command, you can see how much space each sensor takes up in the DB. Purging data is also possible, but the Purge action in the Recorder may be better in the long run. At least you can find out what you need to purge without having to issue SQL commands.
Feedback on the script is welcome!

It would be nice to see this converted to an integration, creating entities with all the interesting stuff and using the DB connection defined in the HA config itself…

Just a thought 🙂

I totally agree! Either an integration or an add-on.
I have no experience whatsoever with that, though. For me the Python and DB stuff was easy to do, but with the HA-specific stuff I would not know where to start. Any suggestions (or better, help) would be awesome!

Maybe there is some inspiration in this repo.

The HA approach does indeed have a learning curve.