How to keep your recorder database size under control

Introduction

This is yet another guide on:

  • How to view the size of your database.
  • How to figure out which entities are using too much space.
    • And how to filter them out.

I’m running Home Assistant on a Raspberry Pi with the default SQLite database. If your setup is different, this guide can still be useful, but you will have to adapt it to your environment.

Motivation

Why should you care about the database size? Well, there are several reasons…

But first, there is one reason why you should not care: if you are new to Home Assistant, don’t worry about the database size at the beginning. Wait for a couple of weeks, get yourself more familiar with the project, enjoy your new home automation system, then come back to this guide. Bookmark this page, go have fun, and come back later!

After using Home Assistant for several weeks, and after adding many integrations and devices, the recorder database will grow. A lot. The larger the database, the slower Home Assistant will run, because SQLite will have to read/write more data (and more metadata, such as indexes). This will mostly cause excessive I/O, making some parts of the system stall while waiting for reads and writes. Excessive I/O also causes more stress (i.e. more wear) on flash-based storage (such as SD cards), meaning they will degrade faster and they will fail sooner than you might expect.

Additionally, the database is included by default in Home Assistant snapshots (i.e. the backup file). Smaller backups are faster to create, download and restore; and they won’t fill up your backup storage location. (You’re doing automatic backups, right? If not, that should be your next priority, just after checking the database size.)

See also

There are other similar guides:

And there are many topics asking about the ever-growing home-assistant_v2.db

Basic concepts

In the default configuration, Home Assistant keeps a history of all the events and states for all the entities, for about 10 days. This data powers the History and the Logbook features. The data itself is stored by the Recorder integration, which by default writes it to a SQLite database file /config/home-assistant_v2.db.

After some investigation, you can figure out that some entities update too often, and a few others have very long attributes; those entities account for most of the space in the database. Thus, you can filter out such entities, which will help trim the fat from your database. You can go from hundreds of megabytes, or even a few gigabytes, to a database under a hundred megabytes. And a smaller database means a faster and more responsive Home Assistant, smaller snapshots, and less wear on the SD card where the database is stored.

Also note that you can safely delete the database file, and Home Assistant will recreate an empty one upon starting. You can also safely disable the Recorder integration altogether, as it is only used for historical data and is not needed for normal usage.
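If you prefer to poke around outside Home Assistant, here is a minimal sketch using Python’s built-in sqlite3 module to list the tables in that file. The path assumes the default SQLite setup, and opening the file read-only avoids disturbing a running instance:

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database.

    Opened read-only (mode=ro) so a running Home Assistant
    instance is not disturbed.
    """
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
        return [name for (name,) in rows]
    finally:
        conn.close()

# Usage (default recorder path):
# print(list_tables("/config/home-assistant_v2.db"))
```

You should see the states and events tables discussed below, among a few others.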

Viewing the total size of the database

The first step is to measure the size of your database. This way you can see whether you have a problem with an inflated database, and how large the problem is.

One way to achieve that is to look at the file size manually, either over SSH (run du -h /config/home-assistant_v2.db) or via a Samba folder share.

However, a much simpler and easier way is to configure a sensor that shows the size of the database file. Just make sure you understand the implications of allowlist_external_dirs (and feel free to add a comment below about this). Edit your /config/configuration.yaml file:

homeassistant:
  allowlist_external_dirs:
    - /config
sensor:
  - platform: filesize
    file_paths:
      - /config/home-assistant_v2.db

This sensor will be updated every half minute. Since each state change will add a couple of rows to the recorder database, you may prefer to reduce the update frequency. If you so desire, just add scan_interval: 1800 (for 30-minute interval) to the sensor configuration. (This option can be configured for any integration that uses polling.)
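To put numbers on that trade-off, a quick back-of-the-envelope calculation, assuming the worst case where every poll records a new state row:

```python
# Rough estimate of recorder rows added per day by a polling sensor,
# assuming the worst case: every poll produces a state change.
SECONDS_PER_DAY = 24 * 60 * 60

def states_per_day(scan_interval_s: int) -> int:
    return SECONDS_PER_DAY // scan_interval_s

print(states_per_day(30))    # default ~30-second polling: 2880 rows/day
print(states_per_day(1800))  # scan_interval: 1800      ->   48 rows/day
```

In practice the file size changes on most polls, so the real numbers are close to this worst case.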

Then, you can just add sensor.home_assistant_v2_db to your Lovelace dashboard.

Screenshot of the file size entity in Lovelace

Finding the heaviest entities

To find which entities are using too much space, you will have to dig into the database itself and run a SQL query. Fortunately, there is already an add-on to make it easier for you.

  1. Go to the Add-on store: in your Home Assistant instance, open the Supervisor add-on store.
  2. Install the SQLite Web add-on from the store.
  3. If you want, you can Show in sidebar and also Start on boot. The choice is yours.
  4. Make sure this add-on is started before you continue.

When you open the SQLite Web interface, you can see a few tables, of which only two are relevant:

  • states contains a log of all the states for all the entities.
  • events contains a log of all the events. Some events include event_data (such as parameters in call_service events), but state_changed events have empty event_data, because the state data is stored in the other table.

If you are curious to view the size of each table, try this SQL query:

SELECT
  SUM(pgsize) bytes,
  name
FROM dbstat
GROUP BY name
ORDER BY bytes DESC

It will also show ix_* entries, which are not real tables, but indexes pointing to keys in the actual tables.
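If you would rather run these queries from a script instead of the add-on, a small sketch using Python’s sqlite3 module works too. Note that the dbstat virtual table is only available if your SQLite build was compiled with SQLITE_ENABLE_DBSTAT_VTAB, and the path assumes the default setup:

```python
import sqlite3

# Per-table/index size, via the dbstat virtual table
# (requires a SQLite build with SQLITE_ENABLE_DBSTAT_VTAB).
TABLE_SIZES_SQL = """
SELECT SUM(pgsize) AS bytes, name
FROM dbstat
GROUP BY name
ORDER BY bytes DESC
"""

def run_query(db_path: str, sql: str) -> list[tuple]:
    """Run a read-only query against the recorder database."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

# Usage:
# for size_bytes, name in run_query("/config/home-assistant_v2.db", TABLE_SIZES_SQL):
#     print(f"{size_bytes:>12}  {name}")
```

The same run_query helper works for the events and states queries in the next sections.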

Viewing events usage

  1. Click on the events table in SQLite Web.
  2. Click on Query.
  3. Type the following SQL query:
SELECT
  COUNT(*) as cnt,
  SUM(LENGTH(event_data)) as bytes,
  event_type
FROM events
GROUP BY event_type
ORDER BY cnt DESC

In my case, I had 10 times as many state_changed events as call_service events, and call_service events were 3 times the amount of all remaining events combined. Thus, we can easily conclude these are the main contributors to the size of this table.

NOTE: The space usage in this query covers only the event_data column. It is not a full representation of the row size, because the other columns take up space as well, so don’t read too much into the exact value.

Viewing states usage

  1. Click on the states table in SQLite Web.
  2. Click on Query.
  3. Type the following SQL query:
SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  SUM(LENGTH(attributes)) AS bytes,
  SUM(LENGTH(attributes)) * 100 / (SELECT SUM(LENGTH(attributes)) FROM states) AS bytes_pct,
  entity_id
FROM states
GROUP BY entity_id
ORDER BY cnt DESC

Analyze the results. Most likely, the top few entities will account for a quarter (or maybe half) of the table size (either in number of rows or in size of the rows).

NOTE: The byte count in this query only considers the attributes column (the only column with arbitrary, unbounded length). It does not measure the actual size of the rows; instead, treat it as a lower bound.
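The same query can be scripted. Here is a sketch (the helper name is mine, and it assumes the schema above, where states still has an entity_id column):

```python
import sqlite3

# Same query as above, limited to the top N entities by row count.
TOP_STATES_SQL = """
SELECT COUNT(*) AS cnt, SUM(LENGTH(attributes)) AS bytes, entity_id
FROM states
GROUP BY entity_id
ORDER BY cnt DESC
LIMIT ?
"""

def heaviest_entities(db_path: str, limit: int = 10) -> list[tuple]:
    """Top entities by row count.

    The bytes column covers only `attributes`, so treat it as a
    lower bound on the real row size.
    """
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(TOP_STATES_SQL, (limit,)).fetchall()
    finally:
        conn.close()

# Usage:
# for cnt, size_bytes, entity_id in heaviest_entities("/config/home-assistant_v2.db"):
#     print(f"{cnt:>8} rows {size_bytes or 0:>12} bytes  {entity_id}")
```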

Filtering out entities

Now that you know which entities are bloating your database, you can filter them out. Feel free to filter as many or as few as you want. When deciding whether to filter an entity, ask yourself: does the history of its state changes bring me value? Does it bring me as much value as the space it takes?

Edit your /config/configuration.yaml file, here’s an example from mine:

recorder:
  exclude:
    domains:
      - device_tracker
      - media_player
      - uptime
      - time_date
      - worldclock
    entity_globs:
      - sensor.clock*
      - sensor.date*
      - sensor.glances*
      - sensor.load_*m
      - sensor.time*
      - sensor.uptime*
    entities:
      - camera.front_door
      - sensor.home_assistant_v2_db
      - sensor.memory_free
      - sensor.memory_use
      - sensor.memory_use_percent
      - sensor.processor_use
      - weather.openweathermap

The Recorder documentation explains how to filter entities or events, please look over there for details.

Please restart Home Assistant after editing the configuration file. It may also take some time before you actually see the database file size decrease, as the Recorder integration has to purge the database, and SQLite itself might need to do some internal work to reallocate the data in the file. If you can’t wait, or if the database size doesn’t go down, try calling the recorder.purge service with the repack: true parameter. If you want to go to the extreme, you can delete the database file and restart Home Assistant with a clean, empty history.

Note: I guess the database file size will never go down, unless you purge it with the repack parameter. I have not confirmed this, and I don’t have enough experience with SQLite to be sure about it.

More aggressive configuration

You can also change purge_keep_days from 10 days to 1 or 2. There are no adverse effects, other than forgetting history sooner.

And if you want to reduce the frequency of file writes, increase commit_interval from the default 1 second to 5, 10, 15, 30 or 60 seconds (or any other interval you like). This may help prolong the lifespan of the SD card, with a minimal side effect: the most recent history and logbook entries will take longer to show up.

Conclusion

I was having trouble with my Home Assistant instance, and it was related to the database file being over 500MB, getting close to 1GB. With the changes mentioned in this guide, the database is now under 35MB, and Home Assistant is much more stable.

Also, it is ironic that the sensor that monitors the database file size contributes heavily to increasing that file size.


Very nice write-up, thank you!!

This should be required reading for anyone following the “getting started” instructions for installing HA on a Raspberry Pi and SD card.

My own database was hitting 3GB before I stumbled on the recorder exclude settings.

Thanks mate - this was excellent post! I immediately improved my database a lot with your help.

I found one tricky problem for me. I have many TP-link smart plugs which have many attributes. They seem to eat a lot of database space. On the other hand I want to keep the state (on/off) of the plugs in my database. But because of energy monitoring to store the “switch.xyz” to database eats a lot space as the voltage varies a lot:

Any idea how to solve this problem, these smart plugs are the “worst” after the database size-sensor - do I need to copy the state to input boolean and use that as a state indicator or how?


Filtering things like media player made a big difference for me.
At one point my DB was around 4GB, now it’s around 200MB.

I have a few of these TP-Link devices, too. I’ve created entities for the values I care about, and exclude the switch entity they’re pulled from. For example, I have a couple of sump pumps and pull out a few attributes from each like this:

  - platform: template
    sensors:
      switch_east_sump_pump_amps:
        friendly_name_template: "East Sump Pump Current"
        value_template: "{{ state_attr('switch.east_sump_pump', 'current_a') | float}}"
        unit_of_measurement: 'A'
      switch_east_sump_pump_volts:
        friendly_name_template: "East Sump Pump Voltage"
        value_template: "{{ state_attr('switch.east_sump_pump', 'voltage') | float}}"
        unit_of_measurement: 'V'
      switch_east_sump_pump_today_kwh:
        friendly_name_template: "East Sump Pump Today's Consumption"
        value_template: "{{ state_attr('switch.east_sump_pump', 'today_energy_kwh') | float}}"
        unit_of_measurement: 'kWh'

Now I can exclude the switch they’re pulled from:

recorder:
  purge_keep_days: 7
  commit_interval: 5
  exclude:
    entities:
      - switch.east_sump_pump

I actually exclude some of the templated entities, too, but you get the idea.

I do the same thing with the weather entities and various other sensor entities which tend to generate bloated attribute lists. My exclude list is pretty long. Some folks go the other way and just include the things they know they want recorder to store. For me this was a bit easier and has been very effective, but I could argue for either approach.


Thanks for your response!

I pretty much do the very same thing as you but my problem is how to keep the simple on/off-state in database. If you exclude the switch.xyz you lose the state (on/off) - don’t you?

As I wrote: “I want to keep the state (on/off) of the plugs in my database”

You can probably create a binary_sensor for that too, but how is the question? :wink:

Hmmm. I haven’t actually done that. But according to this documentation it’s possible:

Look at the section “Switch as a Sensor.” I think that’s the closest to what you want.


Thanks @CaptTom!

binary_sensor:
  - platform: template
    sensors:
      m_heat1:
        device_class: plug
        value_template: "{{ is_state('switch.m_heat1', 'on') }}"

It works :wink:


Many people have used InfluxDB for long-term storage of many values from many sensors. Thus, if you really care about the history but don’t want to bloat the SQLite database, you can try looking into that. Or replacing the SQLite database with a robust MariaDB or Postgres instance (probably running on another more powerful machine with a more reliable storage).

Thanks for your effort to educate us!

I have a couple of observations which make me a bit confused. I first excluded the database size sensor (in two quite similar setups; HAOS/NUC), then added it again with a half-hour scan interval.

  - platform: filesize
    file_paths:
      - /config/home-assistant_v2.db
    scan_interval: 1800

On one system I see “stairs” of a couple hundred KB every 1800 seconds.


On the other one I see just a “flat” result in size.

So two questions:

  1. Why do I see these sensor updates during only part of the day on one system?

  2. Overall, I would expect the size of the database to change a bit all day long - yet the other one has been at 51.54 MB for around 2 days. At least after every purge I should see a smaller database - I don’t see that either.

And Recorder is:

recorder:
  purge_keep_days: 3
  commit_interval: 2

TL;DR: I guess… the file size only shrinks on a repack: true command. Thus, it seems normal that it kept growing until 04:12 AM, then stayed mostly stable over the next days.


It seems the file size slowly increased throughout the day until the maximum of 43MB, and then at around 4AM it stopped growing. Meanwhile, on the second graph it is stable and never shrinks.

I’m not 100% sure, but I think I know what happens. The recorder integration auto-purges the database at 04:12. This explains why the database stopped growing around that time.

Additionally, I think this “auto-purge” is a simple DELETE FROM statement that removes rows from the table. I also think database systems (such as SQLite) rarely shrink their file sizes unless explicitly told to. Let’s imagine how the rows are stored internally by the database. It certainly has some index structures pointing to byte offsets where the actual user data is stored. Then, if someone deletes a few rows, the system will remove those from the index, and possibly write zeroes over the old data. However, for performance, the system will not move the remaining rows (as that would mean rewriting a lot of data and a lot of metadata); instead, it will just leave the deleted space as “empty” and available for later use. Getting rid of this empty space requires rewriting several rows and indexes (essentially, as much work as re-populating the entire table from scratch).

Thus, in order for the database file size to shrink, a slower clean-up procedure has to be executed. Since this can cause slowdowns (especially for larger databases) and a lot of I/O, it is not executed automatically. You can still request it using repack: true:

When using SQLite or PostgreSQL this will rewrite the entire database. When using MySQL or MariaDB it will optimize or recreate the events and states tables. This is a heavy operation that can cause slowdowns and increased disk space usage while it runs. Only supported by SQLite, PostgreSQL, MySQL and MariaDB.
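This delete-versus-repack behaviour is easy to reproduce outside Home Assistant. A sketch, assuming SQLite’s default auto_vacuum=NONE setting (VACUUM is essentially what repack: true runs for SQLite):

```python
import os
import sqlite3

def purge_demo(path: str) -> tuple[int, int, int]:
    """Fill a database, DELETE everything, then VACUUM, and return
    the file size (in bytes) after each step."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE states (attributes TEXT)")
    conn.executemany("INSERT INTO states VALUES (?)", [("x" * 1000,)] * 5000)
    conn.commit()
    size_full = os.path.getsize(path)

    # Like the nightly auto-purge: rows are gone, but the freed pages
    # stay inside the file, so the size barely changes.
    conn.execute("DELETE FROM states")
    conn.commit()
    size_after_delete = os.path.getsize(path)

    # Like recorder.purge with repack: true - rewrites the whole file.
    conn.execute("VACUUM")
    size_after_vacuum = os.path.getsize(path)
    conn.close()
    return size_full, size_after_delete, size_after_vacuum
```

On my understanding, the nightly purge corresponds to the DELETE step, which is why the file stops growing but never shrinks on its own.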

If my understanding is correct, then I’m curious to know how the graph looked on the following days. I assume the file size was growing on the first day (the “stairs” you mentioned), and then stabilized around that size for the next days (possibly with occasional growth near the end of each day).


Thanks again!

Actually, the graph had those “stairs” for a couple of days until it stabilized completely. So what you explained makes sense!

Hi there,

first of all, sorry for creating a new reply to this old topic…
I have changed my recorder to a MariaDB running on a different server than Home Assistant.
Now, after a few days, the DB size has grown to a bit over 1 GB of data. This shouldn’t yet be an issue for MariaDB, but honestly speaking, I don’t need much history for most of my sensors and integrations.

Sure, I could use “purge” to clean up the database, but since the Energy Dashboard uses the database for its statistics as well, I am not sure if purge will have an impact on these statistics, too.

I want to keep as much data as possible for the energy dashboard (long term statistics)…
Is there a way to specify which statistics can be deleted and which cannot?
For example, there are many sensors in the database (in the statistics and statistics_short_term tables) which I don’t really need there…

So in short, I do want to optimize the database / Recorder options, but I am a bit unsure what is required and what is not (mostly in terms of the energy dashboard and just a few other sensors).

I think the first post in this thread explains it pretty well. IMHO the first five or so should be part of the HA “Getting Started” documentation. And if you don’t want to mess with SQL, there’s always the “History” page which can give you an idea of which entities are spamming the database unnecessarily and should be excluded.

Assuming you’ve read those and excluded the things you don’t want to keep, that’s about all HA can do for you. It’s downright embarrassing that a great system like HA has such a poorly designed database, and no way to specify retention times for individual entities. You either keep them all for the full keep_days or exclude them. Obviously this is a pretty blunt instrument.

I did post a Feature Request requesting retention periods by entity. Feel free to up-vote if you agree.

@CaptTom’s response is pretty accurate. In summary, @CChris, you have a few options:

  1. Set a short purge_keep_days that applies to all entities. (This may impact the history of the other entities you want to record for a long time.)
  2. Leave purge_keep_days long enough and accept your database will become very large. (Shouldn’t be an issue, except for the feeling that you are wasting storage space that could be used for something else.)
  3. Leave purge_keep_days long enough and filter other entities, so those won’t have any history.
  4. Leave purge_keep_days long enough and add a cronjob or a script somewhere that will periodically run a DELETE FROM … WHERE … SQL query on your database, cleaning up older items for entities you don’t care as much. (If you do this, please post your query.)
  5. Keep the recorder database small, and use a second storage for long-term history, such as InfluxDB, Graphite, Prometheus. (Those solutions are optimized for time-series data with lots of writes, but setting them up is more complicated than just a single database.)
  6. Hope that CaptTom’s feature request will ever be implemented. (Or help implement it yourself.)
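For option 4, here is a hedged sketch of what such a cleanup script could look like. The entity name and retention values are made up, it assumes the older schema where states has entity_id and last_updated columns, and you should run it only on a copy of the database, or with Home Assistant stopped and a backup at hand:

```python
import sqlite3

# Hypothetical per-entity retention: entity_id -> days of history to keep.
RETENTION_DAYS = {
    "sensor.example_noisy_sensor": 2,  # made-up entity name
}

def prune_states(db_path: str, retention: dict[str, int]) -> int:
    """Delete old rows from `states` for specific entities.

    Assumes `states` has `entity_id` and `last_updated` columns.
    Returns the number of deleted rows. Destructive: back up first.
    """
    conn = sqlite3.connect(db_path)
    deleted = 0
    try:
        for entity_id, days in retention.items():
            cur = conn.execute(
                "DELETE FROM states WHERE entity_id = ? "
                "AND last_updated < datetime('now', ?)",
                (entity_id, f"-{days} days"),
            )
            deleted += cur.rowcount
        conn.commit()
    finally:
        conn.close()
    return deleted
```

Remember that this only frees pages inside the file; the file itself shrinks only after a repack/VACUUM.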

All of these solutions have some drawbacks, and unfortunately there is no simple solution that would work for most people. :slightly_frowning_face: I’m currently using solution 1; I can’t do solution 2; I thought about solution 5, but that means more maintenance and more moving parts that can fail. I’m willing to adopt solution 4, but so far I haven’t invested time in it (also because long-term history is currently just a “nice-to-have” feature for me).

all right, thanks @denilsonsa and @CaptTom.
so my thoughts were correct: purge will also remove the statistics for the energy dashboard… :frowning:

then I will have to consider some additional methods - aside from homeassistant to do the job…
I am already using InfluxDB for some measurements, but I want to keep the states in the energy dashboard, which isn’t possible with InfluxDB as far as I know…

I use a sort of hybrid solution, starting with @denilsonsa’s Option 1.

I’ve excluded everything I don’t care about (biggest impact on DB size) and set purge_keep_days to 7. I could probably go to 4 or 5 without really missing anything.

But I also record some long-term data to flat text files using notify: / platform: file. I log every start/stop of my heating system’s boiler (helps me estimate fuel used) and I log daily and monthly run-time data for the various heating and cooling zones.

I figure that I’m going to want to analyze and manipulate the data in a bunch of different ways that a database (especially, a poorly-designed one) just isn’t ideal for.

thanks a lot :slight_smile:
Probably, I am going to keep the data I want to keep within the database.

Having them in my MariaDB would also allow me to access the data with anything other than Home Assistant…
But first, I need to decide what data I really need and what can be deleted… this will probably take some time :smiley:

basically, I just wanted to know if purge has an effect on the statistics for the energy dashboard (which isn’t really mentioned in the documentation —> probably should be [?])

this will also allow me to define the exclusion list more specifically, since I know which sensors I have assigned to the dashboard, etc.

Right. This should be a step in the “Getting Started” section of the HA documentation!

Here’s a tip: Exclude any entities which have a bunch of attributes. Use a template to pull out just the one(s) you need, and let Recorder save just those to the database.

Another tip is don’t spend too much time trying to exclude everything you don’t need. Focus on the heavy hitters which update often or with a lot of data. You can query the database and use COUNT to find the worst offenders. Or even use the History page. Then decide which of them to keep and which to eliminate.
