How to reduce your database size and extend the life of your SD card

Preamble: @tom_l is the original author of this community guide. He proposed an analytical approach to decide for each entity_id whether to include or exclude it from the underlying database. @mbuscher suggested to go after the entities most often updated in the database, and exclude only those. Both approaches are now included in this community guide.

Target audience: anyone who uses an installation on flash memory like a Raspberry Pi with an SD card or an SSD drive, or anyone who wishes to reduce the database size for better performance.

Flash memory can process a finite number of writes before it fails. As such it is desirable to limit frequency and size of the stored data.

Home Assistant stores a lot of data about events (example: light.office just turned on) and state changes (example: sun.sun changed to below_horizon). This allows it, for example, to produce nice 24-hour graphs of a temperature sensor.

However, a lot of the data stored by default is unlikely to benefit the users. For example, @firstof9 reports that in their installation one single automation entity recorded 416,328 state changes.

Luckily, the wonderful developers of Home Assistant gave us tools to enumerate the situation and to control what gets written to and read from the database.

  • Control over what gets written to the database is managed using the Recorder component. Here we can exclude or include entire domains (sensor, switch, cover, etc.) or individual entities (sun.sun, sensor.temperature, etc.) or exclude event types (state changes, etc.)
  • We can also control which entities are read from the database when selecting the Logbook or History components. This is done with the same syntax used for Recorder - except for event type - so that we have to define the rules once, only.

There are various approaches we can take to populate the respective configuration files.

  • An analytical approach that extracts all entities and then makes a judgement on whether to include or exclude each entity_id or its domain
  • A good-enough approach that extracts a list of ‘worst offenders’ to exclude

1. Analytical approach

First let’s get a list of all your entities.

There are two options for this, using the file notify platform and a script, or using the Developer Tools Template Editor. You may not be able to select all the required text with the Template Editor and the results need to be edited before they can be pasted back into your configuration. However it is quicker to set up. The script and file notification method takes a little more set-up but the results can be easily pasted back into your configuration and will work where selecting all the text in the Developer Tools Template Editor results is not possible.

Template Editor Method (easiest, try this first):

Click to Expand

Go to the template tab in the Developer Tools tab, and enter in this template:

=========================
{% for state in states %}
  - {{ state.entity_id -}}
{% endfor %}

Copy the result on the right, and open it in your preferred text editor. You should see something like this:

=========================
  - alarm_control_panel.house
  - automation.7pm_abc_news_on_tv
  - automation.abb_usage_alert
  - automation.acrc_reboot
  - automation.addon_update_available
  - automation.alarm_away_mode_armed
  - automation.alarm_away_mode_arming
  - automation.alarm_disarmed
  - automation.alarm_triggered
  - automation.alarm_warning
  - automation.allways_on_device_alert
…etc

Script and File Notification Method (more complex setup, only use if the template method fails):

Click to expand

Create this notification integration:

- platform: file
  name: entity_log
  filename: /config/www/entity_log.txt
  timestamp: false

and this script:

print_entities_to_file:
  alias: 'Print Entities To File'
  sequence:
  - service: notify.entity_log
    data_template:
      message: >
        {% for state in states %}
        {{ state.entity_id -}}
        {% endfor %}

If you don’t have a config/www directory. Create one and restart.

Then run the script from the Developer tools / Services menu.

The script will create a text file containing all your entities in a format that can later be cut and pasted into your Recorder, Logbook, and History integrations. You can find the file here /config/www/entity_log.txt .

Open the file in your preferred text editor. You should see something like this:

Home Assistant notifications (Log started: 2020-06-15T16:25:14.430437+00:00)
--------------------------------------------------------------------------------
-alarm_control_panel.house
  -automation.7pm_abc_news_on_tv
  -automation.abb_usage_alert
  -automation.acrc_reboot
  -automation.addon_update_available
  -automation.alarm_away_mode_armed
  -automation.alarm_away_mode_arming
  -automation.alarm_disarmed
  -automation.alarm_triggered
  -automation.alarm_warning
  -automation.allways_on_device_alert
  ...etc

Fix the indentation of the first entity by adding two spaces to line it up with the rest.

Next we want to select everything, copy it and paste it into your preferred spreadsheet application. I used the free LibreOfficeCalc program.

Add some headings in the top row, from left to right: | Entity | Recorder | Logbook | History | It should look something like this:

Now the ‘fun’ part.

Work your way down the Recorder column adding an 'i' (for include) for all the entities you want to keep a record of or an 'e' for the entities you wan to exclude. Yes all of them. 1395 in my case. Told you it was going to be ‘fun’.

Do the same for the Logbook and History columns as you go.

Some tips:

  • Before starting, spend some time thinking about what you really want to see the history of and how you want to see it.
  • If you want to display the entity in History or the Logbook, or as a graph in the frontend (including more info pop-ups), you have to include it in the recorder.
  • Some items should definitely be excluded from all three integrations, why would you want to see the history of a sensor.time for example? It is just taking up space and reducing the write endurance of your SD card every minute.

I decided to exclude everything from the History panel. I just don’t use it. All the history graphs I want to see are available as graphs in Grafana or my front end. I used a capital 'E' for the cells in this column to indicate I could exclude entire domains. I can’t remove the History integration altogether because that makes history in the more info pop-ups unavailable. You can still see this information if it is in the recorder integration, even if it is excluded from the History Side Panel, but only if you have the History side panel integration.

For the Logbook I wanted a list of important events, alarm arming, lights turning on, etc… but no script or scene activation. These all got a big 'E' (for domain Exclude).

For the recorder I only included things I’m interested in seeing the history of (in the logbook or more info pop-ups, or frontend graphs). I thought very hard about sensors I knew to be quite ‘chatty’ with lots of frequent updates.

Once you have marked every entity with three columns of i, e or E, select all four columns (| Entities | Recorder | Logbook | History |). Find your Data/Sort option in the spreadsheet menu and sort by the Recorder column first then by the entities column (in a single sort operation).

In the recorder column count the number of excludes for each domain. You can do this by selecting all the 'e's in the recorder column for each domain (they will be sorted by domain) e.g. start with automations. If you look at the status bar at the bottom of the spreadsheet it will show how many cells are selected. Then do this for all the 'i' marked cells in the Recorder column for automations.

If you have more includes than excludes for a domain, just use the entity exclude option for those entities in your recorder config. You can cut and paste direct from the spreadsheet sorted entities column into most text editors. e.g.

db_url: !secret mariadb_url
purge_keep_days: 7
exclude:
  entities:
  - automation.deck_automatic_light_brightness
  - automation.kitchen_automatic_light_brightness
  - automation.lounge_automatic_light_brightness
  - automation.lounge_camera_motion
  ...etc

If you have more excludes than includes for a domain, exclude the domain and include the entities marked 'i' using cut and paste from the spreadsheet, e.g. I have a number of trackers but am only interested in the history of one:

db_url: !secret mariadb_url
purge_keep_days: 7
exclude:
  domains:
  - camera
  - device_tracker        ##### <------------------- Here
  - light
  - remote
  - scene
  - script
  - sun
  - zone
  entities:
  - automation.deck_automatic_light_brightness
  - automation.kitchen_automatic_light_brightness
  - automation.lounge_automatic_light_brightness
  - automation.lounge_camera_motion
  - automation.rainfall_sensor_update
  - automation.rumpus_camera_motion
  ...etc
include:
  entities:
  - device_tracker.iphonex        ##### <------------------- And here
  - light.all_bedroom_lights
  - light.all_deck_lights
  - light.all_front_lights
  - light.all_hallway_lights
  ...etc

Repeat the process for all the domains in the recorder then start on the the Logbook - select all four columns and sort on the logbook column then entities. Then for History - sort by history column and then entities.

Finally decide how much time you really need to store the data for. The default is 10 days. This setting in the recorder, purge_keep_days: is the one to edit. Halving the time you keep your data for will halve your database size but there will be a required minimum. I have a weekly graph for my energy consumption so I chose seven days.

I have reduced my database by a third, from 1.5GB down to 1GB, over the last few days and it probably has further to go as the old data gets purged. Also I already had quite a few excluded entities before beginning this systematic approach. If you don’t have any excludes before starting this process you may see even greater improvements.

cd4a21c1701bb0968109887e8479c29a84df43c5_2_690x237

2. Good-enough approach

Instead of making a decision for each and every entity_id, we can look at the ‘worst offenders’ and exclude those, only.

To do so, you’d need SQL access to the database, and then use SQL queries similar to the following:

For the states table:

SELECT 
    entity_id,
    COUNT(*) AS cnt
FROM states
GROUP BY
    entity_id
ORDER BY
    COUNT(*) DESC;

The events table seems to take quite a bit longer with:

SELECT 
    JSON_VALUE(event_data, '$.entity_id') AS entity_id,
    COUNT(*) AS cnt
FROM events
WHERE
    JSON_VALUE(event_data, '$.entity_id') IS NOT NULL
GROUP BY
    JSON_VALUE(event_data, '$.entity_id')
ORDER BY
    COUNT(*) DESC;

You may need to replace JSON_VALUE with JSON_EXTRACT.

There are many ways to execute such queries

  • If you are using the default SQLite database, then you can install the SQLite Web add-on from the community store
  • If you are using the MariaDB add-on, then you can use phpMyAdmin add-on
  • If you have command-line access to your server, then you could issue a mysql query as proposed by @JZhass.

Final Note:

From now on, whenever you create a new entity, part of your work-flow must be to consider which of the three integrations to include or exclude it from. Don’t get slack and let it get out of hand or you will have to do this all again.

One last thing, data I keep longer (up to two years) or for really chatty sensors I store in an Influxdb database. But that’s another story guide…

85 Likes

Nice guide!

This seems to work for me.

{%- for state in states %}
  {{-'- ' + state.entity_id }}
{% endfor %}

And also you could do it in the Developer Tools, Template page.

The two hyphens, seen in the first line of this template, do the trick.

{%- for state in states -%}
  - {{state.entity_id}}
{% endfor %}

3 Likes

I did try the developer tools but could not select more than a few dozen lines before losing the selection.

I’ll fix up the template tomorrow. Thanks.

EDIT: just tried both 123’s and Klogg’s templates but neither include the required spacing before the hyphen.

Also I am still unable to select more than a page worth of template results in the Template Editor. So the notifier is the way to go.

Thank you for the clear directions. Before I tackle this, does your directions work for the built in SQLite database?
I have not switch to mariaDB as it is not clear to me why it is needed. I have been waiting for the Pi4 to be able to boot from SSD and then I will reconfigure it as well as learn how to use the mariadb. It is close now as the boot from SSD is in beta.

Yes I should have made that clear. The configuration is the same for whatever database you use.

It gives a noticeable performance boost at the expense of using more RAM. Also it is a lot less prone to corruption.

2 Likes

Now to learn how to configure it. Might as well switch while learning your procedure.:grinning:

Pretty certain there’s nothing you can do about that leading whitespace being removed. If you trace the code in file notifier back into the point at which messages are handled in the core notify component you can see that it eventually renders the template using helper.async_render which does this:

So yea, that’s just going to trim the whitespace off the beginning and the end of your message every time. If you want everything to line up you’d be better off trying to remove the leading whitespace from the others. Or just add a dummy line to the top before the for loop with a non-whitespace character to workaround this.

1 Like

That makes it very difficult to cut and paste the results.

Adding two spaces to the first entity is the lesser chore.

Thanks for looking into it.

1 Like

Separate topic, but if your install is supervised on Pi OS you can move everything except the bootloader to an SSD now. I was able to follow this guide for doing so without issue. Would be nice to eliminate the SD Card completely but really just a bonus at this point, I’m effectively running everything off the SSD. Performance has improved drastically and HA is only interacting with the SSD.

There’s a thread on this and other solutions. If you are on HassOS the author later wrote up a guide for getting to a similar setup on that as well.

1 Like

Thanks for this Tom! It helped me clean up my recorder and make sure I got everything.

Now to remember to stay on top of it…

For what its worth, copying and pasting from the developer tools worked for me.

1 Like

I’m surprised no one has suggested this yet:

logbook:
  include:
    domains:
      - sensor
      - cover
      - climate
      - lock
      - binary_sensor
  exclude: !include exclude.yaml

I’ve always had a few extras for the recorder:

recorder:
  purge_keep_days: 14
  db_url: !secret database_url
  include:
    domains:
      - sensor
      - input_boolean
      - input_select
      - input_number
      - input_text
      - input_datetime
      - automation
      - cover
      - climate
      - lock
      - binary_sensor
  exclude:
    entities: !include exclude.yaml
    event_types:
      - service_removed
      - service_executed
      - platform_discovered
      - homeassistant_start
      - homeassistant_stop
      - feedreader
      - service_registered
      - call_service
      - component_loaded
      - logbook_entry
      - system_log_event
      - automation_triggered
      - script_started
      - timer_out_of_sync

I actually have it like this:

logbook: !include logbook.yaml

That way all my includes and excludes are in the same place.

Same for history and the recorder.

I usually exclude the same devices from log/recorder/history. I’ve also trimmed out some event_types from recorder as recommended long long ago.

The events table is now the largest part of my database. I’ll have to look into the effect of some of those excludes though. For example does excluding automation_triggered prevent you from using the automation’s last triggered property?

last_triggered: '2020-06-15T07:45:49.070321+00:00'
id: '1559915283056'
friendly_name: Auto Kitchen Lights Off

Looks fine to me.

1 Like

Is that basically every event besides state_changed? Did you get that list from investigating your own DB or is there a comprehensive list of all the native events somewhere? Always wondered about that but haven’t managed to find one yet.

Also do you use logbook? Seems like stripping out events like call_service, automation_triggered and script_started would hide all those events from it unless I’m misunderstanding what data logbook is displaying.

Thanks. Boot from ssd is in beta so I am going to wait for it’s release. Should be soon so for me it doesn’t make sense to start using the Bootloader method.

I’m not sure, it was a list from a very old thread, on how to lower the size of your database.

Not really, I should probably disable it at some point, but you are correct those events do not show in the logbook.

1 Like

I got the mariaDB up and running and wanted to try your guide. Where do you create the notification integration? I thought in the configuration.yaml file, but it fails here.
Thanks.