Preamble: @tom_l is the original author of this community guide. He proposed an analytical approach to decide for each entity_id whether to include or exclude it from the underlying database. @mbuscher suggested to go after the entities most often updated in the database, and exclude only those. Both approaches are now included in this community guide.
Target audience: anyone who uses an installation on flash memory like a Raspberry Pi with an SD card or an SSD drive, or anyone who wishes to reduce the database size for better performance.
Flash memory can process a finite number of writes before it fails. As such it is desirable to limit frequency and size of the stored data.
Home Assistant stores a lot of data about events (example: light.office just turned on) and state changes (example: sun.sun changed to below_horizon). This allows it, for example, to produce nice 24-hour graphs of a temperature sensor.
However, a lot of the data stored by default is unlikely to benefit the users. For example, @firstof9 reports that in their installation one single automation entity recorded 416,328 state changes.
Luckily, the wonderful developers of Home Assistant gave us tools to enumerate the situation and to control what gets written to and read from the database.
- Control over what gets written to the database is managed using the Recorder component. Here we can exclude or include entire domains (sensor, switch, cover, etc.) or individual entities (sun.sun, sensor.temperature, etc.) or exclude event types (state changes, etc.)
- We can also control which entities are read from the database when selecting the Logbook or History components. This is done with the same syntax used for Recorder - except for event type - so that we have to define the rules once, only.
There are various approaches we can take to populate the respective configuration files.
- An analytical approach that extracts all entities and then makes a judgement on whether to include or exclude each entity_id or its domain
- A good-enough approach that extracts a list of ‘worst offenders’ to exclude
1. Analytical approach
First let’s get a list of all your entities.
There are two options for this, using the file notify platform and a script, or using the Developer Tools Template Editor. You may not be able to select all the required text with the Template Editor and the results need to be edited before they can be pasted back into your configuration. However it is quicker to set up. The script and file notification method takes a little more set-up but the results can be easily pasted back into your configuration and will work where selecting all the text in the Developer Tools Template Editor results is not possible.
Template Editor Method (easiest, try this first):
Click to Expand
Go to the template tab in the Developer Tools tab, and enter in this template:
=========================
{% for state in states %}
- {{ state.entity_id -}}
{% endfor %}
Copy the result on the right, and open it in your preferred text editor. You should see something like this:
=========================
- alarm_control_panel.house
- automation.7pm_abc_news_on_tv
- automation.abb_usage_alert
- automation.acrc_reboot
- automation.addon_update_available
- automation.alarm_away_mode_armed
- automation.alarm_away_mode_arming
- automation.alarm_disarmed
- automation.alarm_triggered
- automation.alarm_warning
- automation.allways_on_device_alert
…etc
Script and File Notification Method (more complex setup, only use if the template method fails):
Click to expand
Create this notification integration:
- platform: file
name: entity_log
filename: /config/www/entity_log.txt
timestamp: false
and this script:
print_entities_to_file:
alias: 'Print Entities To File'
sequence:
- service: notify.entity_log
data_template:
message: >
{% for state in states %}
{{ state.entity_id -}}
{% endfor %}
If you don’t have a config/www
directory. Create one and restart.
Then run the script from the Developer tools / Services menu.
The script will create a text file containing all your entities in a format that can later be cut and pasted into your Recorder, Logbook, and History integrations. You can find the file here /config/www/entity_log.txt .
Open the file in your preferred text editor. You should see something like this:
Home Assistant notifications (Log started: 2020-06-15T16:25:14.430437+00:00)
--------------------------------------------------------------------------------
-alarm_control_panel.house
-automation.7pm_abc_news_on_tv
-automation.abb_usage_alert
-automation.acrc_reboot
-automation.addon_update_available
-automation.alarm_away_mode_armed
-automation.alarm_away_mode_arming
-automation.alarm_disarmed
-automation.alarm_triggered
-automation.alarm_warning
-automation.allways_on_device_alert
...etc
Fix the indentation of the first entity by adding two spaces to line it up with the rest.
Next we want to select everything, copy it and paste it into your preferred spreadsheet application. I used the free LibreOfficeCalc program.
Add some headings in the top row, from left to right: | Entity | Recorder | Logbook | History |
It should look something like this:
Now the ‘fun’ part.
Work your way down the Recorder column adding an 'i'
(for include) for all the entities you want to keep a record of or an 'e'
for the entities you wan to exclude. Yes all of them. 1395 in my case. Told you it was going to be ‘fun’.
Do the same for the Logbook and History columns as you go.
Some tips:
- Before starting, spend some time thinking about what you really want to see the history of and how you want to see it.
- If you want to display the entity in History or the Logbook, or as a graph in the frontend (including more info pop-ups), you have to include it in the recorder.
- Some items should definitely be excluded from all three integrations, why would you want to see the history of a
sensor.time
for example? It is just taking up space and reducing the write endurance of your SD card every minute.
I decided to exclude everything from the History panel. I just don’t use it. All the history graphs I want to see are available as graphs in Grafana or my front end. I used a capital 'E'
for the cells in this column to indicate I could exclude entire domains. I can’t remove the History integration altogether because that makes history in the more info pop-ups unavailable. You can still see this information if it is in the recorder integration, even if it is excluded from the History Side Panel, but only if you have the History side panel integration.
For the Logbook I wanted a list of important events, alarm arming, lights turning on, etc… but no script or scene activation. These all got a big 'E'
(for domain Exclude).
For the recorder I only included things I’m interested in seeing the history of (in the logbook or more info pop-ups, or frontend graphs). I thought very hard about sensors I knew to be quite ‘chatty’ with lots of frequent updates.
Once you have marked every entity with three columns of i
, e
or E
, select all four columns (| Entities | Recorder | Logbook | History |
). Find your Data/Sort option in the spreadsheet menu and sort by the Recorder column first then by the entities column (in a single sort operation).
In the recorder column count the number of excludes for each domain. You can do this by selecting all the 'e
's in the recorder column for each domain (they will be sorted by domain) e.g. start with automations. If you look at the status bar at the bottom of the spreadsheet it will show how many cells are selected. Then do this for all the 'i'
marked cells in the Recorder column for automations.
If you have more includes than excludes for a domain, just use the entity exclude option for those entities in your recorder config. You can cut and paste direct from the spreadsheet sorted entities column into most text editors. e.g.
db_url: !secret mariadb_url
purge_keep_days: 7
exclude:
entities:
- automation.deck_automatic_light_brightness
- automation.kitchen_automatic_light_brightness
- automation.lounge_automatic_light_brightness
- automation.lounge_camera_motion
...etc
If you have more excludes than includes for a domain, exclude the domain and include the entities marked 'i'
using cut and paste from the spreadsheet, e.g. I have a number of trackers but am only interested in the history of one:
db_url: !secret mariadb_url
purge_keep_days: 7
exclude:
domains:
- camera
- device_tracker ##### <------------------- Here
- light
- remote
- scene
- script
- sun
- zone
entities:
- automation.deck_automatic_light_brightness
- automation.kitchen_automatic_light_brightness
- automation.lounge_automatic_light_brightness
- automation.lounge_camera_motion
- automation.rainfall_sensor_update
- automation.rumpus_camera_motion
...etc
include:
entities:
- device_tracker.iphonex ##### <------------------- And here
- light.all_bedroom_lights
- light.all_deck_lights
- light.all_front_lights
- light.all_hallway_lights
...etc
Repeat the process for all the domains in the recorder then start on the the Logbook - select all four columns and sort on the logbook column then entities. Then for History - sort by history column and then entities.
Finally decide how much time you really need to store the data for. The default is 10 days. This setting in the recorder, purge_keep_days:
is the one to edit. Halving the time you keep your data for will halve your database size but there will be a required minimum. I have a weekly graph for my energy consumption so I chose seven days.
I have reduced my database by a third, from 1.5GB down to 1GB, over the last few days and it probably has further to go as the old data gets purged. Also I already had quite a few excluded entities before beginning this systematic approach. If you don’t have any excludes before starting this process you may see even greater improvements.
2. Good-enough approach
Instead of making a decision for each and every entity_id, we can look at the ‘worst offenders’ and exclude those, only.
To do so, you’d need SQL access to the database, and then use SQL queries similar to the following:
For the states table:
SELECT
entity_id,
COUNT(*) AS cnt
FROM states
GROUP BY
entity_id
ORDER BY
COUNT(*) DESC;
The events table seems to take quite a bit longer with:
SELECT
JSON_VALUE(event_data, '$.entity_id') AS entity_id,
COUNT(*) AS cnt
FROM events
WHERE
JSON_VALUE(event_data, '$.entity_id') IS NOT NULL
GROUP BY
JSON_VALUE(event_data, '$.entity_id')
ORDER BY
COUNT(*) DESC;
You may need to replace JSON_VALUE
with JSON_EXTRACT
.
There are many ways to execute such queries
- If you are using the default SQLite database, then you can install the SQLite Web add-on from the community store
- If you are using the MariaDB add-on, then you can use phpMyAdmin add-on
- If you have command-line access to your server, then you could issue a mysql query as proposed by @JZhass.
Final Note:
From now on, whenever you create a new entity, part of your work-flow must be to consider which of the three integrations to include or exclude it from. Don’t get slack and let it get out of hand or you will have to do this all again.
One last thing, data I keep longer (up to two years) or for really chatty sensors I store in an Influxdb database. But that’s another story guide…