Target audience: anyone who uses an installation on flash memory like a Raspberry Pi with an SD card or an SSD drive, or anyone who wishes to reduce the database size for better performance.
Flash memory can process a finite number of writes before it fails. As such, it is desirable to limit both the frequency and the volume of data written to storage.
Home Assistant stores a lot of data about events (example: light.office just turned on) and state changes (example: sun.sun changed to below_horizon). This allows it, for example, to produce nice 24-hour graphs of a temperature sensor.
However, a lot of the data stored by default is unlikely to benefit the users. For example, @firstof9 reports that in their installation one single automation entity recorded 416,328 state changes.
Luckily, the wonderful developers of Home Assistant gave us tools to assess the situation and to control what gets written to and read from the database.
- Control over what gets written to the database is managed using the Recorder component. Here we can exclude or include entire domains (sensor, switch, cover, etc.), individual entities (sun.sun, sensor.temperature, etc.), or exclude event types (state changes, etc.)
- We can also control which entities are read from the database and shown by the Logbook component. This uses the same syntax as the Recorder - except for event types - so the rules only need to be defined once (see the sketch below).
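A minimal sketch of what this looks like in configuration.yaml (the domain and entity names here are placeholders, not recommendations):

recorder:
  purge_keep_days: 7                     # how many days of history to keep (default is 10)
  exclude:
    domains:
      - scene                            # drop a whole domain you never chart
    entities:
      - sensor.example_chatty_sensor     # placeholder entity_id - a noisy sensor you never graph

logbook:
  exclude:                               # same include/exclude syntax as the recorder
    domains:
      - scene
    entities:
      - sensor.example_chatty_sensor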
There are various approaches we can take to populate the respective configuration files.
- For HAOS users, an add-on makes this very easy, with automatically generated charts that show which entities are generating the most data. Or,
- An analytical approach that extracts all entities and then makes a judgement on whether to include or exclude each entity_id or its domain. Or,
- A good-enough approach that extracts a list of "worst offenders" to exclude.
1. DBStats Add-on
If you have a Home Assistant OS installation, install this add-on: Dbstats - Addon for database usage analytics
From the "Count States" graph, identify your most chatty entities and, if you do not use their history in charts on your dashboard, exclude them from the recorder (see later in this topic).
2. Analytical approach
First let's get a list of all your entities.
There are two options for this: using the file notify platform with a script, or using the Developer Tools Template Editor. The Template Editor is quicker to set up, but you may not be able to select all the required text, and the results need to be edited before they can be pasted back into your configuration. The script and file notification method takes a little more set-up, but the results can be pasted straight back into your configuration, and it works where selecting all the text of the Template Editor results is not possible.
Template Editor Method (easiest, try this first):
Go to the Template tab in Developer Tools and enter this template:
=========================
{% for state in states %}
- {{ state.entity_id -}}
{% endfor %}
Copy the result on the right, and open it in your preferred text editor. You should see something like this:
=========================
- alarm_control_panel.house
- automation.7pm_abc_news_on_tv
- automation.abb_usage_alert
- automation.acrc_reboot
- automation.addon_update_available
- automation.alarm_away_mode_armed
- automation.alarm_away_mode_arming
- automation.alarm_disarmed
- automation.alarm_triggered
- automation.alarm_warning
- automation.allways_on_device_alert
…etc
Script and File Notification Method (more complex setup, only use if the template method fails):
Create this notification integration:
# Goes under notify: in configuration.yaml
- platform: file
  name: entity_log
  filename: /config/www/entity_log.txt
  timestamp: false
and this script:
# Goes in scripts.yaml (or under script: in configuration.yaml)
print_entities_to_file:
  alias: 'Print Entities To File'
  sequence:
    - service: notify.entity_log
      data_template:
        message: >
          {% for state in states %}
          - {{ state.entity_id -}}
          {% endfor %}
If you don't have a /config/www directory, create one and restart Home Assistant.
Then run the script from the Developer tools / Services menu.
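If you used the script key above, the service call in Developer Tools / Services looks like this (YAML mode):

service: script.print_entities_to_file   # no service data is required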
The script will create a text file containing all your entities in a format that can later be cut and pasted into your Recorder and Logbook configuration. You can find the file at /config/www/entity_log.txt.
Open the file in your preferred text editor. You should see something like this:
Home Assistant notifications (Log started: 2020-06-15T16:25:14.430437+00:00)
--------------------------------------------------------------------------------
-alarm_control_panel.house
-automation.7pm_abc_news_on_tv
-automation.abb_usage_alert
-automation.acrc_reboot
-automation.addon_update_available
-automation.alarm_away_mode_armed
-automation.alarm_away_mode_arming
-automation.alarm_disarmed
-automation.alarm_triggered
-automation.alarm_warning
-automation.allways_on_device_alert
...etc
Fix the indentation of the first entity by adding two spaces to line it up with the rest.
Next we want to select everything, copy it, and paste it into your preferred spreadsheet application. I used the free LibreOffice Calc program.
Add some headings in the top row, from left to right: | Entity | Recorder | Logbook
It should look something like this: one row per entity, with the Recorder and Logbook columns still empty. Ignore the History column; that is no longer needed.
Now the "fun" part.
Work your way down the Recorder column, adding an 'i' (for include) for all the entities you want to keep a record of, or an 'e' for the entities you want to exclude. Yes, all of them. 1395 in my case. Told you it was going to be "fun".
Do the same for the Logbook column as you go.
Some tips:
- Before starting, spend some time thinking about what you really want to see the history of and how you want to see it.
- If you want to display the entity in the Logbook, or as a graph in the frontend (including more info pop-ups), you have to include it in the recorder.
For the Logbook I wanted a list of important events (alarm arming, lights turning on, etc.) but no script or scene activation. These all got a big 'E' (for domain Exclude).
For the recorder I only included things I'm interested in seeing the history of (in the logbook or more info pop-ups, or frontend graphs). I thought very hard about sensors I knew to be quite "chatty" with lots of frequent updates.
Once you have marked every entity in the Recorder and Logbook columns with an i, e or E, select all three columns (| Entity | Recorder | Logbook |). Find your Data/Sort option in the spreadsheet menu and sort by the Recorder column first, then by the Entity column (in a single sort operation).
In the Recorder column, count the number of excludes for each domain. You can do this by selecting all the 'e'-marked cells in the Recorder column for each domain (they will be sorted by domain), e.g. start with automations. If you look at the status bar at the bottom of the spreadsheet it will show how many cells are selected. Then do the same for all the 'i'-marked cells in the Recorder column for automations.
If you have more includes than excludes for a domain, just use the entity exclude option for those entities in your recorder config. You can cut and paste directly from the spreadsheet's sorted Entity column into most text editors, e.g.
db_url: !secret mariadb_url
purge_keep_days: 7
exclude:
  entities:
    - automation.deck_automatic_light_brightness
    - automation.kitchen_automatic_light_brightness
    - automation.lounge_automatic_light_brightness
    - automation.lounge_camera_motion
    ...etc
If you have more excludes than includes for a domain, exclude the domain and include the entities marked 'i', using cut and paste from the spreadsheet. E.g. I have a number of trackers but am only interested in the history of one:
db_url: !secret mariadb_url
purge_keep_days: 7
exclude:
  domains:
    - camera
    - device_tracker ##### <------------------- Here
    - light
    - remote
    - scene
    - script
    - sun
    - zone
  entities:
    - automation.deck_automatic_light_brightness
    - automation.kitchen_automatic_light_brightness
    - automation.lounge_automatic_light_brightness
    - automation.lounge_camera_motion
    - automation.rainfall_sensor_update
    - automation.rumpus_camera_motion
    ...etc
include:
  entities:
    - device_tracker.iphonex ##### <------------------- And here
    - light.all_bedroom_lights
    - light.all_deck_lights
    - light.all_front_lights
    - light.all_hallway_lights
    ...etc
Repeat the process for all the domains in the recorder, then start on the Logbook: select all three columns again and sort on the Logbook column, then Entity.
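For reference, a logbook configuration built this way might end up looking something like the sketch below (these are just the example domains and entities used earlier in this topic, not recommendations):

logbook:
  exclude:
    domains:
      - script
      - scene
    entities:
      - automation.7pm_abc_news_on_tv
      - automation.addon_update_available
      ...etc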
Finally, decide how long you really need to store the data for. The default is 10 days. The purge_keep_days: setting in the recorder is the one to edit. Halving the time you keep your data will roughly halve your database size, but there will be a minimum you need. I have a weekly graph for my energy consumption, so I chose seven days.
I have reduced my database by a third, from 1.5GB down to 1GB, over the last few days and it probably has further to go as the old data gets purged. Also I already had quite a few excluded entities before beginning this systematic approach. If you don't have any excludes before starting this process you may see even greater improvements.
3. Good-enough approach
Instead of making a decision for each and every entity_id, we can look at the "worst offenders" and exclude only those.
To do so, you need SQL access to the database, and can then use SQL queries similar to the following.
For the states table:
SELECT
  entity_id,
  COUNT(*) AS cnt
FROM states
GROUP BY
  entity_id
ORDER BY
  COUNT(*) DESC;
The events table seems to take quite a bit longer with:
SELECT
  JSON_VALUE(event_data, '$.entity_id') AS entity_id,
  COUNT(*) AS cnt
FROM events
WHERE
  JSON_VALUE(event_data, '$.entity_id') IS NOT NULL
GROUP BY
  JSON_VALUE(event_data, '$.entity_id')
ORDER BY
  COUNT(*) DESC;
You may need to replace JSON_VALUE with JSON_EXTRACT.
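For example, on the default SQLite database the events query might look like this (a sketch, assuming your SQLite build has the JSON1 functions available):

SELECT
  json_extract(event_data, '$.entity_id') AS entity_id,
  COUNT(*) AS cnt
FROM events
WHERE
  json_extract(event_data, '$.entity_id') IS NOT NULL
GROUP BY
  json_extract(event_data, '$.entity_id')
ORDER BY
  cnt DESC;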
There are many ways to execute such queries:
- If you are using the default SQLite database, then you can install the SQLite Web add-on from the community store.
- If you are using the MariaDB add-on, then you can use the phpMyAdmin add-on.
- If you have command-line access to your server, then you could issue a mysql query as proposed by @JZhass (a rough sketch follows below).
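As a rough sketch only (not necessarily the exact command @JZhass proposed; the host, user and database name here are assumptions you will need to adjust for your own setup - "homeassistant" is the MariaDB add-on's default database name):

mysql -h <mariadb-host> -u <user> -p homeassistant -e "SELECT entity_id, COUNT(*) AS cnt FROM states GROUP BY entity_id ORDER BY cnt DESC LIMIT 20;"

The -p flag makes the client prompt for the password, and LIMIT 20 just trims the output to the top twenty offenders.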
Final Note:
From now on, whenever you create a new entity, part of your workflow must be to consider whether to include or exclude it in the Recorder and Logbook. Don't get slack and let it get out of hand or you will have to do this all again.
One last thing: data I keep for longer (up to two years), or from really chatty sensors, I store in an InfluxDB database. But that's for another guide…