Taming the Recorder

I occasionally run into an excessive database file size. Combing through the community turned up a wealth of insights scattered across many places. This is what I learned while taming the recorder.

Problem Description: I operate a Raspberry Pi 3 with Home Assistant OS and the built-in SQLite database, managing 1000+ entities. While performing some tests I ran into excessive use of the Recorder, resulting in a >1 GB file size for home-assistant_v2.db. This led to the following problems: HA crashes while compiling ESPHome, and creating a full snapshot is not possible.

To understand the reasons and to find a remedy I made use of the following tools: Template Editor, Sensors (filesize, systemmonitor, sql, influxdb), Plug-Ins (Samba Server, SQLite Web Plug-In, InfluxDB Plug-In), Linux tools (top, du), and Win10 tools (sqlite3_analyzer.exe).

I learned to differentiate between three time frames: runtime, short term recording, and long term storage.

To understand the runtime load, I measured the number of entities in the state machine and how many of them are sensors with the template editor (in my case the result was 1066 and 437):

Current number of entities= {{ states | count }}, and sensors= {{ states.sensor | count }}

Short term recording: My recorder is set to purge every day, thus storing every unfiltered state change and event for at least 24 hours. The SQLite Web plug-in gives answers about the load. The first two queries return the total number of states and events:

SELECT COUNT(*) FROM states;
SELECT COUNT(*) FROM events;

To identify the most load consuming entity_id:

SELECT entity_id, COUNT(*) as count FROM states GROUP BY entity_id ORDER BY count DESC LIMIT 10;
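If you prefer a script over the SQLite Web plug-in, the same ranking can be run from Python. This is only a minimal sketch: it assumes a states table with an entity_id column as in the query above, and `top_entities` and `make_demo_db` are names I made up for illustration. The demo uses an in-memory stand-in so it runs anywhere.

```python
import sqlite3

TOP_ENTITIES_SQL = """
    SELECT entity_id, COUNT(*) AS row_count
    FROM states
    GROUP BY entity_id
    ORDER BY row_count DESC, entity_id
    LIMIT ?
"""


def top_entities(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Return (entity_id, row_count) pairs, largest contributor first."""
    return conn.execute(TOP_ENTITIES_SQL, (limit,)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the recorder's states table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE states (state_id INTEGER PRIMARY KEY, entity_id TEXT, state TEXT)"
    )
    rows = [("sensor.noisy", "1")] * 5 + [("sensor.quiet", "2")] * 2 + [("light.hall", "on")]
    conn.executemany("INSERT INTO states (entity_id, state) VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    for entity_id, row_count in top_entities(make_demo_db(), limit=3):
        print(f"{row_count:6d}  {entity_id}")
```

To try it on real data, I would copy home-assistant_v2.db off the Pi first (e.g. via the Samba share) and pass `sqlite3.connect(path_to_copy)` instead of the demo connection, rather than querying the live file.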

A list of all monitored sensors:

SELECT DISTINCT entity_id FROM states WHERE domain = 'sensor' ORDER BY entity_id;

Show the latest 10 entries for a specific sensor. Looking at the difference between timestamps, one can determine its updating interval.

select state, created from states WHERE entity_id = 'sensor.ferres_timestamp_delta' ORDER by state_id DESC LIMIT 10;
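Instead of eyeballing the timestamp differences, they can be computed. A small sketch, assuming the created column comes back as ISO-like text ("YYYY-MM-DD HH:MM:SS.ffffff"); the sample values and the function name `update_intervals` are invented for illustration.

```python
from datetime import datetime
from statistics import median


def update_intervals(timestamps: list[str]) -> list[float]:
    """Return the gaps in seconds between consecutive timestamps, oldest first."""
    times = sorted(datetime.fromisoformat(ts) for ts in timestamps)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Hypothetical values as the query above might return them (newest first).
created = [
    "2021-01-10 12:00:30.000000",
    "2021-01-10 12:00:20.000000",
    "2021-01-10 12:00:10.500000",
    "2021-01-10 12:00:00.000000",
]
gaps = update_intervals(created)
print("gaps in seconds:", gaps, "median:", median(gaps))
```

The median is less sensitive than the mean to a single long pause, e.g. when the sensor was offline for a while.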

Long term monitoring: I found InfluxDB very useful. The following experiment compares the internal load (purple) and the database file size (blue), covering a period of two days. The peak load was 2M states and 1M events, consuming 1.4GB space. Two morning purges (at 4:00) can be seen. However, a purge alone does not reduce the size of the database file. Only a service call recorder.purge with the option repack: true reduced the size to 0.3 MB (at 11:00).
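Why a purge alone leaves the file size untouched can be reproduced with plain SQLite: deleted rows only free pages inside the file, and the file shrinks when it is rebuilt with VACUUM. My assumption is that this is what happens behind purge versus repack on SQLite; the sketch below only demonstrates the SQLite behaviour on a throwaway file, with a made-up function name.

```python
import os
import sqlite3
import tempfile


def purge_then_repack(db_path: str, rows: int = 20000) -> tuple[int, int, int]:
    """Fill a table, delete most rows, then VACUUM; return the three file sizes in bytes."""
    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit mode
    conn.execute(
        "CREATE TABLE states (state_id INTEGER PRIMARY KEY, entity_id TEXT, state TEXT)"
    )
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT INTO states (entity_id, state) VALUES (?, ?)",
        (("sensor.demo", "x" * 50) for _ in range(rows)),
    )
    conn.execute("COMMIT")
    size_full = os.path.getsize(db_path)

    # "Purge": the rows are gone, but their pages stay in the file as free pages.
    conn.execute("DELETE FROM states WHERE state_id > ?", (rows // 10,))
    size_after_delete = os.path.getsize(db_path)

    # "Repack": VACUUM rebuilds the file without the free pages.
    conn.execute("VACUUM")
    size_after_vacuum = os.path.getsize(db_path)
    conn.close()
    return size_full, size_after_delete, size_after_vacuum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        full, deleted, vacuumed = purge_then_repack(os.path.join(tmp, "demo.db"))
        print(f"full={full}  after delete={deleted}  after VACUUM={vacuumed}")
```

Note that VACUUM needs temporary disk space for the rebuilt copy, which matters on a small SD card.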


My conclusions:

  • Separate short and long time requirements. Deploy different database machines.

  • Identify the most harmful recorder contributors in both states and events tables with respect to the number of involved entities, their update frequency, and keep duration.
    This led me to the following strategy:
    a) Reduce the recorder candidates by very stringent whitelist filtering,
    b) Reduce the update frequency of sensors at the origin (workaround: automate saving sensor data to an input_number entity at suitable intervals),
    c) Reduce the recorder keep time to a minimum.

  • Erase intermediate recordings (e.g. from test runs) manually with purge/repack.

  • Consider using notify (platform file) as an alternative.

  • Keep a fresh database for later usage. In an emergency, replace databases with file system commands, while HA is stopped via the service call homeassistant.stop.
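To get a feeling for how much the three levers a) to c) can save, a back-of-the-envelope estimate helps. The sketch below assumes one stored row per update per recorded entity per kept day, which is an upper bound because only changes are recorded; the entity names and intervals are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimated_rows(update_interval_s: dict[str, float], keep_days: int, include: set[str]) -> int:
    """Estimate stored state rows: one row per update, per included entity, per kept day."""
    per_day = sum(
        SECONDS_PER_DAY / interval
        for entity_id, interval in update_interval_s.items()
        if entity_id in include
    )
    return round(per_day * keep_days)


# Hypothetical entities and update intervals in seconds.
intervals = {"sensor.power": 1, "sensor.temperature": 60, "sensor.uptime": 10}

before = estimated_rows(intervals, keep_days=5, include=set(intervals))

# (a) whitelist two sensors, (b) throttle power to one update per minute, (c) keep one day.
throttled = {**intervals, "sensor.power": 60}
after = estimated_rows(throttled, keep_days=1, include={"sensor.power", "sensor.temperature"})

print(f"rows before: {before}, rows after: {after}")
```

In this toy example the one-second sensor dominates everything else, which matches my experience that a handful of chatty entities cause most of the load.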

[edit] I ended up reducing the initial database size by 99%, without losing any of my desired data.

Great post! This is something I also wanted to dive into. However, I’m using MariaDB and don’t really have a lot of experience using SQL. Do you have any advice on how to tackle this problem in MariaDB?

Try wrapping the database commands into a sql sensor. Or use a CLI or web interface to your favorite database.

I’m in agreement. My database size is horrendous. I have a 1.6 GB database and I’ve set a 5-day purge. I want more than 5 days of history, but I think it records entirely too much. I think there is room for improvement. Here are some working criteria which I came up with based on your research and my experience.

  1. Don’t touch boolean entries
  2. Keep all of today’s, yesterday’s, and the day before’s entries.
  3. Remove redundant, matching-value events after 72 hours.
  4. Remove 1/2 of intermediate entries (defined by min/max over the course of 15m/30m/1h/6h depending on user/sensor settings and activity) of each entry older than 72 hours, each 24 hours, during low processor activity time.
  5. Determine a point (maybe 4 weeks) where the data is too corrupt to be useful.
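Criterion 3 is the easiest one to pin down in code. Here is a rough sketch of how I read it, for the history of a single entity: entries older than 72 hours are dropped when their value equals the previously kept one, and anything newer is left alone. The function name and sample data are made up, and the boolean exception from criterion 1 is not handled.

```python
from datetime import datetime, timedelta


def drop_redundant(
    rows: list[tuple[datetime, str]], now: datetime, keep_hours: int = 72
) -> list[tuple[datetime, str]]:
    """Drop entries older than keep_hours whose value equals the previous kept value."""
    cutoff = now - timedelta(hours=keep_hours)
    kept: list[tuple[datetime, str]] = []
    for when, value in sorted(rows):
        is_old = when < cutoff
        repeats_previous = bool(kept) and kept[-1][1] == value
        if is_old and repeats_previous:
            continue  # redundant, matching-value entry beyond the keep window
        kept.append((when, value))
    return kept


# Hypothetical history of one sensor: (timestamp, state).
now = datetime(2021, 1, 10, 12, 0)
history = [
    (now - timedelta(hours=100), "20"),
    (now - timedelta(hours=99), "20"),  # old and unchanged: dropped
    (now - timedelta(hours=98), "21"),
    (now - timedelta(hours=97), "21"),  # old and unchanged: dropped
    (now - timedelta(hours=10), "21"),  # recent: kept even though unchanged
    (now - timedelta(hours=9), "21"),
]
thinned = drop_redundant(history, now)
print(len(history), "->", len(thinned), "entries")
```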

As an alternative to using influxdb as the long term monitoring solution there is TimescaleDB and the HA custom component LTSS (Custom component: Long Time State Storage (LTSS) utilizing TimescaleDB) for those preferring to rely on an sql-based solution for everything.

Keep a close eye on the event recordings. If not excluded, any state change of a sensor produces a new entry in both the events and states tables.

The count for all events was 21690. In this case, state_changed accounted for 2/3 of all recorded events.

I decided to exclude state_changed and call_service events.
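For anyone who wants to reproduce the per-type breakdown, something along these lines should do. It is a sketch that assumes an events table with an event_type column; the function name and the demo rows are invented, with the demo chosen so that state_changed lands at 2/3 as in my case.

```python
import sqlite3


def event_type_shares(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Return (event_type, count, share of all events), largest first."""
    total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    rows = conn.execute(
        "SELECT event_type, COUNT(*) AS n FROM events "
        "GROUP BY event_type ORDER BY n DESC, event_type"
    ).fetchall()
    return [(event_type, n, n / total) for event_type, n in rows]


# In-memory stand-in for the recorder's events table (hypothetical rows).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_type TEXT)")
demo.executemany(
    "INSERT INTO events (event_type) VALUES (?)",
    [("state_changed",)] * 6 + [("call_service",)] * 2 + [("automation_triggered",)],
)

for event_type, n, share in event_type_shares(demo):
    print(f"{n:4d}  {share:6.1%}  {event_type}")
```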

Does anyone else think logging in general is a weakness in HA?

I tried to start a new thread along those lines, but all I got was a snarky reply suggesting that HA shouldn’t try to be easy to use, and that beginners aren’t wanted here.

Thanks for the help man! I’m going to post my journey for those who are as SQL-limited as I am.

I use the MariaDB plugin for Home Assistant. Since I didn’t really know how to connect to my database, I wanted to find a nice GUI-capable tool. For this, I have used dbForge.

Stepwise:

  • Download and install dbForge.
  • Go to the MariaDB config page, and in the Network section, enter your port number. In my case the Container column reads 3306/tcp and I have entered 3306 below Host.
  • Open dbForge and use this tutorial to enter your database details
    • Use type TCP/IP
    • As Host enter your HA IP address and as port enter the port you used (3306 in my case)
    • As User and Password, use the entries in your MariaDB plugin config under logins
    • As Database, use the database name set in your MariaDB plugin config (default is homeassistant)
    • Press Test Connection or Connect if you’re feeling lucky!
  • Press ctrl+n to start a new SQL
  • Enter the commands stated above by @heinrich

Some useful commands I have used:
To find out which entities use the most data

SELECT entity_id, COUNT(*) as count FROM states GROUP BY entity_id ORDER BY count DESC LIMIT 100;

To remove entities from the database directly using LIKE patterns (these are wildcards, not regular expressions: % matches any run of characters, a bare _ matches any single character, and \_ matches a literal underscore):

-- first test that the pattern matches only what you expect. I'm looking for e.g.: sensor.blitzwolf_NUC_energy_voltage
SELECT entity_id, COUNT(*) as count FROM states WHERE entity_id LIKE 'sensor.blitzwolf%energy\_voltage' GROUP BY entity_id ORDER BY count DESC LIMIT 10;
-- then remove the entities. This is final!
DELETE FROM states WHERE entity_id LIKE 'sensor.blitzwolf%energy\_voltage';
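Since the DELETE is final, it is worth seeing what the underscore does in a LIKE pattern before running it. The sketch below uses Python's built-in SQLite rather than MariaDB, so it needs an explicit ESCAPE clause (MariaDB treats the backslash as escape character by default, as far as I know). The entity names are made up, including a deliberate near-miss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT)")
conn.executemany(
    "INSERT INTO states VALUES (?)",
    [
        ("sensor.blitzwolf_nuc_energy_voltage",),
        ("sensor.blitzwolf_nuc_energyXvoltage",),  # hypothetical near-miss
        ("sensor.blitzwolf_nuc_status",),
    ],
)


def matching(pattern: str) -> list[str]:
    """Return entity_ids matching a LIKE pattern, with backslash as the escape character."""
    rows = conn.execute(
        "SELECT entity_id FROM states WHERE entity_id LIKE ? ESCAPE '\\' ORDER BY entity_id",
        (pattern,),
    )
    return [entity_id for (entity_id,) in rows]


# A bare "_" matches any single character, so the near-miss is caught as well.
print(matching("sensor.blitzwolf%energy_voltage"))
# An escaped "\_" matches only a literal underscore.
print(matching("sensor.blitzwolf%energy\\_voltage"))
```

Running the SELECT with the exact pattern you intend to delete with, and checking the returned list, is the cheap safety net here.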

To find out how much data each table (I think it’s called a table) uses (credit goes to mbuscher)

SELECT
    table_name AS `Table Name`,
    SUM(table_rows) AS `Row Count`,
    ROUND(SUM(data_length)/(1024*1024*1024), 3) AS `Table Size [GB]`,
    ROUND(SUM(index_length)/(1024*1024*1024), 3) AS `Index Size [GB]`,
    ROUND(SUM(data_length+index_length)/(1024*1024*1024), 3) AS `Total Size [GB]`
FROM information_schema.TABLES
WHERE table_schema = 'homeassistant'
GROUP BY table_name
ORDER BY table_name;

Another thing I found useful was to plot the first 1000 entities of the first query using Excel and then calculate the sum of all counts up until that entity. That way I found out I could reduce the size of my database by a factor of 10, simply by removing the first 100 entities from the database.
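The Excel step can also be done in a few lines of Python, if that is more your thing. A minimal sketch: the counts below are invented, and the function names are mine. It computes the running share of all rows and how many of the top entities are needed to reach a given share.

```python
from itertools import accumulate


def cumulative_share(counts: list[int]) -> list[float]:
    """For counts sorted largest first, return the running share of all rows."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


def entities_needed(counts: list[int], target_share: float) -> int:
    """Return how many of the top entities it takes to cover target_share of all rows."""
    for position, share in enumerate(cumulative_share(counts), start=1):
        if share >= target_share:
            return position
    return len(counts)


# Hypothetical per-entity row counts, as returned by the first query (largest first).
counts = [5000, 3000, 1000, 500, 300, 100, 50, 30, 15, 5]

print("running share of the top 3:", cumulative_share(counts)[:3])
print("entities covering 90% of rows:", entities_needed(counts, 0.9))
```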