Recorder Database considerations

In my household, the family relies on HA for many things that run automatically or depend on smart-home technology. For that reason, stability counts above everything else. On the other hand, HA is fantastic to experiment with and to expand in functionality. I quickly learned not to experiment on the HA instance that is in service, and got myself a second setup to play with (Raspberry Pi 4, 4 GB, with a ConBee II, an SSD and some secondary stuff).

The two prime considerations for achieving full reliability are the ever-expanding Home Assistant recorder database (home-assistant_v2.db) and the memory leaks that occur when certain hardware doesn't respond as expected.

Over a longer period I have invested time in optimizing the recorder behaviour and learning, sort of, how things work. In its default state, the recorder was growing by about 1.6 GB per day. Although I recently moved to an SSD to prevent the MicroSD card from being destroyed, the situation was still unsatisfying, as the system started to respond more slowly over time. On the other hand, I wanted a rather long history, enabling me to play with all kinds of statistics. I took several actions to improve things, which I have listed below:

Optimize hardware
The MicroSD card normally offers enough space for a lot of history, but its R/W speed and IOPS are limited, so a good speed improvement was achieved by adopting an SSD. This particularly improved the response of the (sometimes rather big) database. On top of that, reliability should improve, as SD cards are usually not designed for endless high write rates over time; even A2-class cards have their limitations there.

Configure the recorder to your needs
In fact, there is only a limited set of entities whose history I am interested in, such as mains power consumption, etc. I have taken quite some time to look into the history view and an SQLite database viewer to understand what happens. My SQL knowledge is limited, but some good basic information could be retrieved rather easily. The result was limiting the information that gets recorded. My recorder configuration now looks as follows, noting that I have organized the database purge in an automation.

Note that the commit_interval has been set to two minutes, which heavily reduces the number of write events over time. Simply said, the recorded states are kept in normal memory for 120 seconds and then stored in the database in one go. Fundamentally, the amount of data written remains the same, but batching the commits appears to improve the lifespan of your storage device, particularly SD cards of course.

recorder:
  commit_interval: 120
  include:
    domains:
      - sensor
  exclude:
    entity_globs:
      - device_tracker.*
      - sensor.afval_*
      - sensor.adguard_*
      - sensor.blitzortung_*
      - sensor.chromecast_*
      - sensor.current_*
      - sensor.galaxy_*
      - sensor.hp_*
      - sensor.huawei_*
      - sensor.ipad_*
      - sensor.iphone*
      - sensor.laptop_*
      - sensor.macbook_*
      - sensor.nw00*
      - sensor.oppo_*
      - sensor.power_production*
      - sensor.rx
      - sensor.tx
      - sensor.RX
      - sensor.TX
      - sensor.battery
      - sensor.technicolor_*
      - sensor.tc_**
      - sensor.tesla_*
      - sensor.voltage_*
      - sensor.warm_water_*
      - sensor.wled_*
      - switch.adguard_*
      - switch.tesla_*
      - switch.wled_*
      - switch.zone*
    entities:
      - sensor.snapshot_backup
      - sensor.hacs
      - sensor.energy_consumption_total
      - sensor.energy_production_tarif_1
      - sensor.energy_production_tarif_2
      - sensor.last_boot
      - sensor.memory_free
      - sensor.memory_use_percent
      - sensor.long_power_failure_count
      - sensor.short_power_failure_count
      - sensor.power_tariff
      - sensor.verwarming_early_start_2
      - sensor.verwarming_link_2
      - sensor.verwarming_overlay_2
      - sensor.verwarming_power_2
      - sensor.verwarming_tado_mode_2
      - sensor.verwarming_open_window_2

So basically, I only record a limited set of sensors and keep most of the other entities out of the recording. In fact, I only record about 8% of what HA records by default.

Automate purging
One nice feature of the recorder purge is that the repack option completely rewrites the database, optimizing it and effectively recreating the states tables. Logically, this is performed at night when there is no other activity.

Note that purging can also be configured in the recorder config; however, the repack option has to be passed to the purge service rather than set in the normal configuration. Therefore this is handled by an automation and left out of the config.
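For reference, the retention alone can be set directly in the configuration with the purge_keep_days option; a minimal sketch of that variant (which, as said, cannot trigger the repack) would be:

recorder:
  purge_keep_days: 30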

In my case, the automation for this looks like:

- id: '1234567898'
  alias: '[recorder] purge & repack'
  description: ''
  trigger:
    - platform: time
      at: '03:03:03'
  condition: []
  action:
    - service: recorder.purge
      data:
        keep_days: 30
        repack: true
  mode: single
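For testing, the same purge can also be triggered once by hand from Developer Tools → Services, using the same data as in the automation:

service: recorder.purge
data:
  keep_days: 30
  repack: true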

I am now experimenting with the database size that is still acceptable to remain fast, but I think much more optimization should be possible. In its current state, the database grows (from scratch) by about 400 kB per minute and thus grows to:

  • 24 MB per hour
  • 576 MB per day
  • 4 GB per week
  • 17 GB over the 30-day purge window - if not compacted by the repack
  • 7 GB when repacked.

Considering the limited number of entities, this is still a lot, so I am open to suggestions on what else could be done. Improving the whole database/recorder part of HA would be high on my request list…
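As a side note, an easy way to keep an eye on that growth from within HA itself is the filesize sensor; a minimal sketch, assuming the database sits in the default /config location (depending on your version, the folder may also need to be listed under allowlist_external_dirs):

sensor:
  - platform: filesize
    file_paths:
      - /config/home-assistant_v2.db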

I read your topic and found it very interesting…

Do you use MariaDB? Because I do, and I don't know whether your “purge & repack” is applicable in that case?

And of course, have you meanwhile found other things to do to keep the db healthy?

And this is the fundamental problem with Home Assistant as it is built, resulting in a ticking time bomb for every HA installation.

The database needs to be rewritten as a proper relational database with distinct tables defining attributes, entities, numeric states, text states, etc., and linking tables to combine them.

This would virtually eliminate the redundant information that is currently stored in the database as horrid JSON strings in VARCHAR columns.

It is a textbook example of the most disastrous, inefficient and poor way to design a database.

I applaud your effort in experimenting with controlling this monster, but until a complete redesign of the database is done, it is just pushing deck chairs around on the Titanic. :wink:

An excellent example of a database designed to store events/states is the Zabbix network monitoring software. I have used it for nearly a decade at work to monitor thousands of hosts, taking in nearly 5K values per second, and all that data sits nicely in a 2 GB database complete with histories and statistics.
