How to keep your recorder database size under control

That’s why you shouldn’t be doing all this. Just let HA manage the database. All the changes that were made to the database in the past year were to increase the longevity of SD cards by minimizing reads and writes. When you add all this stuff on top of it, you’re just adding to the reads/writes. Just use exclude/include and keep the default processes.

5 Likes

Thank you all for the information, so the best and safest thing to do is just exclude the sensors whose data I do not need. HA will purge itself after around 10 days if I am right?
And then that ‘old’ data will be removed and the excluded sensors will not fill the database with new ‘useless’ data.

is that sort of right? haha :see_no_evil:

1 Like

Correct. You’ll be left with the historical data on the sensors not excluded - which is a downsample of the original data. (again to reduce the long term data footprint)

Also everyone is absolutely right on SD burnout. I’d be way more concerned about that than how much data you have, to the point of migrating to a different storage medium. (SSD)

1 Like

I use HA Yellow, with the system on the eMMC and data on an SSD; the database is on the SSD now.
I do notice that booting/fully restarting is slower than with the MariaDB of my previous HA installation.

But what happens if you do it between Long Term Stat updates?
Presumably, you will have missing or incorrect stats for that particular hour (since according to the docs it updates hourly)

Or worse, if you coincidentally do it while a Long Term Stat update is occurring… who knows

Yes, both of those were ticked and activated. I’m fairly certain there’s a bug with the service but I have no way to confirm it. Even without repack enabled it should remove the entities in question (without reducing DB size) - however using a database explorer I could still see the entities there.

I’m running on real metal with an SSD, so this isn’t an issue. What is an issue is having a 2.0 GB database because my solar sensor updates every second with power use, but I want to keep the solar data because of (among other things) the Energy Dashboard.

Yes, my workaround is only for poorly programmed sensors that poll for data far too often (but whose data you want to keep).

It would be great if we could selectively say, for this entity (or even this integration), I only want data for 1 day but that just isn’t possible.
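A rough approximation is possible with the recorder.purge_entities service in a scheduled automation (this is the workaround referred to above). A minimal sketch, with a hypothetical entity name, assuming a HA version where the service accepts keep_days:

```yaml
automation:
  - alias: "Purge chatty sensors daily"
    trigger:
      - platform: time
        at: "03:00:00"
    action:
      - service: recorder.purge_entities
        target:
          entity_id:
            - sensor.solar_power   # hypothetical chatty entity
        data:
          keep_days: 1             # keep only the last day of states
```

Long-term statistics are untouched by this, so hourly energy history survives even though the raw states are dropped.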

Hold the phone on that, I luckily just transitioned to bare metal with an NVMe SSD. After running a similar setup (RPi with the database on a Samsung USB SSD), my SSD’s wear levelling count is sitting at 19 (effectively 80% of its lifespan used up).

If I hadn’t moved setups, another few months or so and it’s highly likely my SSD would have died.

I think so few people are running on SSDs that this isn’t brought up as an issue (also given it takes about 2.5 years to get here, compared to weeks or months for an SD card). Coincidentally, Samsung’s warranty on portable SSDs is 3 years (so pretty much within that range).

I think ideally:

  1. All of your database should fit in RAM.
  2. Use a db engine that can rely on RAM cache rather than disk (MariaDB among others).
  3. Storage in this order:
    • Enterprise SSDs are the go-to (noting the limitation that you can’t really use one on an RPi or HA Yellow).
    • Consumer-grade SSDs are next. These have much lower endurance than enterprise SSDs (but are also cheaper). Aim for one with the highest write endurance, usually measured in TBW (terabytes written), or the highest MTBF (mean time between failures).
    • Portable SSD / hard disk drive: most portable SSDs have shorter warranty periods and don’t state a TBW or MTBF, so I’m torn between these two choices. It is possible for a portable SSD to have the same endurance as a consumer-grade SSD, although I haven’t seen one on the market myself.
    • High-endurance eMMC/SD card: will die pretty quickly unless you optimize your disk writes.
    • eMMC/SD card: will die pretty quickly.

Of course, the typical SSD failure mode is unlike that of other disks: you will usually still be able to read data (just not write it). So it’s not unrecoverable, unlike other modes of failure.

In any event, I’d be monitoring your SMART statistics using hdparm and keeping an eye on the important ones (these differ depending on drive type, and it doesn’t work for eMMC/SD).

Edit: I just pulled the data for my old Samsung USB SSD.

Smart ID: 241
  Name: Total LBAs Written
  Value: 16546952261

So with an LBA size of 512 bytes, that’s:

16,546,952,261 × 512 / 1000⁴ ≈ 8.47 TB written

That’s low compared to typical SSD endurance ratings, but it is a portable SSD with (presumably) less endurance, so not out of the question.
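Redoing the conversion in a few lines of Python (512 bytes per LBA is assumed for this drive):

```python
# Convert SMART "Total LBAs Written" (attribute 241) to terabytes written.
lba_written = 16_546_952_261   # raw SMART value from the drive above
sector_size = 512              # assumed bytes per LBA for this drive

bytes_written = lba_written * sector_size
tb_written = bytes_written / 1000**4  # decimal terabytes (1 TB = 10^12 bytes)

print(f"{tb_written:.2f} TB written")  # → 8.47 TB written
```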

2 Likes

Funny you should mention that:

3 Likes

I excluded the top states/entities as suggested and nothing seemed to happen; I did a purge, etc.
I was about to give up (so be it…), stopped paying attention to it, and a bit later, boom, the database was around 50% smaller!

It seems the purge takes time, and it also eventually removes entities that at first don’t seem to disappear from the database.

My biggest states are now the energy sensors; I have no idea if I can exclude those sensors without destroying the energy data…

Also, the ‘new’ system_monitor integration is a big database filler. I excluded most of its entities.

My config so far:

recorder:
  db_url: sqlite:////config/home-assistant_v2.db
  exclude:
    domains:
      - device_tracker
      - media_player
      - uptime
      - time_date
      - worldclock
    entity_globs:
      - sensor.clock*
      - sensor.date*
      - sensor.time*
      - sensor.uptime*
      - weather.forecast_*
    entities:
      - sensor.home_assistant_v2_db_laatst_bijgewerkt
      - sensor.system_monitor_disk_free
      - sensor.system_monitor_disk_use
      - sensor.system_monitor_last_boot
      - sensor.system_monitor_load_1m
      - sensor.system_monitor_load_5m
      - sensor.system_monitor_load_15m
      - sensor.system_monitor_memory_free
      - sensor.system_monitor_memory_use
      - sensor.system_monitor_network_in_end0
      - sensor.system_monitor_network_out_end0
      - sensor.system_monitor_network_throughput_in_end0
      - sensor.system_monitor_network_throughput_out_end0
      - sensor.system_monitor_packets_in_end0
      - sensor.system_monitor_packets_out_end0
      - sensor.system_monitor_swap_free
      - sensor.system_monitor_swap_use

Are the uptime, time_date, time*, and date* sensors safe to exclude, with automations and all?

These are my settings so far

recorder:
  db_url: sqlite:////config/home-assistant_v2.db
  exclude:
    domains:
      - media_player
      - update
      - sun
      - timer
      - light
    entity_globs:
      - timer.*
      - select.miwifi*
      - sensor.miwifi*
      - number.*_radiator*
  purge_keep_days: 8
  commit_interval: 0

Database is 2.7 GB

Either memory_free or memory_use can be useful for diagnosing crashes caused by memory leaks. But you don’t need both.

Any sensor is safe to exclude; it just means you won’t have history/logs for it. In fact, you can turn off the recorder altogether and HA will still work.

See my post above for how I deal with these

Caveats: this will increase disk hits, so only do it if your DB is big (I’d say bigger than 1 GB for 30-day retention). As I said above, even with a portable SSD this will probably kill your drive in about 3 years.

You do you, but this is massive; my DB was this size and it killed a USB SSD in around 3 years.

For comparison, I have around 1000 (active) entities and purge_keep_days set to 30, with a DB size of 650 MB.

1 Like

Hello all, thank you very much for the different explanations and aspects, which are described very well here. I’ve been reading the posts for some time and I’m very pleased that the topic is being discussed more intensively again.

My HA installation has been active for several years and the database is now larger than 50 GB. I run HA on an ODROID-N2/N2+ with an external SSD. Is there really a need to do anything about database management, or why not just let HA run and manage itself? Because of the SSD, access times are not an issue and storage space should no longer be a factor.

Is there a defined maximum size for the database, are there limits? Thank you in advance for your comments. Greetings

I would say if the interaction with HA is snappy enough for you, then it’s fine (and it’s up to you how to use your storage; maybe you are also a #DataHoarder like many here :blush:).
You can evaluate that by removing the recorder integration from the config file, restarting HA, and checking the experience (navigating around, executing scripts, etc.). Then add it back to the config file to go back to using it. Of course, all data generated during the time you don’t have it will be lost.

Indeed, I used just one of those kinds of sensors. The rest are excluded.

I did not know that; I thought it would no longer be able to calculate energy data etc., or trigger automations using ‘is off for x minutes’…

Do you mean the one from the Blueprint / automation?

Thanks!

  1. Install hdparm.
  2. Read the NVMe/SMART statistics for your SSD.
  3. I would be very surprised if your SSD is not about to die. Look for something like wear levelling remaining (effectively lifetime remaining) or reserved blocks used (internal SSD reserve blocks used to replace blocks that have died).
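For example, with the smartmontools package (an assumption on my part; the post above mentions hdparm, but smartctl is the tool that reads SMART attributes on most setups, and the device paths below are examples):

```shell
# Read full SMART data from a SATA/USB SSD
sudo smartctl -a /dev/sda

# For an NVMe drive, smartctl reads the NVMe health log instead
sudo smartctl -a /dev/nvme0

# Filter for the wear-related attributes discussed above
sudo smartctl -a /dev/sda | grep -i -E "wear|lba|reserv"
```

USB-attached SSDs sometimes need a passthrough option such as `-d sat` before SMART data is visible.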

Sorry, what I meant is that you might lose functionality, but the core of HA (being able to control your home) will still work. For sure, the Energy Dashboard will not work without the recorder, because it relies on long-term statistics data (which in turn relies on the recorder).

I mean this one:

1 Like

Is the ‘logbook’, with a lot of useless info, also kept in the recorder/database file?

After lots of trying i came up with the following settings:

#
# https://www.home-assistant.io/integrations/recorder/
#
recorder:
  purge_keep_days: 3
  auto_purge: false
  db_url: sqlite:////config/home-assistant_v2.db
  exclude:
    domains:
      - device_tracker
      - media_player
      - uptime
      - time_date
      - worldclock
    entity_globs:
      - sensor.clock*
      - sensor.date*
      - sensor.time*
      - sensor.uptime*
      - weather.forecast*
      - sensor.afvalinfo*
    entities:
      - sensor.electricity_meter_tijdstip
      - sensor.home_assistant_v2_db_laatst_bijgewerkt
      - sensor.system_monitor_disk_free
      - sensor.system_monitor_disk_use
      - sensor.system_monitor_last_boot
      - sensor.system_monitor_load_1m
      - sensor.system_monitor_load_5m
      - sensor.system_monitor_load_15m
      - sensor.system_monitor_memory_free
      - sensor.system_monitor_memory_use
      - sensor.system_monitor_network_in_end0
      - sensor.system_monitor_network_out_end0
      - sensor.system_monitor_network_throughput_in_end0
      - sensor.system_monitor_network_throughput_out_end0
      - sensor.system_monitor_packets_in_end0
      - sensor.system_monitor_packets_out_end0
      - sensor.system_monitor_swap_free
      - sensor.system_monitor_swap_use
      - sensor.aeotec_z_stick_gen7_average_background_rssi_channel_1
      - sensor.aeotec_z_stick_gen7_average_background_rssi_channel_2
      - sensor.aeotec_z_stick_gen7_average_background_rssi_channel_3

I tried a lot of things. The first was excluding things from the recorder: my DB decreased a lot in size, but the problem was that it kept growing again; even after the exclusions, it crept back up to the same size as my starting point.

The biggest states were from the system monitor sensors and energy sensors.
As suggested by @illuzn, I started using the automation to keep energy data for just one day. This seems to cause no problems in the graphs and lowers the DB size a lot.

Also, why do we need the default 10 days of data? I put it back to 3 days and this lowered the size a lot again.

I added 2 more automations I found in the old Home Assistant configuration from Frenck:

id: "system_recorder_repack"
alias: "System - Recorder Repack"
description: >-
  Sunday morning, early morning, repack the recorder database.
  This keeps the disk usage low and nice by reclaiming space.

trigger:
  - platform: time
    at: "05:55"

condition:
  - condition: time
    weekday:
      - sun

action:
  - service: recorder.purge
    data:
      repack: true
      apply_filter: true

mode: single

id: "system_recorder_purge"
alias: "System - Recorder Purge"
description: >-
  Purge my recorder database every hour, to keep things nice and tight.
  While Home Assistant does this daily, I like to do it every hour.

trigger:
  - platform: time_pattern
    minutes: 12

action:
  - service: recorder.purge
    data:
      repack: false
      apply_filter: false

mode: single

For now my DB stays at around 65 MB; I’ll have to see what it does a week from now.

PS. Sorry for the occasional bad grammar.

If anyone has suggestions or improvements, please let me know!
Thanks for the help and info on this page :slight_smile:

Edit: note the historical graph from the database sensor; some of my energy sensors have this as well, but a lot also have no historical data in them, just the last few days. Any idea why some sensors have historical data and some don’t?

It just depends on how often you think you will need the to-the-second data. It’s kind of handy for bug checking and the like, to know what is causing potential issues. Personally, I keep 1 month because I’m not watching HA every day, and if I spot an issue 2 weeks after it occurs it’s good to know what’s going on.

Be aware, this causes a lot of unnecessary disk writes and may wear out your SSD/HDD/SD card even quicker. I do this once per day, and realistically, a 200 MB database is not a big deal (mine is around 600 MB with 30 days of data).

Sensors must:

  • be recorded by the recorder
  • provide a valid “state_class”

in order for statistics to be provided. I suspect some of your sensors are not properly providing a state_class. You can use a “customize” setting in your configuration.yaml to fix this.
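A minimal sketch of that customize approach (the entity name is hypothetical, and the state_class must match how the sensor actually behaves — measurement, total, or total_increasing):

```yaml
homeassistant:
  customize:
    sensor.my_power_meter:        # hypothetical sensor missing a state_class
      state_class: measurement    # lets the recorder generate long-term statistics
```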

That’s true; however, if I don’t check HA I don’t update or change anything, so I hope no new bugs appear.
I am still figuring out how long to keep the data…

Thanks, I thought that only the repack caused write wear on the disk.
Then I’d better set it back to the default, which is once a day.
Also, maybe I should start accepting 100+ MB for a database :stuck_out_tongue:
I thought it would be slow with a big database, and backups do get large.

For energy sensors I asked here:
How to make a sensor keep LTS and last period LTS - Configuration - Home Assistant Community (home-assistant.io)

template:
  - sensor:
      - name: "Daily Energy"
        unit_of_measurement: "kWh"
        device_class: energy
        state_class: total_increasing
        state: "{{ states('sensor.daily_energy_offpeak')|float + states('sensor.daily_energy_peak')|float }}"
        availability: "{{ has_value('sensor.daily_energy_offpeak') and has_value('sensor.daily_energy_peak') }}"

This is working :slight_smile:

I changed your automation to purge entity_globs, but now it’s not doing anything.

id: "system_recorder_purge_entities"
alias: "System - Recorder Purge Entities"
description: >-
  Cleans up sensors that log too much but whose data I want to keep.

trigger:
  - platform: time
    at: "04:44"

action:
  - service: recorder.purge_entities
    data:
      domains: []
      entity_globs: >-
        sensor.daily* sensor.yesterday* sensor.week* sensor.weekly*
        sensor.last_week* sensor.month* sensor.monthly* sensor.last_month*
        sensor.year* sensor.yearly* sensor.last_year* sensor.electricity*
        sensor.energy* sensor.system_monitor* sensor.last_week*
      keep_days: 1

mode: single

Do I really need to add all the entities manually as a sensor? :open_mouth:

Edit: it does not purge any of the entity_globs, only what is in

    target:
      entity_id:
        - sensor.daily_energy
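For what it’s worth, a likely cause (an assumption on my part): entity_globs expects a YAML list, while the folded `>-` block above collapses into one long space-separated string that matches nothing. A sketch of the corrected action, with the globs as list items:

```yaml
action:
  - service: recorder.purge_entities
    data:
      entity_globs:            # a list, one glob per item
        - sensor.daily*
        - sensor.yesterday*
        - sensor.energy*
        - sensor.system_monitor*
      keep_days: 1
```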

I opened the DB file in DB Browser for SQLite.
Is it normal that the “states” table contains “NULL” in the “entity_id” column?