How to keep your recorder database size under control

@petro Hey, Thanks for the categorisation. Did you also mean to say that you don’t use the recorder function, including inclusion and exclusion, because you consider it to be of secondary importance? I’d be delighted to hear your opinion, as I was about to throw myself into this work, but if the system already handles it itself and it’s not proportionate, I might leave it alone.

I had read through this blog post once:

I use recorder to only include/exclude entities. Nothing more. I typically only include to keep my list manageable. I do not care about 90% of the entities on my system in regards to history. at this point, I only store about 200 of my 2k entities.

2 Likes

Seems like the automatic cleansing never ran, still shows 700MB. So I ran it manually now and it immediatly went down. :slight_smile:

Second Sunday of the month is tomorrow, which is when it would have reduced in size.

How do you define this list? in configuration.yaml directly or you have a separate file and some “include” thing? I mean… all those entries in config would be pretty … well, hard to read, not very nice…

I just list all the excluded entries alphabetically, right in configuration.yaml. I haven’t found it unmanageable. By default, the Recorder section was right at the top of the file, which is handy.

When I add a new entity, I pop it in the exclude list unless it’s something I need to keep. In reality it seems I generally add multiple entities at once. Often a new device has a number of entities, and I only care about one or two. Or I’m adding multiple devices. Or I’m making a template and can exclude the underlying entity but keep the template one.

Personally I’d like to have the option to choose, include or exclude, when an entity is added, just like we see with area. It takes some discipline to remember to do this every time. But I’ve been burned often enough that it’s now second nature.

I just tried “split configuration” for recorder
I entered this into configuration:

recorder: !include recorder.yaml

then i created file “recorder.yaml” and in that one i entered:

purge_keep_days: 15
exclude:
  entity_globs:
    - device_tracker
    - update
    - light.browser_*
    - media_player.browser_*
    - sensor.browser_*
    - input_boolean.wakeweekday_*
    - sensor.sun*
    - weather.*
  entities:
    - sensor.watchman_missing_entities
    - device_tracker.huawei_p9
    - device_tracker.sm_g950f
    - device_tracker.samsung_s8

But, as it seems, it doesn’t work, since my previously excluded entities were saved in history after HA restart, which points htat HA obviously took default recorder settings as active…

Is there any reason why such split config wouldn’t work?

EDIT: hm… after history check it seems that above setup doesn’t work at all… it seems that i messed up big time… i entered two domains under “entity_globs”, for one… but can this be the reason for NOT working?

1 Like

@denilsonsa I suggest the following changes to bring it into line with the current practice and Home Assistant long term statistics - disabling the recorder for entities also disables long term statistics for them. Since your guide is the most popular guide on doing this, it makes sense to update this rather than create a new post.

Additions in underline.

Introduction

This is yet another guide on:

  • How to view the size of your database.
  • How to figure out which entities are using too much space.
    • And how to filter them out.
  • How to deal with entities which you want to keep long term statistics for but report too often

Motivation
…[add at the end of this section]
Finally, long term statistics relies on the recorder. Entities which are not recorded also will not have long term statistics gathered for them - this is not explicitly stated in the documentation but is implied by “Statistics are … summarized every hour” (presumably, from recorder data). This is often a problem for energy sensors/ solar inverters which often report data every second (generating around 85,000 entries a day) but for which you probably want to keep statistics for the Energy panel.

Filtering out entities
…[add at the end of this section]
Remember, filter entities which you do not want to keep data for at all (both history and statistics). For example, the energy dashboard relies upon statistics so be mindful of the consequences of your actions.

More aggressive configuration/ Keeping Statistics but Purging Recorder
…[add at the end of this section]
For entities which you want to keep statistics but:
* do not need history data
* report far too often taking up valuable storage
You can set up an automation to purge those entities daily. Put the following in a new automation (feel free to change the name/ description/ trigger time to something sensible):

automation:
  alias: Cleanup Database
  description: >-
    Cleans up sensors that log too much but want data of, then optimizes
    the database. Use recorder excludes for sensors you do not care about.
  trigger:
    - platform: time
      at: "03:22:00"
  condition: []
  action:
    - alias: Purge Spacehogging Entities
      service: recorder.purge_entities
      target:
        entity_id:
           - sensor.spacehogging_entity_1
           - sensor.spacehogging_entity_2
      data:
        keep_days: 1
  # This next step isn't strictly necessary. Delete if you prefer to miminise disk hits.
    - alias: Repack DB
      service: recorder.purge
      data:
        repack: true
        apply_filter_true

or Import blueprint to Home Assistant
Note: It is untested what effect occurs to long term statistics if you change keep_days: 0 - this might cause a gap in your statistics data if long term statistics has not logged that data yet. If you have a lot of these spacehogging entities you might want to change keep_days:0 and use a time_pattern trigger instead:

  trigger:
    - platform: time_pattern
      # Runs every 6 hours
      hour: /6

or Import blueprint to Home Assistant

Resources
…[add at the end of this section]
[1] Long Term Statistics Docs: Home Assistant Statistics

3 Likes

Nice write-up, thanks!

It does, however, point out one flaw in the design of HA’s database. The long-term and short-term statistics are kept in the same database, and there is only one keep_days value for all entities.

Depending on your hardware, it’s not only file size but number of writes which may be a concern (e.g.; using an SD card.) Allowing “chatty” devices to spam the database with high volumes of unwanted data, then run a job later to purge them, might reduce the life of an SD card.

My own solution is simply not to use HA for long-term data. The things I want to track over time are written to .csv files which I can analyze any way I want on my laptop, which is much better suited for this task anyway. I have automations which save those data on a schedule which makes sense for each entity being tracked.

Hello,

I need some help understanding my output form the query.
Screenshot states:

Does this mean the sensor.aeotec_z_stick_gen_7.. uses the most space in the database file?
And that I can exclude these sensors in the “recorder” yaml? Or disable the entities?
Also note (exactly 1000 results) is this the max?

For events:

It is service_registered than shows the most.
recorder: exclude: domains:

Last question, what exactly means cnt and cnt_pct?

Thanks

Yes, yes, yes, and probably.

The first one on the list recorded the most records.
You can exclude or disable the entities to prevent them from being entered in the DB.
I assume 1000 is just the max your query will return. Your screen shots are very hard to read.

Again without being able to read the SQL, I’d assume “cnt” is the count (number of) records, while “pct” to me would stand for “percent,” meaning the percentage of the total records this entity represents.

So, just go through all the worst offenders near the top and exclude the ones you can live without in your configuration.yaml Recorder section. Or if you discover it’s an entity you don’t even need to look at in real time, disable them or delete the device creating them.

1 Like

Thank you :slight_smile:

I see my screenshot is very bad on an other PC screen, screenshot was taken on a curves monitor… seems a screenshot on a curved monitor is un readable.

New one states:

Events:

I think it can not be percent, because it does not make 100% in total.

One question that was also ‘unreadable’ haha:
In Events It is service_registered than shows the most, how do i exclude this one? or must i not?

Can i delete the event types from the recorder with an automation?

I never really looked at the events table. When I excluded all the entities the states query identified, I found the database much smaller and I haven’t had a problem since. I recommend you start there and see how it goes.

One of the things which might be useful would be to use the Recorder: Purge entities service in Developer Tools. This might give you some feedback on how excluding certain things will impact your DB size.

1 Like

My cnt (count of entries) and cnt_pct (percent) adds up correctly to the total records in events and 100% respectively. But I’m using MariaDB.

This event is fired when a new service is registered.

Removing things from the recorder tables shouldn’t break anything directly. It may however, affect your logbook/ history (and thus any graphs/ automations relying on historical data).

That said, those are some rookie numbers. My service_registered is around 200,000 (and that’s not even my heaviest event). I wouldn’t bother touching this unless your db is above 200MB.

Events can only be excluded via yaml. Also, note recorder.purge with the setting apply_filter does not appear to do that function (at least for me).

Events tend to be about 20% of the total entries in the database. You’re much better off looking at the states table which is by and large the largest table in the db. From the looks of it you don’t have a single large entity but multiple small entities which are using your space.

The third option (besides include/ exclude) for entities is to purge the entity daily e.g. with my blueprint automation so that short and long term statistics are still kept. This is what I’m doing for my network traffic for example. I probably don’t need to know what my network traffic is doing to the second but it would be useful to know what it did in this 5 minute window.

Also, the count can be deceiving. States can be heavy if you have an entity which has a lot of attributes. Technically speaking, attributes have almost unlimited space so you could have an entity pull the entirety of Wikipedia and publish it as it’s attribute for example (which would destroy your db).

I tried excluding (gives an ‘error’ in the statistics that it can not keep data anymore of those entities) the top entities and also purge those entities from the recorder. I checked that there was no data anymore, but my DB size went from 205.000 to 186.000. not much improved. Also just excluding did not change anything on the size.
Today the size is back at 216 MB again.
I have to look into it after work again but i am a bit clueless on what to start removing from the recorder.
Removing all those z-stick and system monitor entities did almost nothing.

Can i purge the energy data without losing my power consumption history?

So you think the service_registered is not the big problem of my size? my DB is already 200+ after 7 days of home assistant now.
Excluding them also did not made changes, numbers staid the same.

I tried removing the top once but that did not change much, from around 205 to 186mb , also excluded a lot but did not change a thing. (also not after purge).
So i will have to look into it after work again today.

Think i wil also going tot do this, i do not need al the long term data of system monitors and much more things. But excluding them completely gives me no data at all, and gives an ‘error’ in the statistics.

How does this work? i have read in some of the last HA updates that the database changes to keep long term data without filling the database and that self days to keep better could be removed.

this one: 2023.12: Welcome home! - Home Assistant (home-assistant.io)

I am trying to understand it all haha

Thanks so far!

Don’t quote me since I didn’t look directly at the code. But according to the data science portal on this topic, Home Assistant will summarise data every hour to generate statistics data.
Ideally, I’d purge the data every hour but there’s no option to do this (only daily) and it will probably cause unnecessary disk hits compared to any savings you would achieve.
If you use purge_entities with keep days 0, I don’t know what will happen to your statistics data (especially if you do it while its performing an update).

That is quite good considering you have around 1000 entities. I’d be looking into which of your entities have large attributes. Typically weather sensors are notorious for this (I have one which basically dumps the entire weather data into the attributes), while this is probably only 10kb of text, imagine that updating multiple times a day and that’s how you get a big database.

1 Like

Nothing, purge only cleans your history & events table. Long term stats are forever.

Just to be clear, excluding entities requires (I think) a re-start of HA to take effect. Also, if you use the Recorder: Purge service in the UI, there are two additional steps (one check box, and way over on the right, a slider switch) to make the repack happen. It’s easy to miss one or both of those and wonder why the purge appeared to do nothing.

One more caveat about using a regular purge instead of exclude: If you’re using an SD card, this can impact the longevity of the card. Apparently they support a limited number of writes, and continually writing a lot of records, only to delete them later, can’t be a good thing.

2 Likes