How to keep your recorder database size under control

I see! I use the HA docker image, but the SQLite database file is outside docker; it’s a mounted volume. I haven’t seen any instability, and it is about 100MB. But I wanted to increase recording time from 10 days to 1 or 2 years. Not sure if I will start running into issues now.

I too run on a decently powerful NUC with a 128GB NVMe SSD.

If I wanted to run another db for the recorder, I guess it would make sense to run the db in a separate docker container and connect it to the HA container. Seems complicated. :smiley:
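
For what it’s worth, a rough sketch of what that separate-container setup could look like with Docker Compose. All service names, credentials and volume paths below are placeholders; treat it as an assumption about the layout, not a definitive recipe:

# docker-compose.yml sketch: MariaDB as a separate container next to Home Assistant.
# All names, credentials and paths are placeholders - adjust them to your own setup.
services:
  mariadb:
    image: mariadb:10.11
    restart: unless-stopped
    environment:
      MARIADB_ROOT_PASSWORD: "change-me"
      MARIADB_DATABASE: homeassistant
      MARIADB_USER: hass
      MARIADB_PASSWORD: "change-me-too"
    volumes:
      - ./mariadb:/var/lib/mysql
    ports:
      - "3306:3306"   # published on the host so HA (host networking) can reach it

  homeassistant:
    image: ghcr.io/home-assistant/home-assistant:stable
    restart: unless-stopped
    network_mode: host
    depends_on:
      - mariadb
    volumes:
      - ./config:/config
      - /etc/localtime:/etc/localtime:ro

With host networking, Home Assistant would then reach the database at 127.0.0.1:3306.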

Just wondering: if you have an Intel NUC, why don’t you use HAOS? If your NUC supports UEFI boot, then it shouldn’t be a problem: just flash the HAOS image onto your drive - I used an M.2 external case and Etcher - put it back into the NUC and you’re done. All the other add-ons are fun to add then…
Regarding 1-2 years: it will be a huuuuge file… For example, I had 10 days set on my previous db and the db file was just over 2GB. Personally, I wouldn’t recommend it, but if you DO go with it, it’s another reason to switch to MariaDB (a minimal setup is sketched after this post).
Note that in any case you already have long-term statistics for all major entities anyway. Check the history of, say, your temperature for a few months or a year back. It should appear.
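
For reference, the recorder side of a MariaDB switch is just a db_url in configuration.yaml. A hedged sketch, where user, password, host and database name are placeholders that must match whatever server you set up:

# Hypothetical recorder configuration pointing at an external MariaDB.
# Credentials, host and database name are placeholders.
recorder:
  db_url: mysql://hass:change-me-too@127.0.0.1:3306/homeassistant?charset=utf8mb4
  purge_keep_days: 30

Keep purge_keep_days realistic; as noted above, 1-2 years of full state history gets large quickly.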

I copy the database to my Mac and use DB browser for SQLite. Would that work for you?

Yep, that is what I did.

Still my DB hasn’t grown much.

I’ve been running with 720 days for a month now and it only grew 2 MB.

Well, I run other stuff such as Pi-hole, cloudflared, etc. Containers are much easier to maintain imo.

I found out that many of my custom sensors caused an extreme number of db entries. The reason is that any state change of an entity referenced in the template triggers a recalculation of the custom sensor.

And as I was using three different entities to calculate their sum, it caused a new db entry every 5 to 20 seconds. Finally, I solved this issue by adding a time_pattern trigger. That way the calculation isn’t triggered by the involved entities; instead it’s triggered at a fixed time interval, in my example every 5 minutes:

  - trigger:
      - platform: time_pattern
        minutes: "/5"
    sensor:
    - name: WP Energy Sum
      unique_id: shelly3wp_energy
      unit_of_measurement: 'kWh'
      state: >-
        {#- on reboot some sensors do not return a number (instead they return "unknown" or "unavailable") -#}
        {%- if is_number(states("sensor.shelly3wp_channel_a_energy")) and is_number(states("sensor.shelly3wp_channel_b_energy")) and is_number(states("sensor.shelly3wp_channel_c_energy")) -%}
          {{ (states("sensor.shelly3wp_channel_a_energy") | float + states("sensor.shelly3wp_channel_b_energy") | float + states("sensor.shelly3wp_channel_c_energy") | float) | round(2) }}
        {%- else -%}
          unavailable
        {%- endif -%}
      device_class: energy
      state_class: total_increasing

The graph, for example, confirms the much less frequent recalculation:

Before:

After:

I never tested it, but it sounds like a possible friendly DoS attack if you use too many hardware sensors in a custom sensor :sweat_smile:

Good point. My approach has been to exclude any new entities I create, unless it’s something I really need to keep. Thus any interim entities used just for a template, calculation or whatever, would be put on the “exclude” list immediately.
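
For illustration, that kind of exclude list might look like the sketch below. Every domain, glob and entity ID here is a made-up placeholder; the point is just the shape of the recorder exclude block:

# Hypothetical recorder exclude list for interim/template helper entities.
# All domains, globs and entity IDs below are placeholders.
recorder:
  exclude:
    domains:
      - update
    entity_globs:
      - sensor.*_linkquality
    entities:
      - sensor.my_interim_template_helper
      - sensor.date_time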

I like the analogy of a self-inflicted DoS attack. I think the way Recorder is configured by default (keep everything) is setting new users up for exactly this kind of failure.

Why did you write a duplicate guide?

Just edit existing guides in future. Anyone can.

Hmm, the query doesn’t seem to work in SQLite Web 3.10.0.

I’ve tried using these queries with SQLite Web 3.10.0 and I get a 400 Bad Request error when executing. Is anyone else seeing the same?

EDIT: Just found this mentioned here: Error "400: Bad Request" on all queries · Issue #281 · hassio-addons/addon-sqlite-web · GitHub

Same. SQLite Web doesn’t seem to work.

I made a Jupyter notebook which can be used to query the system from a backup of Home Assistant. Maybe somebody finds it helpful.

In the README you can find out more about the process. If you store your backup on Google Drive, it should be just 2 clicks via Google Colab.

Thanks @denilsonsa for the great work.

Same here, but putting every part of the call onto a single line seems to work for me.

Great work. Did you manipulate the backup file name or format in any way? It looks like the script expects the naming to be different from the standard backup naming, e.g. Full Backup 2023-12-05 09:00:20.tar

I keep getting:


Logger: homeassistant.components.http.security_filter
Source: components/http/security_filter.py:66
Integration: HTTP (documentation, issues)
First occurred: 15:19:15 (8 occurrences)
Last logged: 16:25:57

Filtered a request with unsafe byte query string: /api/hassio_ingress/4na91HttrXMhiO4gszMnRJBZUmR7gruQGT1Dl1UWCws/states/query/?ordering=&export_ordering=&sql=SELECT%0D%0A++COUNT%28*%29+AS+cnt%2C%0D%0A++COUNT%28*%29+*+100+%2F+%28SELECT+COUNT%28*%29+FROM+states%29+AS+cnt_pct%2C%0D%0A++states_meta.entity_id%0D%0AFROM+states%0D%0AINNER+JOIN+states_meta+ON+states.metadata_id%3Dstates_meta.metadata_id%0D%0AGROUP+BY+states_meta.entity_id%0D%0AORDER+BY+cnt+ASC
Filtered a request with unsafe byte query string: /api/hassio_ingress/4na91HttrXMhiO4gszMnRJBZUmR7gruQGT1Dl1UWCws/states/query/?ordering=&export_ordering=&sql=SELECT+%0D%0A++++entity_id%2C%0D%0A++++COUNT%28*%29+AS+cnt%0D%0AFROM+states%0D%0AGROUP+BY%0D%0A++++entity_id%0D%0AORDER+BY%0D%0A++++COUNT%28*%29+DESC%3B
Filtered a request with unsafe byte query string: /api/hassio_ingress/4na91HttrXMhiO4gszMnRJBZUmR7gruQGT1Dl1UWCws/states/query/?ordering=&export_ordering=&sql=SELECT%0D%0A++states_meta.entity_id%2C%0D%0A++count%28*%29+cnt%0D%0AFROM%0D%0A++states%0D%0A++LEFT+JOIN+states_meta+ON+%28%0D%0A++++states.metadata_id+%3D+states_meta.metadata_id%0D%0A++%29%0D%0AGROUP+BY%0D%0A++states_meta.entity_id%0D%0AORDER+BY%0D%0A++cnt+DESC%3B
Filtered a request with unsafe byte query string: /api/hassio_ingress/4na91HttrXMhiO4gszMnRJBZUmR7gruQGT1Dl1UWCws/states/query/?ordering=&export_ordering=&sql=SELECT%0D%0A++statistics_meta.statistic_id%2C%0D%0A++count%28*%29+cnt%0D%0AFROM%0D%0A++statistics%0D%0A++LEFT+JOIN+statistics_meta+ON+%28%0D%0A++++statistics.metadata_id+%3D+statistics_meta.id%0D%0A++%29%0D%0AGROUP+BY%0D%0A++statistics_meta.statistic_id%0D%0AORDER+BY%0D%0A++cnt+DESC%3B
Filtered a request with unsafe byte query string: /api/hassio_ingress/4na91HttrXMhiO4gszMnRJBZUmR7gruQGT1Dl1UWCws/events/query/?ordering=&export_ordering=&sql=SELECT%0D%0A++COUNT%28*%29+as+cnt%2C%0D%0A++COUNT%28*%29+*+100+%2F+%28SELECT+COUNT%28*%29+FROM+events%29+AS+cnt_pct%2C%0D%0A++event_types.event_type%0D%0AFROM+events%0D%0AINNER+JOIN+event_types+ON+events.event_type_id+%3D+event_types.event_type_id%0D%0AGROUP+BY+event_types.event_type%0D%0AORDER+BY+cnt+ASC

The secret: “It seems that the queries have to be all on one line, then they work”
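
For example, the first and last queries from the log above, URL-decoded and collapsed onto a single line each, would be as follows (assuming the current schema with states_meta and event_types):

-- States per entity (one line so SQLite Web accepts it)
SELECT COUNT(*) AS cnt, COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct, states_meta.entity_id FROM states INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id GROUP BY states_meta.entity_id ORDER BY cnt ASC;

-- Events per event type (one line so SQLite Web accepts it)
SELECT COUNT(*) AS cnt, COUNT(*) * 100 / (SELECT COUNT(*) FROM events) AS cnt_pct, event_types.event_type FROM events INNER JOIN event_types ON events.event_type_id = event_types.event_type_id GROUP BY event_types.event_type ORDER BY cnt ASC;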

Can you rewrite them and share them, so those of us not strong in SQL can also use them :slight_smile:

I used these settings for about 5 months:

recorder:
  #How many seconds to wait before writing data to db, default is 1 second.
  commit_interval: 32
  #How many days to keep the data before purging it, default 10 days.
  purge_keep_days: 720
  auto_purge: true
  auto_repack: true

And my DB grew to about 770MB.

Now I wanted to reduce it to 60 days:

recorder:
  #How many seconds to wait before writing data to db, default is 1 second.
  commit_interval: 32
  #How many days to keep the data before purging it, default 10 days.
  purge_keep_days: 60
  auto_purge: true
  auto_repack: true

However, after running this for a couple of days, my db size is not decreasing. Why is it not purged?

The purge is not run daily. I read somewhere (probably this thread) that it’s only once a month, or is it once a week?

Anyway, you can do the purge from within HA: Developer Tools / Services / Recorder: Purge (the equivalent YAML service call is sketched after this post).

NOTE: Be sure to select the check box for Repack and turn on the slider switch that’s hidden way over on the right. I have no idea why you have to select this option twice, but that’s how it is.

Give it a few minutes to work and you should see your db size diminish. BTW, 770M is low, given your 720 days purge setting. With the default of 10 days mine would jump up to 3G in no time. I had to ruthlessly exclude everything I could and shorten the purge setting to make it manageable. Of course there have been some improvements since then, so maybe what you’re seeing is more typical now.
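
If you prefer the YAML mode of that service call over hunting for the UI toggles, a sketch of the equivalent call; keep_days is just an example value, and repack is the part that actually shrinks the file:

# Developer Tools -> Services -> "Recorder: Purge", switched to YAML mode.
# keep_days below is only an example; adjust it to your retention needs.
service: recorder.purge
data:
  keep_days: 60
  repack: true
  apply_filter: false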

Purge is run every night. Purge does not reduce the size; it removes the data. The space allocated in the database file stays the same.

Repack (which is what actually reduces your database size) runs on the second Sunday of each month. Also, repack is not instant; it can take hours, or days in some cases.
