How to keep your recorder database size under control

Yeah, I’m having trouble following the discussion, too. It would be good to clarify when we’re talking about templates, ESPHome entities, or entities added by various integrations. Part of why I’ve been ignoring state_class as an option for limiting DB bloat is that it’s not really clear how it’s established, what is impacted by changing it, and where to change it. As I said, for me it’s easier to write a one-line SQL hammer and run it every so often.

A fault is too strong a word in my view, but the burden is on you to provide accurate information in order to get accurate answers. Being upset about this will not lead to more or better help.

Sure, that’s just true for you, but usually when people speak about the state class, it will be on the HA side (as the common denominator), because there’s a variety of devices and integrations people can use. With non-ESPHome devices, people have less control over this, which is why it then gets overridden on the HA side.

Inaccurate. ESPHome is a project in its own right and can be used independent of HA. Both are merely under the Open Home Foundation.

Inaccurate. Integrations would typically set it, and only for select entities. In the case of ESPHome it is not the integration. I’ve never seen this done automatically for any of my ESPHome devices.

Back to my question though:

If you set it before (i.e. even setting it to '') you now have a state class (from HA’s point of view), even if you remove it from config now. That said, your repairs dashboard should pick up this discrepancy (you had it before, and you have none now). Do you not see a repair?

  • ESPHome may be standalone, and I use that often, but the storing of data is done by HA.

  • You obviously know my system and code but I repeat that I had never set any state_class for my sensors. All data is ending up in LTS because a state_class was being assigned. I only went down this path because I wanted to stop saving useless data for all time.

Code for a temp sensor;

  - platform: dallas_temp
    address: 0x4000080104f4eb10
    name: "Outside Temperature"
    accuracy_decimals: 1
    update_interval: 60s

From Developer Tools\States

To prevent state_class being assigned;

  - platform: internal_temperature
    name: "Internal Temperature"
    state_class: ''

Once this is done I then get an option to remove data from Developer Tools\Statistics.

It does seem that the platform you are choosing in esphome is perhaps dictating the state class in home assistant.

I do not know esphome well at all, but I do know that you can use customize on the resulting home assistant entities in order to remove the state class.

I tried to tone down your previous message, but you keep trying to provoke for some reason. I never said you set it for a fact. I merely stated that if you did, then there is a certain way out of that situation. It’s because I have done this before where I did passing a state class, but then later changed my mind.

This is new information. Previously, it was the WiFi signal sensor. I don’t see that for either platform a state class is set by default. All I can say is that the core (base) sensor config won’t do that. As the docs state, the state class is optional. Perhaps you found a bug in these platforms or something that is undocumented. You should search the ESPHome issues on GitHub or log a new issue. I cannot say whether this behaviour is intentional.

As I said, you’d get a repair message.

It appears that any sensor that returns a numerical value is assigned a state_class. I do have an Android device which shows this. I do use Customize to correct state_class for some energy sensors.

1 Like

I guess I need to do some homework. It’s not clear to me what functionality I lose if I null out state_class on all my entities. It’s not clear to me where this setting is established across all the different platforms and integrations, or how it’s used for anything besides LTS.

A targeted search turned up lots of very specific errors, problems and complaints, but not much general information about what state_class really is and how it works.

Is there a quick reference somewhere?

1 Like

Hey friends,
seems like I found the right thread. Any advice how I can purge all history from an sqlite database? My disk is 99% full and recorder.purge seems to not like that.

I am running a HAOS setup and can’t easily stop home assistant and manipulate the database file on disk (would need to connect the eMMC card to a different PC and I’m not physically there).
Any ideas?

Got this resolved by recklessly deleting the database file while HA was running. The details are in the linked ticket for those interested :slight_smile:

Actually, it’s not reckless (unless you didn’t make a backup in case you realised you needed something), as your HA config isn’t stored in the DB. Only your sensor and other data. Since the DB usually only stores recent data (the default recorder setting is 10 days), people with corrupt DBs will often delete it. You’d lose your LTS too though, so that’s something to remember. The yanking you referred to in the linked topic on a running instance: Well, now that might be frowned upon. :smiley:

1 Like

That statement is true and false at the same time. Well, after a repack followed by a purge I notice significant lower system load for some time (until the DB starts to grow again). So buying that faster system response during the day (when HA is needed) for a precisely planned load situation during the night (when HA is not needed that much) is a rather great deal for me.

2 Likes

I can’t find an entry in the recorder manual that the vaccum is only done monthly.
can you please show me the entry? Thank you!

Thanks to some help from this thread, I was able to prevent my entities from recording LTS data, and clearing these data from the database. It made a huge difference. My database is 35% of the size it was before I started this effort.

I posted more info on another thread, here.

see Recorder - Home Assistant under

auto_repack boolean (Optional, default: true)
Automatically repack the database every second sunday after the auto purge.

Hoping someone can help the village idiot.

My HA DB had grown to 2.5gb and I realised this was something that I needed to look at. I do need 14 days of data for certain things to help my home battery storage management software do it’s thing. With no need to keep the vast majority of the data I had, I finally moved to MariaDB which I knew with the help of phpMyAdmin could better identify where my problem was. This was just a few hours ago.

Since then my DB is obviously growing and below is a screenshot of the most obnoxious of the IDs so far. I have successfully stopped the last entry from growing by excluding it from the recorder, but I cannot seem to delete those entries. Once I can do one I can faff around with the rest.

Can anyone offer some advice to point me in the right direction?

Have you tried the Purge action in HA? It allows you to select individual entities and number of days to purge.

Not that it’ll hurt anything, but there really was no need to switch to MariaDB. The SQLite DB Browser works just as well as phpMyAdmin to view and manipulate the data. I just copy the database file to a temp directory on my laptop and do all my fiddling there. If I end up having to run some SQL to hammer the database directly, I test it on my copy of the database first, then shut down HA before I run it against the production data.

I had not before you mentioned it. As I said I’m really quite like the meme image of the dog sitting at the desk. I have now done that and made decent progress purging and then excluding some entries from the database. I have also (through a query found elsewhere) found which of the entries in my states table were taking up all the room, included below in the hope that it helps someone else. My N100 that HA runs on was happy with this query, but the Rpi4 that runs another for me was not (similar 2.5gb db), even when removing the cnt_pct and bytes_pct. In the end I trimmed out the cnt as well and it seemed more inclined to acquiesce to my request. I also include that here for anyone like me who has little grasp of this particular side.

This helped me find the larger items which included some from my printer, some from the Octopus integration and many from the addon that controls my battery storage and PV system, those entries alone came to 1gb in the state_attributes table. I can now start getting rid of the data from those that are not needed for the 14 days that the software recommends keeping everything else related to the PV and storage system.

SELECT
  COUNT(state_id) AS cnt,
  COUNT(state_id) * 100 / (
    SELECT
      COUNT(state_id)
    FROM
      states
  ) AS cnt_pct,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) * 100 / (
    SELECT
      SUM(
        LENGTH(state_attributes.shared_attrs)
      )
    FROM
      states
      JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
  ) AS bytes_pct,
  states_meta.entity_id
FROM
  states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY
  states.metadata_id, states_meta.entity_id  
ORDER BY `bytes` DESC
SELECT
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes,
  states_meta.entity_id
FROM
  states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY
  states.metadata_id, states_meta.entity_id  
ORDER BY `bytes` DESC
4 Likes

Well I must say I’m happy! Along with the queries listed here and the tips I managed to get my database size from 550 MB to only 52 MB! :partying_face: Thanks…

In case it is useful to anyone, this is my recorder configuration. I went with exclusion by default. Only including data I actually need. Which wasn’t that much really.

Summary
recorder:
  include:
    domains:
      - alarm_control_panel
      - automation
      - light
      - person
      - scene
      - sun
      - vacuum
    entity_globs:
      - binary_sensor.*_bezet
      - binary_sensor.*_occupancy
      - binary_sensor.*_door_sensor_contact
      - sensor.*_pressure
      - sensor.*_humidity
      - sensor.*_temperature
      - sensor.*_illuminance_lux
      - sensor.*_illuminance
      - sensor.gas_meter*
      - sensor.playstation*
      - sensor.sun_solar*
    entities:
      - zone.home
      - sensor.power_meter_active_power
      - sensor.power_meter_active_tariff
      - sensor.power_meter_total_power_import
      - sensor.power_meter_total_power_import_tariff_1
      - sensor.power_meter_total_power_import_tariff_2
  exclude:
    entities:
      - sensor.toilet_motion_sensor_illuminance_lux
      - sensor.opslag_motion_sensor_illuminance_lux
      - sensor.badkamer_motion_sensor_illuminance_lux
      - light.zenwifi_xt9_led
      - sensor.wasmachine_laundrycare_washer_option_temperature

The query can be “enhanced” to include only data of a given period (in the example 28 days). this helps in case where new sensors have been recently added:

WITH filtered_states AS (
SELECT *
FROM states
WHERE last_updated_ts >= strftime(‘%s’, ‘now’) - 28 * 86400
)
SELECT
COUNT() AS cnt,
COUNT(
) * 100 / NULLIF((SELECT COUNT(*) FROM filtered_states), 0) AS cnt_pct,
states_meta.entity_id
FROM filtered_states
INNER JOIN states_meta ON filtered_states.metadata_id = states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC;