You are right.
You can NOT delete LTS for entities excluded from Recorder.
My mistake; I have asked on GitHub about this possibility several times, but got no positive answer.
81% of the whole DB, or only of the "statistics" table?
Assume your DB was 1 GB: would you then get the same 81% after purging old statistics?
I have heard many times that the DB grows mainly because of non-LTS data.
I cannot check it myself, since I do not know SQL.
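For anyone in the same position, here is a small Python sketch that counts rows in the Recorder tables that usually dominate DB size, so no hand-written SQL is needed. The table names ("states", "statistics", "statistics_short_term") are from the standard Recorder schema, and it assumes the default SQLite database (usually home-assistant_v2.db); querying a copy of the file is safer than the live DB.

```python
import sqlite3

def table_row_counts(db_path):
    """Count rows in the Recorder tables that usually dominate DB size.

    "states" holds the short-lived state history (purged after purge_keep_days);
    "statistics" holds the long-term statistics (LTS), which are never purged.
    """
    tables = ["states", "statistics", "statistics_short_term"]
    with sqlite3.connect(db_path) as conn:
        return {
            t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in tables
        }
```

Comparing the row counts (or multiplying by a rough per-row size) gives an idea of which table is actually responsible for the growth.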
If you do not need LTS, then you need to set state_class accordingly.
And yes, a user currently seems to be OBLIGED to have statistics: many integrations add a state_class to their sensors, and this is the default behavior. You are not asked whether to "switch on LTS" when adding a config entry.
Probably a global "allow LTS" switch could be added, but the dev team does not think it is needed... and users' opinions are not always meaningful to them. They may say "LTS data needs so little space that there is no practical reason to disable storing it", and in that respect they could be right: see above.
I know. I was shocked, too. But it really did shrink that much. In other words, the vast majority of my Recorder database was “wasted” (to me) long-term data.
I maintain a very lean database. I exclude everything I don’t need to keep, and I set my purge days to four. I thought some others following this thread might be interested, but I recognize that many HA users aren’t.
In my defense, I’ve been running HA on an RPi with an SD card. Performance, storage and, most importantly, disk writes are at a premium. This is the hardware which the HA documentation recommended when I first started. So although I understand that developers always want to develop on the latest and greatest hardware, I think I represent a non-trivial portion of the user community.
When I was doing development, I always made it a point to develop on the lowest-performing platform that my users were using. It’s all about respect for your users and pride in your work.
I think my experience suggests maybe they’re not right about this, at least, not in all cases.
As for changing state_class, that’s on my list to research and learn how to do, assuming it doesn’t mess up anything else I need. But frankly it’s easier for me to copy and paste one line of SQL as part of the routine database maintenance I do during my update procedure, so it hasn’t made it to the top of my priority list yet.
Please do not say that. It is absolutely normal to expect smooth operation from software which is declared to work on your RPi.
As a former developer for QNX and embedded systems, I absolutely hate the trend of making software that requires more and more resources without any real need.
Hard to believe, so I will repeat my question:
Assume you have 10 days of history, 1000 entities, and a database of 1 GB.
What will the amount of LTS be?
If it is 800 MB, then OK, you are right.
You can also estimate the LTS size another way:
Assume you know the size of one record. There are 24 LTS records a day (one per hour). So you can estimate the size of LTS for one entity for one day.
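To make that estimate concrete, here is a back-of-envelope sketch. The ~50 bytes per row is an illustrative assumption, not a measured value; the 24 rows per day follows from LTS storing one row per entity per hour.

```python
BYTES_PER_ROW = 50   # assumed average LTS row size; real rows vary
ROWS_PER_DAY = 24    # one long-term statistics row per hour

def lts_bytes(entities, days, bytes_per_row=BYTES_PER_ROW):
    """Rough LTS size for a number of entities over a number of days."""
    return entities * days * ROWS_PER_DAY * bytes_per_row

# 1000 entities over 10 days under these assumptions:
print(lts_bytes(1000, 10))  # prints 12000000, i.e. ~12 MB
```

Under these assumptions, 1000 entities over 10 days come out around 12 MB, far from 800 MB, which is the point of the question above.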
I did play around with state_class a few weeks back, as I saw no need to collect useless data. The following works, but you do have to remove and reconfigure the device for the change to take effect. I now add it as a default in my YAML.
I know that is what the docs say, but I tried NONE and "" and they didn't work; only empty single quotes did the trick. Did you set it as "None" or None?
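For anyone trying this later, the override being described might look like the following in an ESPHome sensor config. The platform, name, and interval are illustrative; the empty single quotes on state_class are the part being discussed:

```yaml
sensor:
  - platform: wifi_signal
    name: WiFi Signal
    update_interval: 60s
    # Empty quotes remove the default state class, so HA stores no
    # long-term statistics for this sensor. Per the posts in this
    # thread, 'none'/null fail ESPHome validation.
    state_class: ''
```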
Thank you! This discussion is convincing me to look into changing state_class again.
Toward that end, and for anyone stumbling onto this thread in the future, could someone explain a few more things about the process?
Does this work for all entities, or only for certain integrations?
How would I change entities I didn't define in YAML but that were added through an integration via the UI?
Is there a quick and easy way to see all the entities currently recording long-term statistics, short of running a query against the database?
OK, I guess I was hoping to see the state_class of each one, but at least that's a starting point. I can ignore all the ones I've excluded (they show up with an "issue" saying they're excluded).
Turns out you can go to Developer Tools > States and in the Attributes column filter by state_class. You can see the attribute value there. I’m not sure if viewing the list that way includes entities with state classes set implicitly though.
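If someone does end up going the database route after all, the Recorder's statistics_meta table keeps one row per statistic id that has LTS, so listing it answers the question directly. A sketch, assuming the default SQLite database; the table and column names are from the standard Recorder schema:

```python
import sqlite3

def lts_entities(db_path):
    """Return the statistic ids (usually entity ids) that have LTS metadata."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT statistic_id FROM statistics_meta ORDER BY statistic_id"
        ).fetchall()
    return [r[0] for r in rows]
```

Note this lists entities that have LTS stored, which is not quite the same as listing every entity whose current state_class would generate LTS going forward.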
If I compile with None or none, it gives the following error. This is where I got the two single quotes from.
    c:\esphome>esphome run athom-plug3.yaml
    INFO ESPHome 2024.8.1
    INFO Reading configuration athom-plug3.yaml...
    Failed config

    sensor.wifi_signal: [source base/athomplugdev.yaml:51]
      platform: wifi_signal
      name: WiFi Signal
      update_interval: 60s

      Unknown value 'none', valid options are '', 'measurement', 'total_increasing', 'total'.
      state_class: none
Changing it to null gives the following:
    c:\esphome>esphome run athom-plug3.yaml
    INFO ESPHome 2024.8.1
    INFO Reading configuration athom-plug3.yaml...
    Failed config

    sensor.wifi_signal: [source base/athomplugdev.yaml:51]
      platform: wifi_signal
      name: WiFi Signal
      update_interval: 60s

      string value is None.
      state_class:
You never mentioned this was ESPHome until now. HA isn't ESPHome, even though the two projects live in the same camp.
As the ESPHome docs say:
state_class (Optional, string): The state class for the sensor. See Sensor entity | Home Assistant Developer Docs for a list of available options. Set to "" to remove the default state class of a sensor.
Now we’re back to square one.
Why are you setting a state class in the first place? It’s optional. Just don’t set it.
Sorry, why am I at fault? The discussion was about database size, and most of my data comes from ESPHome devices. It may be a subcomponent within HA, but it is HA recording the data.
I have never set a state_class in my code, but HA automatically sets one. The only way I've found to prevent this is the one I described.
Yeah, I’m having trouble following the discussion, too. It would be good to clarify when we’re talking about templates, ESPHome entities, or entities added by various integrations. Part of why I’ve been ignoring state_class as an option for limiting DB bloat is that it’s not really clear how it’s established, what is impacted by changing it, and where to change it. As I said, for me it’s easier to write a one-line SQL hammer and run it every so often.
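For completeness, the kind of one-line "SQL hammer" mentioned here might look like the sketch below. The table name "statistics" matches the default Recorder schema, but this is an assumption about that setup, not a recommended procedure: it wipes ALL long-term statistics, so stop HA and back up home-assistant_v2.db before trying anything like it.

```python
import sqlite3

def purge_all_lts(db_path):
    """Delete every long-term statistics row from a (copied!) Recorder DB.

    Destructive sketch only: "statistics" is the LTS table in the default
    schema. Stop Home Assistant and back up the DB file before running.
    """
    with sqlite3.connect(db_path) as conn:
        conn.execute("DELETE FROM statistics")
        conn.commit()
```

After a mass delete like this, running SQLite's VACUUM is what actually shrinks the file on disk, since DELETE alone only frees pages internally.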