DB Size and long term statistics

Hey there,

I have a problem that I would finally like to tackle.

At some point I switched from SQLite to a MySql database. Now my database is about 22GB in size, which is just way too much. A backup is no longer possible and I would now like to clean up the database.

But: I don’t want to lose the long-term statistics of the energy values. We put a PV system into operation last year and I would like to keep the values of this system (including the electricity consumption in the house) so that I can have a year-to-year comparison.

All other values are not so important to me and could also be deleted after 1-2 months. One culprit, for example, is the entity “media_player.waipu_tv_stick”. With over 23 million entries, this is the top 1 space waster. I don’t need this value at all.

So I need help on two points:

  1. How do I have to configure the HA so that I can record the statistics of my GoodWe inverter for the PV system, but the rest deletes itself after 30 days?

I’ve read the documentation on the recorder. But it’s not clear to me, how to keep the energy entities for longer than 30 days.

  1. How do I make the database smaller? The previous recordings (all except the PV system) can be deleted.

When I run the recorder.purge_entities service, I immediately get a positive UI response, but nothing happens.

Would anyone like to help me?

Thanks in advance,
Alexander

Hi Alexander,

The answer is to push the data that you want kept long term into a different database. The one being used by HA needs to be fast and as small as possible to keep your moment to moment state machine actions as fast as possible. Store long term data into InfluxDB or some other secondary database. Keep the record days on the Ha database in the default range and eliminate (using recorder) stuff you don’t want to keep.
You will find that the HA LTS data is hourly after the record days limit is reached, and that is stored forever in the HA database. It is likely that is all you will need in Long Term Storage anyway, bur if you add Influx, you will be able to keep every data change stored.

Short answer: You can’t. Obviously there are work-arounds, like setting up an automation to run a bunch of Recorder: Purge actions.

HA only allows you to set one purge_keep_days values for all entities. Feel free to up-vote my FR requesting the ability to set this value per entity, but frankly there doesn’t seem to be much interest among the development community.

I’d recommend being aggressive about excluding as many entities as you can altogether. There’s a good guide here, and there are other threads about keeping the database size manageable. Of course excluding entities also excludes them from the “live” data tables, not just the long-term statistics tables.

For LTS data, you can exclude unwanted entities by editing customize.yaml to set the state_class to “none” for anything you don’t want LTS for:

sensor.guest_rm_temperature:
  state_class: none

Again, this is a crude on/off solution, and doesn’t give you the flexibility to set retention periods based on the varying needs for different types of data.

It sounds like you’re already familiar with database management, so you can also run your own SQL “clean up” routines from time to time. I have a few I run while I have HA down for a version update, typically monthly.

Update: I should probably mention another solution, although you might not like it. I use the notify action in automations to send data about my heating system to plain old flat text files, in CSV format. These can be opened by other software which is better able to visualize and analyze the data outside of HA. I’m actually at the point now where I don’t use the HA LTS data at all. My Recorder database is tiny. Only the data I want are written to the drive HA lives on, and only at the frequency I want to record them.

Is this your current recorder setting? If so, that is likely the bulk of your data, not LTS. LTS doesn’t take that much space, since it’s granularity is hourly.

So, reduce the 30 days and exclude the entities that have LTS, but you don’t need LTS for (if you exclude entities from the recorder that has LTS, neither their state history nor LTS will be recorded). Also exclude any other entities for which you only need a current state value.

Thank you for your thoughts.

Did I understand correctly that HA is going to use long-term statistics anyway?

So, if I set the purge days to 30, everything in the ‘states’ table is deleted after 30 days, but the ‘statistics’ table with the LTS is retained?

How do I find out which data is stored in the LTS?

Yes, for anything you visualise beyond the recorder’s retention period.

Everything older than 30 days only.

LTS has infinite retention.

You can query the tables directly or look in your dev tools for anything with a state class attribute. You can write a template for this too:

{{ states | selectattr('attributes.state_class') | map(attribute='entity_id') | list }}