HA DB very large (32 GB): states and state_attributes

Hello,

I have a really big problem: my HA database has grown very quickly, from 4 GB up to ~30 GB.

There are two very big tables:

states: ~17,833,428 rows, 4.3 GiB
state_attributes: ~13,557,699 rows, 14.4 GiB

I use MariaDB; HA is the latest available version.

The recorder purge thing won’t work. I don’t know what to do.

Does anyone have any ideas?

Thanks a lot!

Best
LordShelmchen

Please share your configuration.yaml settings pertaining to the database URL and the recorder, as in the example below.

recorder:
  purge_keep_days: 16
  db_url: !secret database_url

If you permanently store your data and do not filter out all the irrelevant stuff, your database size will explode.
I recommend filtering what is recorded.
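For example, something along these lines (just a sketch; the entity names are made up, so exclude whatever is actually noisy in your setup):

recorder:
  purge_keep_days: 16
  db_url: !secret database_url
  exclude:
    domains:
      - automation
      - media_player
    entity_globs:
      - sensor.*_wifi_signal
    entities:
      - sun.sun
      - weather.home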

P.S.: You could also think about reverting to stock SQLite. It has been improved and might help with things like purging in the future.

But neither of the two will solve your current problem; they just help prevent it next time, maybe.

Did you use a database explorer to check what is writing so many entries?

Over the last few years the DB size stayed almost the same, so I don’t know what happened.

Sorry, I forgot the configuration.yaml:

recorder:
  db_url: xxxxxx
  commit_interval: 120
  purge_interval: 1
  purge_keep_days: 7

I also tried to purge manually with recorder.purge, but nothing happened.
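I called it roughly like this from Developer Tools → Services (I am not sure about the optional fields; apparently repack is what actually frees the disk space):

service: recorder.purge
data:
  keep_days: 7
  repack: true
  apply_filter: true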

What hardware platform are you on?

Please DO NOT listen to people stating you need to filter entities. This is incorrect info. Even with hundreds of sensors the DB does not grow that much.

It most certainly will grow. Admittedly, at some point it will level off a bit. But a lot of people are shocked to discover how large it can grow to.

How bad that is would depend a lot on what hardware you’re using. Obviously the OP is concerned about it getting to 32 GB. So I’m assuming that’s beyond the limit of what they’re comfortable with in their own environment.

Another issue for some people is not only the DB size, but the overhead in writes.

By default, HA records every state change and event. I think we could do a better job of directing new users to the exclude and include options. I can see absolutely no reason for recording every time, for example, the weather forecast changes. There is value in thinking about each entity at the time you add it, and deciding whether or not it’s worth recording or should be excluded.
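As a sketch of the other direction, you can even turn the recorder into an allowlist and record only what you explicitly list (the entity names here are just placeholders):

recorder:
  purge_keep_days: 7
  include:
    domains:
      - light
      - switch
    entities:
      - sensor.living_room_temperature
      - person.me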

I don’t think we should be discouraging people from doing that, at whatever level works for them.

Are you including /media in your backup? Do you have a camera that you added recently?

(Stored camera snapshots in my backups were the culprit for me; I went from 42 GB down to 9 GB.)

In my case, I just changed the process that does my backup to exclude the media folder.

HA is running on a Raspberry Pi 4.

No, the media folder is empty.

I had a look at the filesystem; it is only the DB, and within the DB it is only these two tables.

For now I have stopped recording. Can I simply empty the two tables? What effects would that have?

Unfortunately I only have 200 MB free on the SD card, so I can’t make any big jumps.

Any idea what I could try?

Copy the database off to some other device, and delete it. (Do all this with HA shut down, of course.) HA will build a new one when it restarts.

There should be more than 2 tables in the database. Mine right now has 13 tables.

If you’re running off an SD card, you should be aggressive about excluding entities from Recorder. As I mentioned above, it’s not just the size (although this is indeed an issue) but also the number of writes. Apparently there’s a lifetime limit on writing to an SD card.

This link should give you a good start toward reducing the size:

For the record, my database right now is sitting at under 23 MB. A bit higher than I’d like. I think it would be a lot smaller if I could get rid of all that long-term statistics data I never use.


Deleting the DB and correcting the recorder entry to limit the recorded entities worked very well.

Thank you all for the great and very fast support 🙂

Tom raises a very valid point. If you run your system on a compute potato like the RPi platform, you should certainly concern yourself with filtering entities. However, any modern x86 platform will handle this with ease, without any effort spent on filtering entities. It all boils down to what the most valuable commodity is to you: bare-minimum cost, or time.

My time is the more valuable commodity here. HA was built around the RPi and (still, last I knew) has a very simple install for that platform. My little old RPi 3B+ sits at single-digit CPU utilization and around 50% memory utilization running HA.

Why would I spend more time (and money) on something I don’t need?

Unfortunately you do not understand my point; but that’s OK, everyone to their own priorities in life.

I totally agree with you that HA is flexible enough to offer something for everyone.

As for understanding your point, I definitely got that you think the RPi is a “potato” and not “modern” enough for your preferences. That’s fine. Again, I think we agree that each of us should spend our time and money on those things which are important to us.

I am not sure what the platform has to do with the database. An RPi 4 can easily handle the database, no matter if small or large, and it can handle unfiltered writing.
But the issue is database size, and that is a simple function of the number of devices and datapoints (plus logging if you are not careful).
So if the database suddenly grows quickly, you either have a lot of errors being recorded or some device is writing lots of state changes.

I am filtering data before it reaches Home Assistant and filtering data that is being recorded. I don’t need to know the quality of the WiFi signal of my Shelly, so I will not record it. You must not forget that every packet containing relevant data usually also contains irrelevant data.
So an MQTT message with the power consumption (relevant) also yields 10 entities that are irrelevant. With filtering you create one data point; without, you create 11. Ergo, much faster database growth.

Especially when you keep the data (I personally keep the original data indefinitely rather than just the “filtered” statistics).
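As a rough sketch of what I mean by filtering before it reaches Home Assistant (the topic and JSON key are made up for the example): instead of letting discovery create an entity for every field in the payload, define only the one sensor you care about.

mqtt:
  sensor:
    - name: "Desk plug power"
      state_topic: "home/desk-plug/status"
      unit_of_measurement: "W"
      device_class: power
      state_class: measurement
      value_template: "{{ value_json.power }}"

Everything else in that payload never becomes an entity, so it never reaches the recorder.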

So how do you best deal with it when an integration creates a ton of state attributes that are just duplicated?
Example:
state_class: measurement
humidity: 59.34
temperature: 78.314
unit_of_measurement: g/m³
icon: mdi:water
friendly_name: TC Standing Desk Absolute humidity*

Unfortunately there’s very little we as users can do. Exclude everything we don’t need. Set keep_days to a minimum. Don’t add friendly_names or units of measurement, or at least keep them short. Store the most important data elsewhere; the data I really care about is sent to a flat text file which I can open in Excel or any data analysis program.
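Roughly like this, using the file notify platform (a sketch; the entity, path and names are placeholders, and newer HA versions may prefer setting the file integration up via the UI):

notify:
  - platform: file
    name: power_log
    filename: /config/power_log.txt
    timestamp: true

automation:
  - alias: "Log house power to flat file"
    trigger:
      - platform: state
        entity_id: sensor.house_power
    action:
      - service: notify.power_log
        data:
          message: "{{ states('sensor.house_power') }}"

That gives one timestamped line per state change.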

Honestly it boggles my mind that this static data is repeated in EVERY state change record in the database. I’ve seen some movement to rein in this silliness in some areas, so I’m optimistic that the problem is now recognized and developers are moving toward resolving it. I doubt it’s going to be a quick fix, though.
