Fix sensor logging interval, safe space on fastly growing database

I have HomeAssistant running for a couple of days right now and i have added a lot of new integrations and entities.

While i was adding new sensors on a daily basis i saw that the ‘home-assistant_v2.db’ has been growing quite significant. After just 6 days it became over 500 MB.
So i started looking for answers and set the database location onto a MySQL server. After not even 1 day the database size became over 100 MB!
I also noticed that because nearly all of my sensors (87 (custom) entities) send data towards HA, every value was logged.
(Due to a calculation sensor i cannot change some of my sensors to send data in more then every 5 seconds, some other sensors will then become innacurate.)

After this i have tried:

  • removing things from history:
history:
  exclude:
    entities:
      - sun.sun
      - xxx,xxx
  • removing things from recorder (the same one’s as removed from history)
  • changing the rate devices send data towards Home Assistant

All these changes did help a bit, but it is still growing at fast rates, and therefor i have a couple of questions:

  • Is it possible to be able to receive data from a sensor every 5 seconds, but only log it once every 1/5 minute(s)?
  • Is it possible to remove some history after x days (and not all data, like would happend with ‘purge_keep_days’)?
  • Is there a way to see what sensors/switches or just all entities make the log so big?

You can use DB Browser for SQLite to examine the contents of home-assistant_v2.db. However, if you have many sensors reporting every 5 seconds, it’s obvious their data will represent the majority of what is contained in the database.

The answer to your two other questions is no.

  • All state-changes for all entities are recorded unless you exclude them entirely.
  • You can limit the number of days of recorded data but it applies to all recorded data.

There are other options for storing data such as MariaDB and InfluxDB but I have not used either of them so I can’t offer any guidance.

Hi, thanks for your fast reply.

Sad to hear that the things i wanted dont work. But now i have a new question based on some research:

In the picture here above it says events and states. The state seems to be filled with all of my sensors and their data. But is there a reason why events is so big?

Okay i think i’ve found out that these sizes are normal.

I think i am also just overused to (where i come from) Domoticz.
There the DB was always very small and not growing at all.
I do have to say that the amount of sensors, and rates here is also x10 x500 so that might explain why.

Just some more information added to this;

If you are having the same issue and you want to find what is going wrong please try out this (in phpmyadmin):

SELECT entity_id, COUNT(*) as count FROM states GROUP BY entity_id ORDER BY count DESC LIMIT 25 

The other reason why the database grows so large is because Home Assistant does not store data very efficiently. You may have noticed that the database is used like a flat-file and each record, for a given entity, contains redundant (duplicated) information. This simplistic storage technique not only bloats the database but it makes it inefficient to extract information. For example, it makes the Logbook and History features painfully slow.

I’m not reporting anything new, of course. The issue is well known but improving the design would involve a fair amount of work and involve a breaking change.

2 Likes

I’ve come to notice that actually. ‘Luckely’ i don’t use a Raspberry pi but an actual physical server (8 cores @4Ghz).

But i found something that might be the cause of the size growing at a fast rate:

Is it possible to remove the data from state_changed, or is this the main part of the logging?
If it actually is the logging i could design a script (or look for one) to remove data i don’t need. In this case i might be able to achieve the 5 minutes of saving instead of the 5 second-savings.

I’ve never tried it but I suspect if that if you do it may cause unexpected/undesirable results. I doubt Home Assistant is designed to gracefully handle missing portions of its database.

That is true, but i still cannot seem to find out what it exactly does other than say:

{"entity_id": "sensor.used_power", "old_state": {"entity_id": "sensor.used_power", "state": "669.0", "attributes": {"unit_of_measurement": "W", "friendly_name": "Power Usage"}, "last_changed": "2020-04-20T17:22:14.336020+00:00", "last_updated": "2020-04-20T17:22:14.336020+00:00", "context": {"id": "c3586a7ccd72463d92bfa183d7e44f81", "parent_id": null, "user_id": null}}, "new_state": {"entity_id": "sensor.used_power", "state": "697.0", "attributes": {"unit_of_measurement": "W", "friendly_name": "Power Usage"}, "last_changed": "2020-04-20T17:22:24.344023+00:00", "last_updated": "2020-04-20T17:22:24.344023+00:00", "context": {"id": "48f0f3e11c6745bd991bc66436eabf27", "parent_id": null, "user_id": null}}}

But i did find this which i am gonna try out and see what it does:

Allright, here i am again…

This time i added this to my configuration.yaml file:

recorder:
  db_url: mysql://user:[email protected]/HomeAssistant
  purge_keep_days: 730
  commit_interval: 10
  exclude:
    domains:
      - automation
      - scene
    event_types:
      - state_changed
    entities:

This actually seems to have stopped the huge amount of ‘state_changed’ entry’s:

I did not see that i was missing something in the History file from my sensors, and all data seemed to be live and going good.

I’ll post another update tomorrow if i have more information :smile:

2 Likes

Allright, do not do that… :shushing_face:

I thought it kept on loggin things. But litterally all loggin stopped. It’s like the database aint there, or is only able to read from it.

Time for a new plan…

Trying to remove the events (after they have happend). I have no clue why this doesnt change anything then to the history. Because i had a sensor that had been removed, but still in history. I removed all of it’s items in events and still log file seems to be fine.

What are those events for, and do they need to be stored after they have changed??

1 Like

So I need to prefix this with I’m not a home assistant dev and therefore can’t say for certain. But looking in my own DB it seems pretty clear to me that the state_changed events are your history. What’s under states is just the current state of all your entities. The record of how those entities have changed over time seems to be pulled together from the state_changed events.

If i’m correct this far then that would mean excluding those events is the equivalent of listing every entity and domain in the exclude section of recorder. Those entities still work fine but you have no history.

And it seems like what you are doing manually is what the purge_keep_days option is for. You currently have that set to 730 which means things will only be removed from your DB after they are 2 years old. That’s quite a long time, your DB will definitely get huge at that rate. Plus all the tools HA natively provides for digging into that data are definitely going to struggle at that kind of scale if you actually tried to use them to analyze two years worth of state changes.

I would suggest that you turn down purge_keep_days to something pretty trim. I know I have mine set to 5 days and have a pretty long list of entities and domains to exclude on top of that. If you want to keep all that data I would suggest looking into InfluxDB. My understanding is its a time series DB that’s much better suited to analyzing data over time like this at scale.

There’s an add-on for InfluxDB this if you want to try it out, its very easy to get set up. However for this amount of data you might want to look into hosting InfluxDB externally on a dedicated machine and connecting HA to it. This is something I had on my list to check out but haven’t gotten around to yet. There is definitely plenty of posts around the forum here on people with dedicated NAS setups for tracking all their data over time

1 Like

Thanks for your help!

I am happy i didnt delete the state_changed then because that would’ve (altough the DataBase is pretty new) been a bad thing to do.

Right now i am stored and set on an MariaDB SQL server, which (as maybe u saw in the pictures) isn’t only used for HomeAssistant. But i will look into InfluxDB if MariaDB or HA gets slow.

About your last 2 sentences; I am actually on a pretty ‘Okay’ NAS. But the database is on a server with (only) 64GB. I will keep my data for what it is right now. (expecting over a million of entries in state_changed)

I was hoping i would be able to save older data in HA to but i might look into some other things like writing up somewhere how much the statistics where. (say i used 23kWh on a day, i only store the 23kWh somewhere for that date)

Just a quick comeback to this, as Home Assistant has mostly fixed it!

This was at the start. I had a bunch of sensors just sending data at high rates towards Home Assistant. Now i actually control the flow with Node-Red (MQTT). I don’t log a lot of data anymore at all.

As the quote says, i had over 100MB per day. This now is only 20 to 25 MB’s per day. Which actually doable, because the database won’t grow over 10 GB in 1 year. Then i will just drop the database and make it ‘Archived’.

If someone else is also having the same issue, please check WHAT you are saving and WHEN (e.g. how many times per minute). If you can’t change the sensor data, and the history of it ain’t important. Add it to the exclude in the recorder settings.

If you still need some logging but want to keep the database size small, you can limit the logging interval.

Set up the recorder integration to exclude problematic entities:

recorder:
  exclude:
    entity_globs:
      - sensor.air_quality*
    entities:
      - sensor.temperature

Use filter platform to create a virtual sensor with limited logging interval:

sensors:
  - platform: filter
    name: Living room temperature
    entity_id: sensor.air_quality2_temperature
    filters:
      - filter: time_throttle
        window_size: 00:01

00:01 = 1 per minute.

2 Likes