My home-assistant_v2.db file is growing despite purge set to 3 days

For the past few days my DB has been growing steadily (see the screenshots), and it is getting worse. I have added a few more sensors, but why is the purge function not working? Could this be related to how the OS frees up space after a purge?

I’m using the SQLite file; any ideas are appreciated.

Thank you very much

This is my config:

recorder:
  purge_keep_days: 3

LMGTFY

If you restart your HA instance sooner than the purge interval it will never purge.

In the docs for recorder under auto-purge it says this:

Automatically purge the database every night at 04:12 local time.

You’re showing growth over what appears to be roughly a 24-hour range. I’m not exactly sure where 4:12 is on that graph, but that’s the only time you should expect the size to drop. I also don’t know what is in your DB, so I couldn’t tell you whether it should be dropping at that time or not. If everything in there is less than 3 days old, then nothing will be removed. Regardless, you will only see the size climb for the vast majority of the day, the exception being 4:12 AM.

If you think the purge process is not working, then I would suggest digging in more. Query the DB and find something in it which is 4 or more days old that you know shouldn’t be there. You can also force recorder to purge at a different time than its normal schedule by calling recorder.purge, either from developer tools or via an automation. If that doesn’t clean out things you know for certain should be removed because they are more than 3 days old, then you should report an issue on GitHub.
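
As a rough sketch, a manual purge call from developer tools (YAML mode) could look something like this. keep_days and repack are fields of recorder.purge; the value 3 is only an assumption that mirrors the purge_keep_days in the config above, and newer releases may show action: in place of service:.

```yaml
# Sketch of a manual purge call; the values are illustrative, not a recommendation.
service: recorder.purge
data:
  keep_days: 3   # assumption: mirrors purge_keep_days from the recorder config
  repack: true   # rewrite the database file so space freed by the purge is returned to disk
```

Without repack, SQLite normally keeps the freed pages inside the file for reuse, so the file on disk may not shrink even when rows really were deleted.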

This doesn’t make any sense. Literally any time you restart HA there will be something in your DB that is younger than your purge interval. Are you suggesting that any time you restart HA you’re causing a mass of state data to become “unpurgeable”? If so, you should probably submit a bug, because I’ve never seen anything like that.

I have seen all those posts, but I would also guess that if HA runs for longer than 3 days, the purge should work.

This is a 30-day view from Grafana. It only started recently, when I moved to the latest May release. I have now also triggered the manual purge service with repack, which is why it drops back to 980MB today. But I was unsure what could cause such massive growth in a few days.

It used to be an issue but was fixed quite a while ago.

There are some SQL queries in this topic you can use to see what are your most actively logged entities (skip down to the second method " 2. Good-enough approach"):

Perfect, I found the ones taking up the additional space. I will now monitor whether it reaches a state where the size stays at the same level.

Thank you very much!

This is not at all what I said or implied.

Stated differently by @tom_l here:

Have you been restarting every day?

Restarting home assistant resets the purge interval timer.

@ha_frw I don’t know what you read, but my quote and link above literally come from one of the first Google hits.

From what I understand it works like this (and I haven’t yet read the code to check, so I stand corrected but it’s based on the many posts on this topic): By default, there is a job that runs at 4:12. That job checks a timer to see if the timer, set to purge_interval, has expired. If so, it will delete everything that is older than purge_keep_days.

Well, I tried to do my due diligence: I searched multiple times and installed the file monitor to review the file size. I had already started a new file after the old one grew to 8GB and I was unable to purge it. But sometimes you search differently and get stuck, and then it should be fine to ask for a second opinion. And yes, your search did indeed use other keywords than mine, so in the end all the input from you guys was very helpful, and thank you for that.

But I still think it is broken if a purge does not succeed because you restarted HA. The purge should check the timestamp at which each table entry was written and nothing else. Just my 2 cents.

That is no longer an issue. The purge now happens daily at 4am no matter what.

Thank you @tom_l for clarifying, and also for the hint to your guide! It seems a newly added statistics entity accounts for the new size. It also shows me that, due to the longer days, my PV system is giving me more data, and that is causing a natural increase.

I have now tried to review the states table using this query:

SELECT
    entity_id,
    COUNT(*) AS cnt
FROM states
WHERE last_changed < date('now', '-4 days')
GROUP BY
    entity_id
ORDER BY
    COUNT(*) DESC;

It still shows me states older than 4 days. Is that an error/bug?

Sorry, I misunderstood, see that now. I wasn’t aware of this restart+purge bug in the past, didn’t realize what you were referring to.

If you have a few particularly noisy sensors one thing you can do is this:

  1. Create a filter sensor. Set entity_id to your noisy sensor
  2. Set recorder to exclude your noisy sensor so it doesn’t capture any history for it
  3. Any place in your dashboards where you were showing a graph of the noisy sensor, point to the filter sensor instead.

It’s a bit annoying because now you’ll have two sensors instead of one, but it allows you to control the rate at which history gets recorded. The filter sensor has a number of options for controlling the rate of recording data. The simplest is to use time_throttle so it only records the state once per time window. If that doesn’t capture data accurately enough or produce the graph you want, you can try some of the others.
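
A minimal sketch of the three steps above, assuming a hypothetical noisy sensor called sensor.pv_power and a five-minute window. The entity id, the filter sensor's name and the window size are all placeholders to adapt:

```yaml
sensor:
  # Step 1: filter sensor that follows the noisy sensor (hypothetical entity id)
  - platform: filter
    name: "PV power throttled"
    entity_id: sensor.pv_power
    filters:
      - filter: time_throttle
        window_size: "00:05"   # assumption: keep one state per 5-minute window

recorder:
  purge_keep_days: 3
  # Step 2: stop recording history for the noisy source sensor
  exclude:
    entities:
      - sensor.pv_power
```

Step 3 is then just pointing the dashboard graphs at the new filter sensor instead of sensor.pv_power.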

I hope someday we get a way to apply filters to existing sensors to control how their history is recorded without making new ones. There was a WTH about it but nothing has come out of it yet:
