Yep. I have an automation that runs the purge every day at midnight because of that.
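For anyone who wants to set up the same thing, here is a minimal sketch of such an automation (the alias and keep_days value are just examples; match keep_days to your own retention setting):

automation:
  - alias: "Nightly recorder purge"
    trigger:
      - platform: time
        at: "00:00:00"
    action:
      - service: recorder.purge
        data:
          keep_days: 28  # example retention, match your purge_keep_days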
Quite a few, but I don’t have an official count. Some of the biggest ones I know of that log to the database are:
- Dark Sky weather forecast info
- Person/Device Trackers
- about 30 Z-Wave devices and all of their attributes. Four of them are smart switches that log power, current, voltage, etc. (about 6 sensors total) once every 1-2 minutes (a motion sensor or door sensor is about 8-10 individual sensors; a switch or light maybe 1-2)
I wonder what is actually the best database to use…
What is the fastest and most effective Db to use?
That’s a very subjective question. Honestly it depends on your use case and what features or performance you are looking for. Personally I already had MariaDB running for my Kodi media player library so naturally, I just used MariaDB instead of trying to set something else up. It works for me.
Where did this rumor come from? The recorder should purge after 30 minutes if it is due at startup.
The database is rarely the problem. Most of the delay with logbook and history is in HA code, either the Python core or the frontend. A faster CPU in those places helps a lot.
If it is a rumor, some documentation would clear it up… stating that 30-minutes-after-boot thing, and maybe adding some logs to look for to verify it ran. Rumor or not, I use an automation to purge so I can control exactly when it happens… which is not while I’m at work, when the process could delay alarm stuff.
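If you want to verify that the purge actually ran, turning on debug logging for the recorder component should show purge activity in the Home Assistant log: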
logger:
  default: warn
  logs:
    homeassistant.components.recorder: debug
Not a rumor. Doesn’t the next line say that the latest the purge function will run is the purge interval set in the configuration? In my case it’s set to 1. I use Docker and blow away the container to upgrade; wouldn’t that reset things such that HA doesn’t think a purge is due yet? In that case it would run, at the latest, 24 hours after starting HA.
Documentation explaining the purge function, its timing, and other relevant info coming from those who understand it best would help clear up any confusion. Pointing to a single line of code is not documentation.
EDIT: Also, if it’s supposed to run at the earliest 30 minutes after start, then why does the size of my DB not change until after 24 hours? As a test I restarted the container and my DB did not change size until 24 hours later. Also, if I keep restarting HA before it reaches 24 hours of uptime, my DB grows, and looking inside the DB reveals entries older than the 28-day keep_days I have set. As soon as HA reaches the 24-hour uptime mark, shortly thereafter my DB shrinks and those entries disappear.
I agree. I have always read on the forums that if you set it to purge every day, HA has to be up for a day before it will trigger; if the purge interval is 2 days, then 2 days. This is why I have an automation that runs at midnight every day to run the purge.
EDIT: There’s actually all kinds of shit that breaks when HA restarts… delays in automations etc are zeroed out… if HA is aware and can continue on after a restart then not doing this is just lazy programming…
Not to mention that in the OP’s case, they did not have purge_interval as a config option in their setup. The DB size did not go down until after 24 hours, as indicated by their response in post #10.
Sorry if I came across as snide, I was actually genuinely curious.
The code tries to get back into sync after a restart. So if it purged last night at 9PM and you restart today at 6PM, it should purge in about 3 hours’ time (with a lower limit of 30 minutes).
So I guess the fully correct way to state it is that it can take up to 24 hours. Now I see how that can lead to a rule of thumb that one must wait for 24 hours. Even if one does realize the bit about synchronization, there is no simple way to tell the time that it tries to synchronize to.
It can also be confusing that purging will not shrink the database file but just stops it from growing, because the freed space is only marked for reuse.
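(For what it’s worth, the recorder.purge service also has a repack option that is meant to rewrite the database and actually reclaim that space; how much it helps depends on the database engine.)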
Incidentally, I think this should be simplified to what several posters are already doing: remove the purge_interval setting and just purge every night at 4 AM if there has been no purge in the past 24 hours.
This would be easier to document and it would still allow users to move the purge time to an automation if 4 AM is for some reason inconvenient. However, purging should actually be quite fast these days and not cause an interruption like it did in older releases (years back).
Yeah, I agree with that. I use Node-RED, and it behaves similarly, requiring workarounds to prevent timers reset by a reboot from causing issues. I guess if ‘persistent timers’ were added to HA, Node-RED may not benefit. OTOH, it seems the way persistence is done in HA requires a recorder entry in the DB, which could be inefficient for just timer persistence, where only a ‘timer started’ timestamp is required. Actually, I think ‘state before reboot’ should have been a thing for all entities, regardless of whether they have a recorder entry or not. It’s not huge data (just the latest timestamp of a known value) and it would make reboots work better no matter how the recorder is configured. As it is, I think the only way for HA to know the pre-reboot state is to have a recorder entry… and storing timer timestamps for 7 days, for example, seems just stupid.
See, now that is something I haven’t heard before that would totally help explain this. Something like that should go in the docs for sure!
FYI, I now submitted a pull request to achieve the above: Purge recorder database at night by amelchio · Pull Request #33646 · home-assistant/core · GitHub
For those looking to find out exactly what events are filling up their MariaDB, I made a video on how to do that here using the phpMyAdmin add-on:
Just to clarify… once we run the query, we can delete the line and it only removes the data stored - not the entity? Sorry, I’m not too familiar with database commands.
The query itself doesn’t touch anything. It does not edit the database at all; it’s only for finding out what is filling up the database.
Just in case anyone stumbles across this: the Home Assistant recorder schema works very differently these days, with more tables etc. A working query that gives the same results as the video is:
SELECT sim.entity_id, COUNT(*)
FROM states AS sid
JOIN states_meta AS sim ON sid.metadata_id = sim.metadata_id
GROUP BY sim.entity_id
ORDER BY COUNT(*) DESC;
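The output lists each entity_id with its row count, noisiest entities first, which makes them easy candidates for the recorder’s exclude list.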
Anyone ever figure this out?
Mine is up to 50 GB… the “states” table in MariaDB is 50 GB by itself, and viewed in a browser most of it looks like NULL values. Something must be amiss?
Already tried a purge last night to 10 days.
I’m wondering if there is some stale/corrupt data that needs some kind of manual cleanup?
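One way to narrow it down is to ask MariaDB for the per-table sizes directly. A sketch, assuming the Home Assistant schema is named homeassistant (adjust to your own database name):

-- approximate on-disk size per table, in MB
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'homeassistant'
ORDER BY size_mb DESC;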
Wow, you have more than 200 million database rows. I have a 7-day retention period and about 300 thousand rows.
You can check two things:
First, use the SQL statement from townsmcp one post above yours; with it you can check how many rows there are for each entity. With a 7-day retention period an entity should not have more than about 50,000 rows (for me the most frequently updated entity has 42,000 rows in 7 days).
Second, look whether there are rows older than your retention period, with the following SQL query:
SELECT * FROM `states` ORDER BY last_updated_ts ASC;
That sorts the rows by the date of their last update. In the column last_updated_ts there is a number like 1702696320.794098; this is a Unix timestamp you can convert to a date, for example with a Unix timestamp converter (https://www.unixtimestamp.com). If the date is older than your retention period setting, something is wrong.
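If you prefer to let the database do the comparison itself, here is a sketch of an equivalent check, assuming a 7-day retention period (adjust the interval to match your purge_keep_days); a large count here would mean the purge is not removing old rows:

-- count rows older than the retention window
SELECT COUNT(*) AS stale_rows
FROM states
WHERE last_updated_ts < UNIX_TIMESTAMP(NOW() - INTERVAL 7 DAY);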
Just to clarify, your recorder settings look like this and auto_purge is true?
recorder:
  db_url: !secret mariadb
  purge_keep_days: 7
  auto_purge: true
  commit_interval: 30
  ...