Database already huge

The “Days to keep” option, does this purge the Energy data also? I guess that data is something one wants to keep! So one should know what he/she is doing.

No, energy dashboard data if correctly configured are kept despite the number of days configured for recorder purge…

I can confirm that the data are kept even if you don’t correctly configure the energy dashboard. My Statistics table continues to grow without regard to the Recorder purge days. Since I never set up any Energy stuff, I don’t use these data. Deleting old statistics records is one of the steps I perform during my routine database clean-up. Last time I did this, I deleted over 12,000 rows from the Statistics table (this was about a month after the previous clean-up.)

Hi Tom,

Yeah, you really should try my strategy; it’s working really well.
I did a lot of reading in forums and on Reddit, and I find this approach really efficient.
On 27 December 2021 I lost my DB, so I started over, and I’m impressed by how little space it takes now.

For the backups I’m using this add-on

It works well: I create daily backups to my NAS, and I only need to delete them manually in HASS after a week.

regards

I’m glad you’ve got a Recorder exclude strategy and a backup strategy that work for you. I think those are the two most important things to get right for a reliable HA implementation.

A couple of minor points for anyone reading this:

The Samba backup add-on and the Google Drive backup add-on are fantastic tools. They’re well written, well documented and well supported. BUT… neither one will create a backup on another platform. Both use the native HA process to create the backup on the media HA is running on, then copy it elsewhere.

This distinction is important if you’re concerned with space (or I/O) on the media, for example, if you’re running HA from an SD card.

Deleting the HA database is something every new user should experience. It sounds scary, but all you lose is a few days of history about when entities changed state. Unless you’re using long-term statistics (which I don’t believe HA is the best tool for anyway) you should be comfortable doing this.

The good news is, once your HA environment starts to mature you shouldn’t need to delete the database. You can keep it lean and efficient, you can back it up, and you can purge and repack it. Again with the exception of long-term statistics, which I’m hoping some day will be added to the Recorder: Purge service so we won’t need to off-load the database and manually run SQL against it.

Yes, in my opinion there is some UI refining that still needs to be done before new and average users feel comfortable using HA. Take “Developer Tools”: I’m not a developer, so what should I do in there? I do have some (quite a bit more than average) IT experience, but for more than a few months I hardly looked in there: no descriptions, no guide, no links to guides. At the least they could rename it to “Tools”, add a short description, and collect in it the other tools that you currently find “here and there”, or link to them.
(Call services? I haven’t set up any services. States? OK, that’s probably just how the system is doing right now. Templates? Gee, get me out of here: % set json relative state_attr… Events? ??? OK, at least there are links in the last two (if people are not already confused, or in despair). Statistics? OK, it looks empty, what a relief :slight_smile: :joy:)

E.g. Tools could include an overview of DB size, filesystem size, log file size, etc., plus a list of the above “Tools” and other easy-to-use tools. I know Add-ons, Backups and Supervisor are under the Configuration tab; maybe move Tools in there as well, so it’s not taking up space in the side menu.

I am so far out of my comfort zone here, but have you looked at an automation to call recorder.purge_entities once a day? Or just recorder.purge?

I have a flow in Node-RED that purges my database every seven days using the call-service node, and the home-assistant_v2.db file never goes over 1 GB.

Or, I could be on a different planet altogether…


I have my Recorder under control, and yes, as it is a service, it can be called in various ways: in an automation, by clicking a button, or whatever people like or feel comfortable with. I’m not requesting anything, as I adapt to the options available. But yes, eventually I might end up in a situation where I also need an automation, at least for a “repack”, say once per month, as it doesn’t seem that a repack in SQLite is a default procedure (which, in my opinion, it should be in an app-controlled DB; not after every purge of single entities, but weekly or monthly, whatever). Your automation example is definitely something to look into for people who have problems keeping their Recorder/DB under control.
There are options to set a 7-day purge in HA. Calculating the DB size is another story, but doable, preferably after a repack. As far as I know (I’m actually not sure), though, there is no option to exclude specific entities from a purge, which means you would have to include e.g. 597 out of 600 in a script, or use an “IF NOT” clause.
EDIT:
Seems like I’m wrong; someone has “fixed” this, though I haven’t tried it. (Maybe that’s the “apply_filter” function, which lacks examples and has a description that is hard for ordinary people to understand.)

PS: Just did yet another manual “Repack”, 2 days after the last one, and gained 50 MB, just from the past 2 days of purging (set to keep 30 days).
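For anyone wondering why space only comes back after a repack: as far as I understand it, on SQLite the repack step amounts to a `VACUUM`. Below is a minimal sketch on a throwaway database (not a real `home-assistant_v2.db`); the table name and row sizes are made up purely for illustration. It shows that deleting rows alone leaves the file size where it was, and that the file only shrinks once `VACUUM` has run.

```python
import os
import sqlite3
import tempfile


def demo_repack():
    """Return the file size after filling, after deleting, and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # throwaway file, not a real HA database
        conn = sqlite3.connect(path)
        # Hypothetical table, only loosely modelled on a "states" table.
        conn.execute("CREATE TABLE states (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany(
            "INSERT INTO states (payload) VALUES (?)",
            [("x" * 500,) for _ in range(5000)],
        )
        conn.commit()
        filled = os.path.getsize(path)

        # Stands in for a purge: the rows are gone, but the pages stay in the file.
        conn.execute("DELETE FROM states WHERE id > 100")
        conn.commit()
        purged = os.path.getsize(path)

        # Stands in for the repack: rebuilds the file without the free pages.
        conn.execute("VACUUM")
        conn.close()
        repacked = os.path.getsize(path)
    return filled, purged, repacked


if __name__ == "__main__":
    filled, purged, repacked = demo_repack()
    print(f"after fill: {filled} bytes")
    print(f"after delete: {purged} bytes")
    print(f"after VACUUM: {repacked} bytes")
```

On a real installation you would let the Recorder do this through its own repack option rather than running `VACUUM` against a database that HA has open.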

Yes, this has crossed my mind several times. In addition to the purge options, the Recorder service has Disable and Enable options. I think that means you could stop recording any new data while your automation is doing the purge and/or repack on some schedule.

I haven’t tried it yet, since at the rate HA updates come out, I’m always shutting HA down for maintenance at least once a month, and much more often if I’m making changes. I figure it’s safer to do the DB cleanup while I’ve got HA down for some other reason anyway.

Well, I never use purge.
The thing is, when I use purge, does all my energy data get lost?
Without purge, my DB has grown to just above 350 MB since 29 December 2021.

Be careful with the recorder: the energy dashboard data is stored using the recorder, so if you suspend the recorder and use the energy dashboard, you could lose data, I think. I discovered it because I changed my strategy for recorder data: I moved from a mix of include/exclude to include only…

So I have in the include:

  • all the entities documented in the energy dashboard (if not included, you will have a warning in the energy dashboard configuration)
  • all my entities displayed using history or mini-graph cards
  • all my entities using the “average sensor” add-on

That’s it… This amounts to 32 entities and 6 entity globs (mainly from the energy dashboard, where my entities follow the same naming pattern)… pretty simple to maintain… and I have tons of entities…
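To illustrate the include-only idea described above, here is a rough sketch of how a short list of explicit entities plus a few glob patterns can cover many entities. All entity names and patterns are hypothetical, and the matching only approximates what the Recorder does with its include filters; it is not HA's actual code.

```python
from fnmatch import fnmatchcase

# Hypothetical include list: a few explicit entities plus naming-pattern globs.
INCLUDE_ENTITIES = {
    "sensor.living_room_temperature",
    "sensor.outdoor_humidity",
}
INCLUDE_GLOBS = [
    "sensor.energy_*",
    "sensor.*_average",
]


def is_recorded(entity_id):
    """Return True if the entity is listed explicitly or matches a glob."""
    if entity_id in INCLUDE_ENTITIES:
        return True
    return any(fnmatchcase(entity_id, pattern) for pattern in INCLUDE_GLOBS)


if __name__ == "__main__":
    for entity in (
        "sensor.energy_grid_import",
        "sensor.kitchen_average",
        "sensor.living_room_temperature",
        "light.hallway",
    ):
        print(entity, is_recorded(entity))
```

With consistent naming, one glob can stand in for a whole group of energy sensors, which is what keeps a list like this short.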

Using this, my database size was divided by 4…

It depends on whether your integration comes from your grid/electricity provider or from a local device. I get my consumption/prices from my electricity provider; when I installed it, I got everything the provider has. It took a while before the Energy Dashboard was showing it all :slight_smile:, as it was about a year’s consumption :slight_smile:. I can purge or dump the DB, whatever I like, and it’s still there.
Though as Didier mentions, if you have added your own sensors, that data will go in the bin when purged.

Aah OK, it’s just as well that I never used it.

Well, I need to find a way to store the full energy dashboard info somewhere else.
I don’t want it deleted for at least a year.

regards

Yes, I’m also in the process of setting up an external DB, for the same reasons. My old “webserver” system is partly manual: I was (and still am) sitting and typing in consumption and other bills etc. in various forms to get graphs/statistics.

I’ll try to explain a few things I think I’ve learned, maybe it’ll help. TL;DR is at the end.

There are two “purge” options for the Recorder service. (1) Recorder: Purge allows you to delete all entries older than a given number of days, and you can apply the entity_id and event_type filter in addition to the time-based purge. You can also repack the database (recover unused space) after the purge, if you remember to check the box AND turn on the slider. You won’t free up any storage unless you do this.

(2) Recorder: Purge Entities allows you to delete all of the records for an area, device, entity, domain or glob, or a bunch of those. But I don’t see any repack option, so you’ll want to go back and do that after.

Key points here are that you can be very selective about what and when you delete, and that you won’t recover any space in your database unless you correctly select the repack option.
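If you would rather script these two services than click through the UI, one option is Home Assistant's REST API, which accepts a POST to `/api/services/<domain>/<service>` with a bearer token. The sketch below only builds the requests and does not send anything. The base URL, the token and the entity/domain names are placeholders, and the field names (`keep_days`, `repack`, `entity_id`, `domains`, `entity_globs`) are my understanding of the service fields, so check them against Developer Tools on your own version first.

```python
import json
import urllib.request


def build_service_request(base_url, token, domain, service, data):
    """Build (but do not send) a POST request for a Home Assistant service call."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with your own instance URL and long-lived token.
BASE_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_TOKEN"

# (1) Time-based purge, followed by a repack so the file actually shrinks.
purge = build_service_request(
    BASE_URL, TOKEN, "recorder", "purge", {"keep_days": 7, "repack": True}
)

# (2) Selective purge of hypothetical entities, domains and globs.
purge_entities = build_service_request(
    BASE_URL,
    TOKEN,
    "recorder",
    "purge_entities",
    {
        "entity_id": ["sensor.noisy_sensor"],
        "domains": ["media_player"],
        "entity_globs": ["sensor.debug_*"],
    },
)

# To actually send a request, uncomment the next line:
# urllib.request.urlopen(purge, timeout=30)
```

Because the selective purge has no repack option of its own, a script like this would normally send the `purge_entities` call first and a `purge` call with `repack` set afterwards.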

That all applies to the “old” tables which have been part of the Recorder database for a long time; basically, the tables named events and states.

More recently, it was decided that the same database should be used to store permanent statistics data, such as is used by the energy management components. Among others, the statistics table was added. But as far as I can tell, there’s no purge option for this table. Apparently it’s designed to grow forever, so that you’ll always have your full energy use (or whatever) history.

I don’t happen to use HA to analyze my energy use, just to collect the data. So every month I end up deleting over 10,000 rows from this table. It should be noted that I never “opted in” to store this data in the first place; sometime in the past few releases HA created and started populating these new statistics tables without asking.

As far as I can tell, doing a purge and/or repack won’t impact the energy use data stored in these newer tables. The only way to purge them is with an SQL DELETE statement.
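For illustration, here is roughly what that manual clean-up looks like when done from Python instead of a SQL console. This is only a sketch against an in-memory database with a simplified, hypothetical `statistics` table; the column name `start`, its text timestamp format and the `keep_days` parameter are assumptions, and the real schema differs between HA releases. On a real system, back up the database and stop HA before trying anything like this.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_old_statistics(conn, keep_days, now):
    """Delete statistics rows older than keep_days; return how many were removed."""
    cutoff = (now - timedelta(days=keep_days)).isoformat(sep=" ")
    cur = conn.execute("DELETE FROM statistics WHERE start < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def make_demo_db(now, days=60):
    """Create an in-memory DB with one hypothetical statistics row per hour."""
    conn = sqlite3.connect(":memory:")
    # Simplified, assumed schema: not the real Home Assistant table definition.
    conn.execute(
        "CREATE TABLE statistics (id INTEGER PRIMARY KEY, start TEXT, mean REAL)"
    )
    rows = [
        ((now - timedelta(hours=h)).isoformat(sep=" "), 20.0)
        for h in range(days * 24)
    ]
    conn.executemany("INSERT INTO statistics (start, mean) VALUES (?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    now = datetime(2022, 3, 1)
    conn = make_demo_db(now)
    removed = delete_old_statistics(conn, keep_days=30, now=now)
    left = conn.execute("SELECT COUNT(*) FROM statistics").fetchone()[0]
    print(f"removed {removed} rows, {left} rows left")
```

As with the other tables, the file itself does not shrink until the database has been repacked afterwards.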


Statistics were referred to in the blog when that release occurred. I believe you can choose what to keep long term.

I recall reading about the statistics update, but I haven’t seen anything about how to choose what to keep. Admittedly I didn’t research them in detail since I wasn’t using them (or so I thought!)

Please see this post before upgrading to 2022.4 - Database Optimizations in 2022.4

Well, I have the beta running, and there’s no difference.

On my test system, with minimal devices recording, it’s still 15 MB a day…
So this would be around 500 MB on my normal system with no extra recorder options set up.

Now, for Frigate I changed the directory with CIFS to store all data on disk from Docker/Portainer.
As for MariaDB, there is no way to store the data directly on a NAS drive from Docker, because of the stupid permission problem.

So I solved that with an older laptop running InfluxDB and MariaDB to store everything there, and I have no disk space problems anymore.

Please post in the beta thread here