I started using Home Assistant about 5 years ago and have added and removed a bunch of sensors since. I have set up the recorder to purge every 30 days, but somehow my database is still almost 2 GB in size.
How can I see which sensor is responsible for all this data without running queries on the database itself?
Only 2 GB? I am not a power user, but with a 30-day purge I’m running around 5.5 GB on average (a bit higher today).
I’d have to look in Influx, but I think this is about 3-4 months of data (from when I switched to Docker and started from scratch). And it’s currently storing everything; I haven’t gotten around to excluding anything from Influx.
But I do think it would be nice to be able to see which entities are taking up the most space. Voted.
Having the info is great, but being able to take action by turning logging for sensors on and off in the UI would be amazing, done the same way we can change device exposure to HomeKit and Google.
I would like to see what is taking up the space inside the DB and then be able to selectively set a different purge policy for crap I don’t really need.
This is a common problem. There have been some very promising improvements to the Recorder database logic recently, but there are still some things you should do to streamline your database.
This FR and this WTH talk about the ability to selectively set retention periods by entity, rather than one-size-fits-all as in your 30-day example. Clearly there’s a need to retain some data much longer than other data. Please vote for those if you agree.
There have been some complaints, FRs and WTHs written about a recent change to the History page, which no longer shows “all” entities. Before the change, that was a great way to see which entities are recording a lot of state changes in the database; excluding those would give you the most “bang for the buck” in reducing database writes and size. Hopefully this oversight is corrected and you can use the History page this way again soon. Until then, you can go through your list of entities and surmise which ones might be helpful to retain and which can be excluded.
Once you’ve got your excludes or includes in config, and your purge_keep_days set where you want it, you can use the recorder.purge service to clean up and reduce the size of (repack) the database file itself. Note: to select the repack option you need to tick the check box and turn on the slider button, or the repack won’t happen.
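For reference, a minimal sketch of what that looks like (the 30 days is a placeholder, use whatever retention suits you):

```yaml
# configuration.yaml
recorder:
  purge_keep_days: 30
```

And the purge call itself, from Developer Tools → Services or as an automation action:

```yaml
service: recorder.purge
data:
  keep_days: 30
  repack: true  # without this the database file is not shrunk on disk
```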
More flexibility in retention time would definitely help.
However, there are a number of things that can impact DB growth rate that users can’t easily get a handle on without resorting to some SQL.
It would be useful to be able to easily see which entities have contributed the largest number of events/states in the DB.
It would also be helpful if the purge service generated some information about how much data was purged and roughly what it belonged to.
On a related note, it would be helpful to be able to use the statistics data that is already being stored in the database instead of having to create additional helper entities to display/automate based on something other than the current value. WTH can't I access the stored statistics data in my automations and templates?
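In the meantime, one workaround I’ve seen is the SQL sensor, which can surface stored statistics as a regular entity. A sketch, assuming the default SQLite recorder database and the current statistics table layout; sensor.outdoor_temperature is a placeholder:

```yaml
sensor:
  - platform: sql
    queries:
      - name: Outdoor temperature hourly mean
        # Pull the most recent hourly mean from the long-term statistics table
        query: >
          SELECT ROUND(mean, 2) AS mean_value
          FROM statistics
          WHERE metadata_id = (
            SELECT id FROM statistics_meta
            WHERE statistic_id = 'sensor.outdoor_temperature'
          )
          ORDER BY id DESC
          LIMIT 1;
        column: mean_value
        unit_of_measurement: "°C"
```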
All good points. There’s so much that could be done to improve the Recorder and how the data is presented and managed from the UI. Fortunately I’m pretty familiar with SQL, but it’s not exactly user friendly for anyone who isn’t. Keep voting for all the database-related FRs and WTHs, maybe they’ll get some love from developers.
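For anyone who wants to peek at the database anyway, a query along these lines lists the chattiest entities (this assumes the schema where entity_id is still a column on the states table; newer schemas move it to a separate lookup table):

```sql
-- Top 20 entities by number of stored state rows
SELECT entity_id, COUNT(*) AS row_count
FROM states
GROUP BY entity_id
ORDER BY row_count DESC
LIMIT 20;
```

The entities at the top of that list are the first candidates for a recorder exclude.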
I would suggest reviewing your History graphs for any noisy analogue entities that are not required, then excluding these under Recorder (a sketch follows below).
Generally, history databases only record changes, so any noisy (erratically changing) analogue entities will consume most of the database space. Slowly changing analogue entities will not consume much space, and binary entities even less.
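As a sketch of that exclude, with placeholder entity names:

```yaml
# configuration.yaml
recorder:
  exclude:
    entities:
      - sensor.living_room_power      # updates every few seconds
    entity_globs:
      - sensor.*_signal_strength      # noisy RSSI readings
```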
I agree it would be helpful to know what exactly is using the most space.
Paulus (balloob) also suggested at one point that in the future the database might omit state changes for sensors eligible for long-term statistics. These tend to be the noisy analogue sensors. And statistics-eligible sensors not only generate hourly long-term statistics that don’t get purged, they also generate short-term statistics every 5 minutes, which get purged at the same interval as everything else. These short-term statistics could power the history graphs and similar.
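To get a feel for the relative volume of the two, a quick sketch against the recorder database (statistics and statistics_short_term are the actual table names; the counts obviously depend on your setup):

```sql
-- Long-term (hourly, never purged) vs short-term (5-minute, purged) rows
SELECT 'statistics' AS tbl, COUNT(*) AS row_count FROM statistics
UNION ALL
SELECT 'statistics_short_term', COUNT(*) FROM statistics_short_term;
```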
The one challenge, though, is context ids. Currently context ids, which are used to track what triggered a change (e.g. what triggered an automation, or what caused a service call), are stored on the state change event row (or the event row for non-state-change events). Without those rows we could end up with rows that specify a parent context that has no record in the DB. While not technically a problem, it could be confusing.
(This same loss-of-context issue seems to prevent another, smaller database optimization: for entities with recorder-excluded attributes like sun.sun, we still record a new state row every time the attributes change, even if the main state value does not change, so the resulting database row has basically zero value. Ideally we would only write new rows when the state value or a non-excluded attribute changes.)
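You can see the effect directly; a sketch, again assuming the schema where entity_id lives on the states table:

```sql
-- sun.sun typically has thousands of stored rows but only two distinct
-- state values (above_horizon / below_horizon); nearly every row exists
-- only because an excluded attribute changed
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT state) AS distinct_states
FROM states
WHERE entity_id = 'sun.sun';
```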