I have been unable to find a reliable “How to” tutorial on how to do this. I have found a few things that refer to 4 different databases, but no indication of how a layman like myself can tell whether they are installed with recorder… And I can’t even seem to find the recorder integration anyway.
I am looking for a reliable “If you really don’t know all the ins and outs” tutorial on how to increase the data retention of Home Assistant. I assume one exists, but I guess in my old age I am not as good at finding things online as I used to be.
You only need the recorder docs.
HA has one DB (which is SQLite by default, but could be MySQL/MariaDB, Postgres, or others). In there, there are the main state tables and the long-term stats storage (LTS).
State history is controlled by the recorder’s retention period. You can include/exclude entities from the recorder to optimise storage. This is stored in the main tables. Keep in mind that HA wasn’t built for extremely long retention periods in the main tables, barring the LTS tables.
LTS data is rolled up to the hour and is stored indefinitely.
None of the above is affected by your choice of RDBMS.
For other options, consider influxDB with Grafana, especially if you want more powerful visualisations.
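To make the include/exclude point above a bit more concrete, here is a minimal sketch of a recorder block for configuration.yaml. The entity names and the glob pattern are made up for illustration, so swap in your own. It assumes you want to record everything except a few noisy things.

```yaml
# configuration.yaml (sketch only; every entity name below is hypothetical)
recorder:
  purge_keep_days: 30          # days of detailed state history to keep (default is 10)
  exclude:
    domains:
      - media_player           # skip a whole domain
    entity_globs:
      - "sensor.*_signal_strength"   # skip every entity matching a pattern
    entities:
      - sensor.noisy_power_meter     # skip one specific entity
```

Home Assistant needs a restart after the recorder config is edited.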
First off let me take the time to thank you for replying, however, I am not sure it is helpful.
You see, I already took a look at that, and that is what is confusing me.
It says it’s an internal integration, and when I go there I am not sure I understand that. When I try to add the integration, I can’t find it under “Add integration”.
What is RDBMS?
So maybe I should have been clearer… I need a tutorial that assumes all I have done is installed home assistant, and “This is how to get longer retention”
If you can see the history of any entity, you have recorder running. It is enabled by default if you haven’t messed with default_config:
If you don’t know what that is you probably haven’t.
It tells you right in those recorder docs that if you want to change the default values for recorder (and apparently you do) you need to make a change to your configuration.yaml file. In your case you’re going to want to add something like:
recorder:
  purge_keep_days: 15
The default retention time is 10 days.
The reason why WallyR pointed you to the community guide about database size is that it can get out of control really quickly if you have many entities, or entities that change state a lot, so increasing the retention value compounds that problem.
That’s where long term statistics come in. For many sensors the history is stored in long term statistics after the default retention time has passed. It should happen automatically. If LTS is available for an entity you will see the data in the history graph.
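As far as I know, long term statistics are only generated for sensors that declare a state_class. If a sensor you care about doesn’t have one, one possible approach is to wrap it in a template sensor that does. This is only a sketch: sensor.plug_energy_today is a hypothetical entity, and the unit, device_class and state_class are assumptions that have to match what your real sensor reports.

```yaml
# configuration.yaml (sketch only; sensor.plug_energy_today is hypothetical)
template:
  - sensor:
      - name: "Plug energy today (LTS example)"
        unit_of_measurement: "kWh"
        device_class: energy
        state_class: total_increasing   # assumed: a counter that resets daily
        state: "{{ states('sensor.plug_energy_today') }}"
        # Only report a value while the source sensor has a numeric state
        availability: "{{ states('sensor.plug_energy_today') | is_number }}"
```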
So the first question really is what data are you trying to keep for longer than the default 10 days and why?
Okay, I think I am beginning to understand now. I hate editing yaml files manually, I have always been a GUI guy.
So from a quick scan of the new information I have, I want to be sure I understand correctly.
I can set the keep days to as much as I want, but can’t set it for specific devices (keep the kitchen light switch data for a week, but keep the bathroom light switch for a month).
The workaround for this is a service to purge the data for specific devices (This seems like a bad implementation to me. Needing to manually list 150 things you want to purge regularly… And forgetting to add things when you add new devices), or am I misunderstanding something?
Secondly:
I am seeing these example yaml snippets on how to use “recorder.purge_entities” but no indication what to do with them (Again, I am a GUI guy…)
You’re overcomplicating things right now. Take it step by step. Answer this question and I’ll try and point you in the right direction.
But to answer the questions you have asked…
I can set the keep days to as much as I want.
Theoretically, yes. You really shouldn’t have to keep much more than the default, though, except in very specific use cases. You are bound to run into performance problems if you bloat the database, and many times more so if you are running on a Pi with an SD card. But yes, you can set purge_keep_days to whatever you want within reason.
but can’t set it for specific devices
This is correct.
The workaround for this is a service to purge the data for specific devices (This seems like a bad implementation to me.
That is one way, and you are correct, it is a bad implementation. You’re much better off setting up your data retention correctly in the first place.
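For anyone wondering what to actually do with the recorder.purge_entities snippets, one option is to put the call in an automation that runs on a schedule. A rough sketch, assuming a hypothetical naming pattern of light.kitchen_* and a 7 day cutoff; the alias and time are made up too. Using entity_globs (or domains) avoids listing every entity by hand.

```yaml
# configuration.yaml or automations.yaml (sketch only; names are hypothetical)
automation:
  - alias: "Purge kitchen light history older than 7 days (example)"
    trigger:
      - platform: time
        at: "03:30:00"             # run once a day at a quiet time
    action:
      - service: recorder.purge_entities
        data:
          keep_days: 7             # keep the last 7 days for the matching entities
          entity_globs:
            - "light.kitchen_*"    # assumed naming pattern, adjust to your entities
```

purge_keep_days still has to be set to the longest retention you want, since this only trims the matching entities down further. The same call can be tried out by hand from Developer Tools first if you prefer the GUI.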
You make a good point. I should have answered your question, and I will do the best I can without rambling on.
I have, more than once, wanted to look back at the location history of a tracked device. I would be perfectly happy if this data was never deleted so I can look back on it 2 years later.
I have a couple of energy monitoring plugs that track “Daily usage” I would like to retain 45-60 days of that data.
There have been multiple times I have wanted to look back weeks to check the status of a motion sensor. (I wasn’t home, I found something out 3 weeks later, I want to check the data and see what was detected)
As of right now I can’t think of anything else. I was just thinking of keeping everything forever, but I understand that with Home Assistant that for some reason isn’t possible… Although I don’t know databases well, I have worked with a few, and in my experience (outside of HA) a database can be gigabytes in size and nothing slows down… I am incredibly confused as to why that isn’t the case with HA, and why the databases get so big. Honestly, a light turning on and off every minute, all day, every day should only take up a few KB uncompressed. Again, even if it’s 30, 50, or even 200 GB, why would that slow anything down?
I am running on a VM on Proxmox. I think I have dedicated 8 GB of RAM, and a couple of CPU cores of a Ryzen 5600. So it’s not just a Pi and SD card.
So, if you can’t set retention of specific devices for longer than others, and you want only some data kept for a long time, but not all; and the workaround is a bad implementation, what exactly do I do? Or is this just another of those “Home Assistant doesn’t work that way” kind of things (I can’t count the number of times I have encountered one of those…)
Or is there another integration or addon that is more “Mass data storage” friendly to keep track of “everything all the time”?
With some sensors HA can easily create gigabytes of data, and it is true that if you have the hardware resources then databases can be quite big with no issues, but then we are not talking about a Raspberry Pi, and even the tiny PCs will be somewhat limited.
It is not uncommon to see a gigabyte of data per day with an unmanaged short term database.
Check out the influxdb add-on, a database explicitly designed for time-series data.
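As a rough idea of what that looks like on the Home Assistant side, the influxdb integration is set up in configuration.yaml and told which entities to forward. This is a sketch under assumptions: the hostname, database name and secret names are placeholders, the database is assumed to already exist in InfluxDB, and the motion sensor glob is a guess at a naming pattern. Check the integration docs for the options that apply to your InfluxDB version.

```yaml
# configuration.yaml (sketch only; host, database and secret names are placeholders)
influxdb:
  host: influxdb.example.local     # hypothetical hostname of the InfluxDB server or add-on
  port: 8086
  database: homeassistant          # assumed to already exist in InfluxDB
  username: !secret influxdb_user
  password: !secret influxdb_password
  include:
    domains:
      - device_tracker             # e.g. location history kept long term
    entity_globs:
      - "binary_sensor.*motion*"   # assumed naming pattern for motion sensors
```

That way the recorder can stay at a short retention while only the data you want to keep for years goes to InfluxDB.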
Home Assistant wasn’t designed to be an efficient data archival system, it was designed to be an efficient home automation platform, i.e. to do things right now.
Users who want what you seem to be looking for (save everything forever) are an extremely small subset of the user base. How many people really care if their front porch light was turned on for 18 minutes at 5:51 am on March 5th, 2019?
There are things that make sense to record long term to track trends (ie weather, climate data) and that’s where long term statistics play a role by storing the data in a more efficient manner. I have years of long term statistical weather data in my database but only 15 days of actual stored data.
So it’s not that HA can’t do it, you just need to do a little work setting it up. If permanent retention of all your data is what you’re looking for, then Francis is pointing you in the right direction. Sounds like your hardware won’t be an issue.
FWIW, I actually started going down a similar route at one point myself but I quickly realized the extra work and maintenance wasn’t worth the benefit of having the data I was trying to retain. Mind you that was before long term statistics were a thing in HA, that was a game changer.
Anyway, you’ve got some good suggestions here to achieve what you’re looking to do.
My HA is running on an Odroid M1 with 8GB Ram and a 500GB NVMe drive installed. My recorder config is as follows:
recorder:
  purge_keep_days: 180
  exclude:
    domains:
      - device_tracker
      - media_player
      - uptime
      - time_date
      - worldclock
HA is using the default SQLite database, which has now grown to 5GB.
I don’t see any performance problems and no space problems.
It really depends on your devices.
For example, if a sensor reports power draw in watts with an accuracy of only one decimal, it will probably soon have written all the possible states to the database, and further entries will just be links to those existing states.
Have the sensor report with 3 decimals and your number of distinct states will increase immensely.
Having a sensor that reports often and with states that are never the same will make it go nuts.