What is HA about? Collecting data and acting on it. Everything done in HA is based on data. Data is fundamental.
However, the “recorder” currently does not store data indefinitely: not by default, and with no option to do so at all. Raw data should be stored unaltered, indefinitely.
There is a second “database”, or way of storing data, in the form of “short- and long-term statistics”. It is not enabled by default for every sensor, cannot be enabled for some types of data (e.g. binary sensors), and stores a downsampled version of the data: not the original values, but several (arbitrary) features derived from them.
Raw data should be stored indefinitely (HERESY! The Mechanicum deletes nothing! Especially and foremost not the source of all things, the raw data). Raw data should all be stored in the same format or database. The type, unit, or any other arbitrary metadata attached to the data should not decide whether the data can be stored. It is additional information and must leave the raw data unchanged. Instead of storing feature values calculated from the data (min, mean, max), the raw data and the method used to extract the features should be stored (if conserving memory is the goal). You do the same thing for code. Code is not data and data is not code. Handle those separately. If calculating a feature is computationally too intensive for the use case, storing feature values should be considered (a tradeoff: higher memory load, lower CPU load). The data should be stored raw (not altering temporal or value resolution by downsampling, averaging, or banana-shaped sliding cross-correlation windows - wth?). R A W is the holy grail of data. The center. The way HA currently handles this is unprofessional.
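To make the “store the raw data plus the extraction method” idea concrete, here is a minimal sketch in Python. It is not how the HA recorder or its statistics work internally; `Sample` and `extract_features` are hypothetical names used only for illustration, and the readings are made up. The point is just that min, mean, and max can be recomputed on demand for any time window as long as the raw samples are kept.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """One raw reading, stored unaltered (hypothetical structure)."""
    timestamp: float  # seconds since some epoch
    value: float


def extract_features(samples, start, end):
    """Compute min/mean/max over samples with start <= timestamp < end.

    Returns None when the window contains no samples.
    """
    window = [s.value for s in samples if start <= s.timestamp < end]
    if not window:
        return None
    return {"min": min(window), "mean": mean(window), "max": max(window)}


# Made-up raw readings: three inside the first hour, one after it.
raw = [
    Sample(0.0, 20.0),
    Sample(60.0, 22.0),
    Sample(120.0, 21.0),
    Sample(3600.0, 30.0),
]

# Features for the first hour, derived from raw data rather than stored.
features = extract_features(raw, 0.0, 3600.0)
```

Because the raw samples stay untouched, the window size or the set of features can be changed later without losing information. Caching the result is the memory-for-CPU tradeoff mentioned above.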
So, after a year of using HA, I first noticed that all of my data older than 10 days had been purged by default. That came as unexpectedly as “OH, my phone deletes contacts I haven’t called in 76 days to help me save memory. Yes, ofc you can deactivate that “feature” - why didn’t you?” Then I googled my way to the (likely outdated) misinformation that the retention time can be increased but not set to “never” for performance reasons. That seems to have changed. So, misguided, I read up on LTS, STS, and InfluxDB, and on accessing those data sources from the places where I wanted to work with the data, all as a workaround for what should be the absolute standard. If you are a developer you have gotten used to it and maybe don’t think about it anymore, but from a new user’s perspective I think nobody expects data to be purged after some time by default. I mean, if my system had software that deleted my files older than 10 days without asking, I would consider that a virus, no? If performance is the issue, work on that instead of trying to cure the symptoms, imo. I think it should be considered which is more harmful to new users: running out of disk space and sluggish plots, or unexpected and unannounced data loss. The first case is, imo, what everybody has learned to expect.
A new HA user doesn’t care about historical data. Home Assistant is about home automation, not about keeping mostly irrelevant data forever.
Anyway, that “issue” has been addressed by implementing aggregated “long-term statistics”. Even InfluxDB implements retention policies and aggregation strategies to avoid using petabytes of disk space for no practical purpose.
You want to keep detailed data forever? Fair enough, you have it.
Don’t expect that people want their fragile RPi SD card filled up with nonsensical data by default…
Every system that saves data needs a retention policy by default, to clean up after itself. Left unchecked, the system storage will fill up, causing degradation in performance, and inevitably, well, no-worky. So an opt-in (to disable purge) is a much better default setting, imo.
Myself, I’m in the camp of not needing most data that is older than a few weeks, so the auto purge works perfectly. About 20 other, more important measurements go to InfluxDB.
You’ll usually find me advocating for easier ways to exclude or delete data here, so perhaps mine isn’t the opinion you want.
That said, we may be on the same page about one thing: the need to separate short-term “live” data from long-term “archival” data. These data shouldn’t be in the same database, IMHO.
I agree with those who pointed out that the vast majority of HA data are irrelevant after a day. Others you may want to keep for a week, a month, a season or a year. Hence my earlier FR about setting retention times by entity, instead of HA’s one-size-fits-all approach.
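As a rough illustration of what per-entity retention could look like, here is a minimal Python sketch. It is not an existing HA feature or API: the entity IDs, the `RETENTION_BY_ENTITY` table, and the `purge` function are all hypothetical. The 10-day fallback mirrors the default mentioned earlier in the thread.

```python
from datetime import datetime, timedelta

# Fallback retention for entities without their own setting (assumption).
DEFAULT_RETENTION = timedelta(days=10)

# Hypothetical per-entity overrides; None means "never purge".
RETENTION_BY_ENTITY = {
    "light.toilet": timedelta(days=1),
    "sensor.outdoor_temperature": timedelta(days=365),
    "sensor.energy_meter": None,
}


def purge(rows, now):
    """Return only the rows still inside their entity's retention window.

    Each row is a tuple of (entity_id, recorded_at, value).
    """
    kept = []
    for entity_id, recorded_at, value in rows:
        retention = RETENTION_BY_ENTITY.get(entity_id, DEFAULT_RETENTION)
        if retention is None or now - recorded_at <= retention:
            kept.append((entity_id, recorded_at, value))
    return kept
```

With a table like this, the one-size-fits-all setting becomes just the fallback, and “keep forever” is a per-entity choice rather than a global one.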
I’ve found very few things I’d like to keep longer than a few days. And most of those are covered by HA’s long-term statistics. For those few I want to keep forever, I have some automations which post them to comma-delimited text files. I occasionally copy these off the HA drive and onto my NAS. From there I can use other tools to do any kind of analysis I need.
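The comma-delimited archive approach can be sketched in a few lines of Python. This is only an illustration of the idea, not the automation described above: `append_reading`, the column names, and the file path in the usage note are assumptions.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def append_reading(csv_path, entity_id, value, when=None):
    """Append one reading to a CSV archive, writing a header for a new file."""
    when = when or datetime.now(timezone.utc)
    path = Path(csv_path)
    is_new = not path.exists()
    # newline="" lets the csv module control line endings itself.
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "entity_id", "value"])
        writer.writerow([when.isoformat(), entity_id, value])
```

A call such as `append_reading("archive.csv", "sensor.energy_meter", 42.0)` adds one row per reading. The resulting file can be copied to a NAS and opened with any spreadsheet or analysis tool.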
In the first place, I was offended by my data getting deleted without anyone asking me. I think that’s understandable.
I think automation and data go hand in hand. The data aspect is fundamental. A retention policy is necessary as well. Keeping all channels raw is ofc not optimal either. Deleting data by default without announcement is at least unexpected and very uncommon. Data should not be deemed “mostly irrelevant” from the get-go, in code, by the programmer. Yes, I would consider the toilet light example irrelevant, but I would feel uncomfortable writing code that deletes users’ data based on my assumptions. Long-term statistics are not the default (thus delete by default). Enforced downsampling is deleting as well.
Maybe the topic just needs better visibility so that you don’t fall into that trap. The focus is on automation and nice displays, but this is tightly connected to data management. Maybe it’s even such an important aspect that some data management interface would be desirable: retention policy, downsampling rate (if any), the ability to modify data, the ability to display data in different retention states simultaneously, and the ability to do all that per channel. Decide for yourself what needs deleting, and be able to edit and see that.
I’ve started using this for industrial plant logging - nothing critical has exploded yet, just reading :). Maybe I expected too much. The breadth of communication protocols you can interface with was the main attraction for me. Nice displays are cool, but it breaks down at the way data is handled, imo. At the least it’s an unexpected hassle.
Well, let’s see how it develops. Thanks for your time.
I just deleted my DB because I couldn’t think of a single thing to use my 3-year-old data for, and the 700 MB of wasted space was bothering me… So I deleted everything Home Assistant except the .yaml files…
I guess another aspect is whether the devs intend to make this into something usable professionally or in small-scale industry, or whether they want it to stay in the consumer, guy-fiddles-around-with-stuff market. The TrueNAS guys, e.g., tried to take their idea to a more professional level, apparently with success. Isn’t the Nabu Casa company that hosts the cloud services also part of this team, and already trying to monetize/professionalize the idea?
I mean, I see real potential. The number of protocol interfaces is great. Check the price of MQTT components for industrial automation. Compare that to industrial automation software: it’s dry as hell and was invented in the 90s. It’s not fun to play with at all, though rugged.
I don’t think the goal for the HA dev team is the large-scale industrial market. For one thing, that stuff has to be bulletproof reliable. For another, HA - and the whole home automation space - is moving too quickly to support the stable, long-term, low-support requirements of most industrial settings. I’m doing this as a hobbyist. I’d never pay someone to tweak and tinker at the level I’ve had to.
On the flip side, I could see HA in a “cottage industry” setting like a micro-brewery.
Anyway, there are existing options for saving select long-term HA data, and in a much better format than what the HA Recorder database offers.
I guess you’re intentionally ignoring the fact you’re using home automation software for industrial purposes.
Despite this, you’ve been given solutions to your issue, including using other (more industrially suited) databases.
Yet, you keep harping on about how this is “offensive” and comparing the software you didn’t bother researching properly before installing to a “virus”.
Meanwhile, there are currently 0 votes for this request, which means you didn’t even bother to vote for this yourself.