Keep all raw data

Just

recorder:
  auto_purge: false
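For reference, disabling auto-purge is the blunt option; the recorder integration also documents a `purge_keep_days` setting, so the nightly cleanup can be kept while retention is lengthened. A minimal sketch (365 is just an example value):

```yaml
recorder:
  # Keep the automatic nightly purge, but retain a full year of history.
  purge_keep_days: 365
  # Or switch purging off entirely (the database will then grow unbounded):
  # auto_purge: false
```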

Good Point! :smiley:

You have posted 4 times on this forum and each time with the same message. Boring.


Not a good point. :smiley:

Well your request has been answered by @koying so you got what you asked for. Thanks for your contributions.


So, after a year of using HA, I first noticed that all of my data older than 10 days had been purged by default. That came unexpectedly, like: “Oh, my phone deletes contacts I haven’t called in 76 days to help me save memory. Yes, of course you can deactivate that ‘feature’ — why would you?” Then I googled the (likely outdated) misinformation that the retention time can be increased but not set to “never” for performance reasons. That seems to have changed since. So, misguided, I read through LTS, and STS, and InfluxDB, and accessing those data sources from the places where I wanted to work with the data, as a workaround for what should be the absolute standard.

If you are a developer you got used to it and maybe don’t think about it anymore, but from a new user’s perspective I think nobody expects that data is purged after some time by default. I mean, if my system had software that deleted my files older than 10 days without asking, I would consider that a virus, no? If performance is the issue, work on that instead of trying to cure the symptoms, IMO. It should be weighed which is more harmful to new users: running out of disk space and sluggish plot performance, or unexpected and unannounced data loss? The first case is, IMO, what everybody has learned to expect.

A new HA user doesn’t care about historical data. Home Assistant is about home automation, not about keeping mostly irrelevant data forever.

Anyway, that “issue” has been addressed by implementing aggregated “long-term statistics”. Even InfluxDB implements retention policies and aggregation strategies to avoid using petabytes of disk space for no practical purpose.

You want to keep detailed data forever? Fair enough, you have it.
Don’t expect that people want their fragile RPi SD card filled up with nonsensical data by default…


Wow. Other opinions?

Who cares that the toilet light at 02:33:45 on 03 August 2001 was on for 23 minutes?
Would you really be interested in that?
:thinking:


Every system that saves data needs a retention policy by default, to clean up after itself. Left unchecked, the system storage will fill up, causing degradation in performance, and inevitably, well, no-worky. So an opt-in (to disable purge) is a much better default setting, imo.

Myself, I’m in the camp of not needing most data that is older than a few weeks, so the auto purge works perfectly. The 20 or so more important measurements go to InfluxDB.
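Routing a handful of important measurements to InfluxDB while letting the recorder purge the rest can be sketched with the `influxdb` integration’s include filter (the host and entity IDs below are placeholders):

```yaml
influxdb:
  host: 192.168.1.10        # placeholder: address of the InfluxDB server
  database: homeassistant
  include:
    entities:
      # placeholder entity IDs for the measurements worth keeping long-term
      - sensor.outside_temperature
      - sensor.power_consumption
```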


You’ll usually find me advocating for easier ways to exclude or delete data here, so perhaps mine isn’t the opinion you want.

That said, we may be on the same page about one thing: the need to separate short-term “live” data from long-term “archival” data. These data shouldn’t be in the same database, IMHO.

I agree with those who pointed out that the vast majority of HA data are irrelevant after a day. Others you may want to keep for a week, a month, a season or a year. Hence my earlier FR about setting retention times by entity, instead of HA’s one-size-fits-all approach.
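Per-entity retention is indeed not built in, but a partial workaround is an automation that calls the `recorder.purge_entities` service (available in recent HA releases) on a schedule, so noisy entities are trimmed sooner than the global retention window. A sketch, with placeholder entity IDs:

```yaml
automation:
  - alias: "Purge noisy entities weekly"
    trigger:
      - platform: time
        at: "03:00:00"
    condition:
      - condition: time
        weekday:
          - sun
    action:
      - service: recorder.purge_entities
        target:
          entity_id:
            # placeholder entity IDs
            - sensor.noisy_power_meter
            - binary_sensor.toilet_light_motion
```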

I’ve found very few things I’d like to keep longer than a few days. And most of those are covered by HA’s long-term statistics. For those few I want to keep forever, I have some automations which post them to comma-delimited text files. I occasionally copy these off the HA drive and onto my NAS. From there I can use other tools to do any kind of analysis I need.
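That kind of comma-delimited export can be sketched with the file notification platform plus a time-pattern automation (the entity ID and file path are placeholders):

```yaml
notify:
  - platform: file
    name: temp_csv
    filename: /config/outside_temp.csv   # placeholder path

automation:
  - alias: "Archive outside temperature to CSV hourly"
    trigger:
      - platform: time_pattern
        minutes: 0
    action:
      - service: notify.temp_csv
        data:
          # one comma-delimited row: ISO timestamp, current reading
          message: "{{ now().isoformat() }},{{ states('sensor.outside_temperature') }}"
```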

In the first place, I was offended by my data getting deleted without asking me. I think that’s understandable.

I think automation and data go hand in hand; the data aspect is fundamental. A retention policy is necessary as well, and keeping all channels at raw resolution is of course not optimal either. But deleting data by default, without announcement, is at least unexpected and very uncommon. Data should not be considered “mostly irrelevant” from the get-go, decided by the programmer in code. Yes, I would consider the toilet-light example irrelevant, but I would feel uncomfortable writing code that deletes users’ data based on my assumptions. Long-term statistics are not the default (thus, delete is the default). Enforced downsampling is deleting as well.

Maybe the topic just needs better visibility so that you don’t fall into that trap. The focus is on automation and nice displays, but this is tightly connected to data management. Maybe it is even such an important aspect that some data-management interface would be desirable: retention policy, downsample rate (if any), the ability to modify data, the ability to display data in different retention states simultaneously, and the ability to do all of that per channel. Decide yourself what needs deleting, and be able to see and edit that.

I’ve started using this for industrial plant logging — nothing critical has exploded yet, just reading :). Maybe I expected too much. The broadness of the communication protocols you can interface with was the main interesting aspect for me. Nice displays are cool, but the way data is handled breaks with that, IMO. At least it’s an unexpected hassle.

Well, let’s see how it develops. Thanks for your time.

I bet there are more people offended by the fact that their data was recorded at all :thinking:


I just deleted my DB because I couldn’t think of a single thing to use my 3-year-old data for, and the 700 MB of wasted space was bothering me… So I deleted everything Home Assistant except the .yaml files…

And I’m feeling better already…

Yes, saving the long-term garbage does bother me.

I agree on that one as well.
Having both important (long term) and disposable (detailed) data in the same DB is a (minor) problem.


I guess another aspect is whether the devs intend to make this into something professionally/small-scale-industrially usable, or whether they want it to stay in the consumer, guy-fiddles-around-with-stuff market. The TrueNAS guys, for example, tried to bring their idea to a more professional level, apparently with success. Isn’t the Nabu Casa company, hosting cloud services, also part of this team and already trying to monetize/professionalize the idea?

I mean, I see real potential. The number of protocol interfaces is great. Check the price of MQTT components for industrial automation, then compare that to industrial automation software: that stuff is dry as hell and was invented in the ’90s. It’s not fun to play with at all, though rugged.

I don’t think the goal for the HA dev team is the large-scale industrial market. For one thing, that stuff has to be bulletproof reliable. For another, HA - and the whole home automation space - is moving too quickly to support the stable, long-term, low-support requirements of most industrial settings. I’m doing this as a hobbyist. I’d never pay someone to tweak and tinker at the level I’ve had to.

On the flip side, I could see HA in a “cottage industry” setting like a micro-brewery.

Anyway, there are existing options for saving select long-term HA data, and in a much better format than what the HA Recorder database offers.


I guess you’re intentionally ignoring the fact you’re using home automation software for industrial purposes.

Despite this, you’ve been given solutions to your issue, including using other (more industrially suited) databases.

Yet, you keep harping on about how this is “offensive” and comparing the software you didn’t bother researching properly before installing to a “virus”.

Meanwhile, there are currently 0 votes for this request, which means you didn’t even bother to vote for this yourself.


Another documentation non-reader.
