Are HA backups useless? 🚮 (SQLite DB always corrupted)

Open source development isn’t interchangeable — people work on what they’re interested in. I get that you’re joking, but… if people want this better, the best way to do it is to start actually making it better.

That’s easier said than done. The recorder is a very central part of HA and has inter-dependencies with several other parts. Changing it would be a very significant amount of work and requires in-depth knowledge of HA internals. Add to this the very ‘closed-inner-circle’ BDFL approach of the core developers: as an outsider it is extremely difficult to get anything approved into core that changes a fundamental part of HA, regardless of how good it is technically. There are so many hoops they make you jump through, and there’s no guarantee they won’t just throw it out at the end, after you’ve wasted all your time.

Realistically, something like a completely new recorder component would be impossible to get approved into core unless one of the main devs does it.


If any of that were true, we wouldn’t have:

  • Auth
  • Automation Choose
  • Automation Modes (Parallel, Restart, etc)
  • Revamped Template Engine
  • Scenes apply/recall
  • Lovelace

I could go on. Most of Home Assistant is built by volunteers. Many of the major features were built by people outside the ten core developers, and outside the additional group of roughly fifty developers who contribute a significant amount.

Being a Member on GitHub for Home Assistant grants you additional abilities on GitHub; that’s it. This whole inner-circle stuff was drummed up by conspiracy theorists who were upset that their PRs were closed when they didn’t follow the contribution rules. And as the examples above show, you don’t even need to be a member to make a significant impact on Home Assistant.


Pretty much sums it up in a nutshell.


Are you using the “official” add-on, or an “external” MariaDB?

Personally, I recommend an external one on a NAS.


Thanks for your answer. “External” was/is my intention eventually, but as my HA install is quite new (about 4 months), I have spent the time learning, so I have no immediate need for a switch-over. I first made the change to InfluxDB to get Grafana; after that I realized I had to “fiddle a little” with both retention/purge and the recorder (one of the reasons I’d like to go for MariaDB, as there are actually only a few data sets I find reason to keep for longer periods). The “Energy Dashboard” seems to “solve” one of them :slight_smile: , and with two mSATA drives on the same board as HA, I doubt I will notice a difference in performance.

You might notice slight performance boosts depending on the speed of your NAS. Originally, for me it was a huge boost because I was running on a Pi 3 and moved the database to a NAS.



I don’t agree, because the “recorder component” does what it’s built for: record! In my opinion it should record everything, and you have the options to choose whatever you want. Don’t blame the recorder in this topic; it does what you want, you just have to configure it according to your preferences. Which of course leads to your other “concern”, History: true, with 1–3 months of storage of ALL records it will be slow, especially if you want to view a period of 2 weeks or a month. You can easily do that for one entity (in the History view).
Currently, if you want to see e.g. 1 month for only 6 entities, there is no option “yet” for that, so that would be a feature request… or you could “fix” that yourself with a markup/HTML card.
(And tell me which DB architecture wouldn’t be slow if you want to pull all records for 1–3 months into a view in one “click”? Though that would be the browser’s concern.)

But as you seem “comfortable” with SQLite / databases in general / JSON etc., I believe you are also “comfortable” with JS. Then, instead of configuring your own card (as above), you could offer your expertise to fix this feature request. (There are several feature requests regarding the history view; I’m sure you could even “combine” some of them and thereby please a lot of people.)

EDIT2: As it sounds like you use the “official” Home Assistant OS in VirtualBox/a VM, have you tried copying your DB file while HA is shut down? In Windows (pretty often, I know) Microsoft wants you to reboot the system, and it’s sometimes required in any Linux system too; you can’t bypass that, and every time you apply major updates it seems you also have to restart HA anyway. So: shut down HA, copy your DB, start it up again. I’m sure the copy won’t get “corrupted” that way, since no threads or transactions are running while the system is down. And if you don’t have to or want to restart HA between major updates (or anything else that requires a restart), just wait until you find it appropriate or convenient, and follow the same simple copy-the-DB-file routine.
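For what it’s worth, SQLite also has an online backup API that produces a transactionally consistent copy even without a full shutdown. A minimal sketch using Python’s `sqlite3` module; the database filename below is an assumption, so adjust it to wherever your install keeps its DB file:

```python
import sqlite3

# Assumed path to the HA SQLite database -- change for your install.
SRC = "home-assistant_v2.db"
DST = "home-assistant_v2.backup.db"

def snapshot(src: str, dst: str) -> None:
    """Copy a SQLite database page-by-page via the backup API.

    Unlike a raw file copy of a live DB, this yields a consistent
    snapshot even if another process is writing at the time.
    """
    with sqlite3.connect(src) as source, sqlite3.connect(dst) as target:
        source.backup(target)

if __name__ == "__main__":
    snapshot(SRC, DST)
```

Copying the raw file while HA is stopped (as suggested above) remains the simplest option; this is just the safer route when a shutdown is inconvenient.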

I don’t want to put a burden on you; it’s just a tip to make life smoother and more comfortable.

I use the official one.


Absolutely and fully understood. Honestly, I’d love to contribute a “pull request” with a new database structure and recorder, but I really know nothing of Python and sadly just don’t have the time to contribute in this way, since it would involve learning another programming language.

Someone with this knowledge will have to do this in the future though because it is obvious from the constant database/recorder problems people experience that the current setup is completely unsustainable.

The issue is the way the recorder component uses the database. It is just about the most inefficient, space-hungry and slowest way possible to save information to a database. Native data types are not used; large amounts of data are stored as strings in TEXT fields. There is simply no way to fix this other than a redesign.
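The text-versus-native distinction can be shown with a toy SQLite example. This is not HA’s actual schema; the tables, columns, and sample value are invented for illustration:

```python
import sqlite3

# Toy comparison: the same temperature stored as a JSON-ish text blob
# versus a native REAL column. typeof() reports the storage class
# SQLite actually used for each cell.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE as_text (state TEXT)")
con.execute("CREATE TABLE as_real (state REAL)")
con.execute("INSERT INTO as_text VALUES (?)", ('{"temperature": "21.5"}',))
con.execute("INSERT INTO as_real VALUES (?)", (21.5,))

text_len = len(con.execute("SELECT state FROM as_text").fetchone()[0])
print(con.execute("SELECT typeof(state) FROM as_text").fetchone()[0])  # text
print(con.execute("SELECT typeof(state) FROM as_real").fetchone()[0])  # real
print(text_len)  # 23 characters of text versus 8 bytes for a REAL
```

The gap only widens once you add the repeated entity IDs and attribute dumps discussed later in the thread.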

If one limits the amount of data stored, either by filtering with the recorder component or simply setting purge_keep_days to a week or two, then it is somewhat manageable, but that severely limits functionality. And this is an option which must be explicitly set. Because if someone installs HA and then adds a sensor or two which post data every few seconds, in a few months HA is no longer running because the database has grown to 30GB and filled all available drive space. When really, this data should be using 30MB.
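The growth claim can be sanity-checked with back-of-envelope arithmetic. A minimal sketch; every number here (sensor count, interval, bytes per row) is an assumption chosen for illustration, not a measurement of any real install:

```python
SECONDS_PER_DAY = 86_400

def db_growth_mb(sensors: int, interval_s: float,
                 bytes_per_row: int, days: int) -> float:
    """Rough size of the states table after `days` of recording."""
    rows = sensors * (SECONDS_PER_DAY / interval_s) * days
    return rows * bytes_per_row / 1_000_000

# Two sensors updating every 2 seconds, at an assumed ~300 bytes per
# text-heavy state row:
print(round(db_growth_mb(2, 2, 300, 90)))  # ~2333 MB after three months
# The same data at an assumed ~24 bytes per row with native types:
print(round(db_growth_mb(2, 2, 24, 90)))   # ~187 MB
```

Add a handful more chatty entities, plus the events table and indexes, and multi-gigabyte databases within months are plausible.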

I run HA Core in a venv on Ubuntu Server with MySQL as the recorder DB. Honestly, I don’t even bother trying to keep the database between (infrequent, maybe a few times a year) updates. I just drop the database and let HA recreate it.

OK, I understand your point. SQLite was new to me when starting with HA, though I had working experience with Oracle, and MySQL privately, so yes, I was a bit “puzzled” by the “limitations” of this HA component. But after a few days I saw the alternatives (MariaDB, or whatever) and quickly changed to InfluxDB (though that was for Grafana). And yes, the limitations do cause irritation. Still, from what’s stated in the documentation, I got the impression that keeping the default purge settings, with a limited/minimal set of installations/integrations, would be sufficient as is for the average user. Raising the default disk size to 50GB on existing install media (images), and drumming into people to READ THE DOCUMENTATION and the update notes, would (or could) ease the immediate pain for a lot of people. But eventually the same people will end up at the “point of no return” anyway: disk full. It has been a “problem” since before disk storage was “invented”…
… It’s kind of like many daily-life situations that people have to learn to deal with. You fill up your closet, your fridge (it does have its capacity limit); you can buy a new (bigger) one, or you can start consuming, but then you will have to fill it again no matter what (and in good time, so you don’t run out of milk or beers), and if you buy too many groceries, they won’t fit in your fridge.
… People have learned to live with such routines over thousands of years; now we just have to learn to live with the fact that phones have a storage limit, computers the same, every modern and old storage likewise. And yet, even after 30 years of this “new” add-on in people’s lives, people still think the milkman comes daily and places full bottles on the doorstep, or that someone throws out the “outdated” milk and food.
And the “solutions” in HA have been mentioned; everyone knows them (or they can only blame themselves). Many solve it in various ways, like you mention yourself. I dropped the DB once or twice before switching to InfluxDB, fiddled with the settings in Influx and configuration.yaml to keep the DB size fairly stable, and recently increased the default disk size to 60GB (I know some seem to have problems with this, but with VMware and the standard HA image’s ability to automatically grow into the newly available size, no problems). But I will eventually install MariaDB, as I’m quite annoyed by the poor overview and options I get in SQLite, like setting purge on individual tables, or easily adding external sources (in new tables) to use in templates/HTML cards.

Also, I would like to see the “calculation” you base your statement upon, and why “large amounts of data stored as strings in TEXT fields” should be the cause.

I don’t know how many entries you have in your DB file, but a quick look in sql-web at “events” and “states” would fairly well summarize the number of columns needed for each row; the number of entries of course also depends on the number of devices/entities/add-ons etc.

But I am a bit puzzled about how you got those numbers (30GB should be 30MB?). It simply doesn’t seem to be concrete, reliable information you are trying to spread here.


It’s not really a calculation, but born from experience.

I created an energy sensor that takes readings of about 12 items every 2 seconds.

Published it to HA via MQTT.

Within 30 days, my database had grown beyond 10GB and HA was suffering. History was unusable, cameras didn’t show up due to all the DB thrashing happening in the background, and the energy data I was storing was pointless because I couldn’t do anything with it.

I rewrote the (Arduino) code on the sensor to instead post the values directly to a MySQL DB using native data types. Exactly the same data, after about 30 days, took about 200MB to store. I used SQL sensors to bring the data into HA, with queries that compute a 10-second average in SQL.
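The “10 second average in SQL” idea can be sketched with a bucketed GROUP BY. The table name, columns, and sample data below are invented for illustration (the post doesn’t give its schema), and plain SQLite stands in for MySQL:

```python
import sqlite3

# Invented readings table: a power value every 2 seconds, as the kind
# of native-typed data the sensor posts directly to the database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (ts INTEGER, watts REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?)",
                [(t, 100.0 + t % 4) for t in range(0, 20, 2)])

# Average into 10-second buckets (ts is unix seconds; integer division
# by 10 groups each run of ten seconds together).
rows = con.execute(
    "SELECT ts / 10 AS bucket, AVG(watts) FROM readings "
    "GROUP BY bucket ORDER BY bucket"
).fetchall()
print(rows)  # [(0, 100.8), (1, 101.2)]
```

An HA SQL sensor would then run a query like this against the external database instead of the recorder.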

You can read a little about it here:


OK, I interpret this as: the data you gained from that sensor during 30 days is about 200MB, as you say. So one could wonder what happened during those first 30 days, when you had it in homeassistant.db. I don’t know which period we are talking about, but I do know that within the past 3 months some serious major changes have occurred AND been announced in several places, not only in the update notes! Besides these changes, there are the additional core updates (with a bunch of user feature requests included, etc.). And beyond that, you did change the default purge to more than 14 days, and you maybe added an extra “few” entities yourself :wink:

Where is this “event” stored ?

where is this “event” stored ?

… and where is this event stored ?

Most likely the recorder was designed with the intent of moving to a NoSQL database at some point, or to one of the growing number of SQL databases that support JSON as a native format.

This is not a reason for a complete recorder rewrite; it is what allows the recorder to be flexible in what it stores. To badly translate a Dutch proverb: a relational SQL database is about as flexible as a lead door. This is why NoSQL databases are gaining so much traction.
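For what it’s worth, SQLite itself is one of the SQL databases with native JSON functions (the JSON1 extension, available through Python’s bundled `sqlite3` in recent versions), so “SQL with JSON support” is already within reach. A quick sketch with an invented table:

```python
import sqlite3

# Invented two-column table mimicking a state row with a JSON
# attributes blob; json_extract() queries into the blob server-side
# instead of parsing the whole string in Python.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE states (entity_id TEXT, attributes TEXT)")
con.execute(
    "INSERT INTO states VALUES (?, ?)",
    ("sensor.outside", '{"unit_of_measurement": "°C", "friendly_name": "Outside"}'),
)
unit = con.execute(
    "SELECT json_extract(attributes, '$.unit_of_measurement') FROM states"
).fetchone()[0]
print(unit)  # °C
```

This keeps the schema flexible while still letting the database, not Python, do the digging.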


Geeesus, why did you write this? Now I’ve just got another “new” option :slight_smile: to look into.

I don’t know the reasons, but it’s obvious that quite a few people seem to have “missed” that the recorder now records ALL, so people have to make their choices depending on their needs, installation, etc. Flexibility at a high level: the number-one reason I went for Home Assistant.

To be fair, they do use some native data types in the table. I think this was done maybe a year or two ago? I remember at some point they had to update the DB format and migrate the DB (which resulted in a cryptic error on my setup and the DB being corrupted, but that’s another story). Right now, the states table format looks like this:


Lots of VARCHARs (some of which are unavoidable to some extent), a hideous TEXT (see below) and lots of redundant data being stored. Every single state change stores the full entity ID (as text), the full domain name (as text too) and, worst of all, a full JSON dump of all the attributes:


A full json text dump for every single state change of every sensor.
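To put a rough number on that dump, here is a made-up but plausible attribute set for a temperature sensor and the payload it would repeat on every state change (the attribute names and values are assumptions, not from a real install):

```python
import json

# Hypothetical attributes for a temperature sensor. The recorder
# serializes something like this to TEXT for every state change.
attributes = {
    "unit_of_measurement": "°C",
    "friendly_name": "Living Room Temperature",
    "device_class": "temperature",
}
blob = json.dumps(attributes)
print(len(blob.encode("utf-8")))  # ~110 bytes, repeated per state change
```

Multiply that by millions of state changes and the attributes column alone dwarfs the actual sensor values.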

The amount of redundancy in that DB is insane. Not only does this bloat the DB size, it also makes it slow to load. Not so much on the SQL query side, but all that redundant data needs to be parsed back into Python objects, and parsing is slow, and Python is slow. The problem here is, as you said, there’s no easy fix for this besides a full redesign of the recorder system from the ground up. State changes need to identify their sources via short index IDs or GUIDs that link back into other tables for the entities and attributes, to cut down on both parsing overhead and redundancy.
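The index-ID normalization described above can be sketched in a few lines of SQLite. All table names here are invented for illustration; this is not HA’s real schema:

```python
import sqlite3

# Entities and shared attribute blobs are stored once; each state row
# then carries only two small integer keys plus native-typed values.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entities  (id INTEGER PRIMARY KEY, entity_id TEXT UNIQUE);
CREATE TABLE attr_sets (id INTEGER PRIMARY KEY, shared_json TEXT UNIQUE);
CREATE TABLE states (
    id         INTEGER PRIMARY KEY,
    entity_ref INTEGER REFERENCES entities(id),
    attrs_ref  INTEGER REFERENCES attr_sets(id),
    state      REAL,
    ts         INTEGER
);
""")
con.execute("INSERT INTO entities (entity_id) VALUES ('sensor.temp')")
con.execute("INSERT INTO attr_sets (shared_json) VALUES ('{\"unit\": \"C\"}')")
# Ten state changes repeat only two small integers, not two long strings:
con.executemany(
    "INSERT INTO states (entity_ref, attrs_ref, state, ts) "
    "VALUES (1, 1, ?, ?)",
    [(20.0 + i / 10, 1_600_000_000 + i) for i in range(10)],
)
row = con.execute("""
    SELECT e.entity_id, s.state FROM states s
    JOIN entities e ON e.id = s.entity_ref
    ORDER BY s.id LIMIT 1
""").fetchone()
print(row)  # ('sensor.temp', 20.0)
```

The JOIN restores the full entity ID on read, so nothing is lost; only the per-row duplication goes away.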


I just did a simple check on a random states-table entry in my DB for a temperature sensor. A single state, as it is, uses 294 bytes in the DB for a single temperature value. The exact size varies by sensor and even by state, due to the liberal use of variable-length cells. If you remove most of the redundant overhead and keep the state and event IDs, a timestamp, an additional ID linking to the source entity and attribute tables, and finally an IEEE float for the value, that data can be compressed into 24 bytes. That’s 8% of the original size. And technically this could be made even smaller.
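The 24-byte figure checks out if you pack those fields into fixed-width native types. A quick sanity check with Python’s `struct` module; the exact field layout below is my assumption based on the list in the post:

```python
import struct

# Assumed layout: state id (4 bytes), event id (4), entity/attribute
# link id (4), IEEE single-precision value (4), timestamp (8).
# "<" = little-endian with no alignment padding.
FMT = "<iiifd"
packed = struct.pack(FMT, 1, 1, 1, 21.5, 1_600_000_000.0)
print(struct.calcsize(FMT))  # 24
print(len(packed))           # 24
```

A real schema would still pay some per-row overhead for the page format and indexes, but the order-of-magnitude saving over ~294 bytes of text stands.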
