This is awesome. A ‘best practices’ guide for medium- and long-term data storage would be a vital companion to these efforts and would open the door to full use of the work that’s being done here.
I’d really appreciate any advice on how to set up a long-term data storage solution, which I think would also serve other new users like me. There are a lot of scattered opinions in the forums here, but I don’t think there’s a general recommendation with a guided setup (hardware, off-loading DB onto NAS, ha, choice of DB format/type, etc.).
(my own particular trial-and-error-based experience so far below)
Thanks for putting the work and time into this!
–
E.g., a new user’s experience that I’d like to avoid repeating: 3 months / 1.7 GB of sqlite3 data ‘loss’ and possible DB ‘corruption’.
To my dismay, last night my setup either froze or crashed while I was trying to figure out why the sensors section of my configuration.yaml file wasn’t working with the Mosquitto broker add-on, which was communicating with an ESP32 running MicroPython.
After a forced reboot, home-assistant.log showed ‘DB image malformed’ errors for my home-assistant_v2.db. The fixes used by others in the HA forum wouldn’t work, because sqlite3 reported DB integrity errors about non-unique event ids around the chronological middle of the database’s entries.
So I had to delete home-assistant_v2.db (I kept a separate copy) to make Hassio function again. I can analyze the copy separately, but obviously I lose the ability to use the data within Hassio itself, along with some of the capabilities provided by the work being done here.
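For anyone who wants to inspect a saved copy the same way, here is a minimal sketch using Python's built-in sqlite3 module and SQLite's `PRAGMA integrity_check`. The function name `check_integrity` and the example file name are my own placeholders, not anything Home Assistant provides, and this only reports problems; it does not repair them.

```python
import sqlite3
from pathlib import Path


def check_integrity(db_path, max_errors=20):
    """Return the messages from PRAGMA integrity_check.

    A healthy database yields ["ok"]. If SQLite cannot read the file at all,
    a single "error: ..." message is returned instead.
    """
    # Open read-only through a file: URI so the check cannot modify the copy.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.DatabaseError as exc:
        return [f"error: {exc}"]
    try:
        # max_errors caps how many problems SQLite lists before stopping.
        rows = conn.execute(f"PRAGMA integrity_check({int(max_errors)})").fetchall()
        return [row[0] for row in rows]
    except sqlite3.DatabaseError as exc:
        return [f"error: {exc}"]
    finally:
        conn.close()


# Hypothetical usage on a saved copy (the file name is an assumption):
# for message in check_integrity("home-assistant_v2.copy.db"):
#     print(message)
```

Running it against the copy rather than the live file keeps Hassio out of the picture while you look at what went wrong.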
Avoiding this scenario matters particularly for the ML-based efforts that I’d like to work on in the future:
- Predict user-desired automations. I’d like to try to create a rudimentary recommender system and anomaly detection system:
(like ‘hey Dude, it’s late for you even for a Friday night, and I think you’re likely to fall asleep soon since there’s less and less motion detected and due to your general sleep habits…did you forget that you left appliance X on like a dummy? Can I turn that off now for you?’)
I thought I had things sufficiently covered power- and storage-wise (RPi4 & 1TB HDD Pi Drive; the SD card is used for boot only, ‘old school-style’; both separately powered with sufficiently high amperages; Hass installed in Docker)…but perhaps not?
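One thing I could have done regardless of hardware is take regular snapshots of the database. Below is a rough sketch, assuming Python 3.7+ and its `sqlite3.Connection.backup` method, which wraps SQLite's online backup API. The function name `snapshot` and the paths are placeholders of mine, and this is not a recommended Home Assistant procedure, just an illustration of the idea.

```python
import sqlite3


def snapshot(src_path, dest_path):
    """Copy the SQLite database at src_path into dest_path.

    Uses SQLite's online backup API, which copies the database page by page
    and is designed to work on a database that is in use.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies the whole "main" database in one pass
    finally:
        dest.close()
        src.close()


# Hypothetical usage (both paths are assumptions, e.g. a NAS mount):
# snapshot("/config/home-assistant_v2.db", "/mnt/nas/ha-snapshot.db")
```

A snapshot like this would not have prevented the crash, but it would have left me with a recent usable copy instead of a malformed one.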
Automation is irresistibly intriguing. Even more intriguing is the idea of privately collecting one’s own ‘organic’ data, which makes for far more interesting datasets to use in my own data science education than standard textbook generated/simulated ones.