I frequently find myself thinking “if only I had the data for this sensor over the last few months”. I know I can increase the HA DB’s retention period, but I’m worried that’s not really what the SQLite DB is for, and other options like sending the data to InfluxDB or installing a separate MariaDB (or other DB server) instance seem like overkill for this.
Being heavily involved with Snowflake at work, I know the power of purpose-built analytical databases (columnar, well-partitioned, etc.), and wanted to try using DuckDB to hold my data. My thought process is that if HA can do without a DB server for transactional data, why should I need one for my single- or very-few-user analytical needs?
So I built a small script and a systemd unit/timer, and will start scraping my HA DB every hour into a DuckDB file on my host server. I simply copy the HA SQLite DB to the host via ssh to avoid impacting the HA instance, then do some light processing to save the data in an easy-to-use format.
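For reference, here is a minimal sketch of what the hourly job does. The paths, host name, and the `states`/`states_meta` join are assumptions for illustration; the recorder schema also changes between HA versions, so the column names may need adjusting for your setup:

```python
#!/usr/bin/env python3
"""Hourly scrape of the HA recorder DB into a local DuckDB file (sketch)."""
import sqlite3
import subprocess

import duckdb
import pandas as pd

HA_HOST = "homeassistant.local"            # hypothetical host name
REMOTE_DB = "/config/home-assistant_v2.db" # default recorder path, adjust as needed
LOCAL_COPY = "/tmp/ha_snapshot.db"
DUCK_FILE = "/data/ha_history.duckdb"

# 1. Pull a copy of the recorder DB over ssh so the live HA instance is never touched.
subprocess.run(["scp", f"{HA_HOST}:{REMOTE_DB}", LOCAL_COPY], check=True)

# 2. Read the state history from the SQLite copy (schema depends on HA version).
src = sqlite3.connect(LOCAL_COPY)
states = pd.read_sql_query(
    """
    SELECT m.entity_id,
           s.state,
           s.last_updated_ts
    FROM states s
    JOIN states_meta m ON m.metadata_id = s.metadata_id
    """,
    src,
)
src.close()

# 3. Append only rows newer than what is already stored in the DuckDB file.
con = duckdb.connect(DUCK_FILE)
con.execute(
    "CREATE TABLE IF NOT EXISTS states "
    "(entity_id VARCHAR, state VARCHAR, last_updated_ts DOUBLE)"
)
con.register("new_states", states)
con.execute(
    """
    INSERT INTO states
    SELECT * FROM new_states n
    WHERE n.last_updated_ts >
          coalesce((SELECT max(last_updated_ts) FROM states), 0)
    """
)
con.close()
```

The systemd timer just calls this script once an hour; the incremental filter on `last_updated_ts` keeps the copy idempotent even if a run is repeated.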
It’s very simple: it doesn’t yet allow filtering entities, nor does it automatically manage data retention (I’ll have to trim it from time to time), but I think it will do the trick for a few months.
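When retention does become an issue, the manual trim should amount to a single statement against the DuckDB file; something like this, assuming the table and timestamp column from the sketch above and an arbitrary 12-month cut-off:

```python
import time

import duckdb

con = duckdb.connect("/data/ha_history.duckdb")
# Keep roughly the last 12 months of history; the cut-off is just an example.
cutoff = time.time() - 365 * 24 * 3600
con.execute("DELETE FROM states WHERE last_updated_ts < ?", [cutoff])
con.close()
```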
Does anyone see any obvious downside or issue with this approach?
Any ideas on what fun things you could do with this type of rich and granular data collected over a few months?