This is exactly the kind of stuff that could be extremely powerful for HA and set it apart from other alternatives if it can be made simple and generic enough for the average user. I’m taking a machine learning class next semester so hopefully I can pitch in.
I think the way to go would be to create a Hass.io add-on with the following features:
- Python installation with Pandas, NumPy, SciPy, the usual suspects…
- Jupyter installation, running the server automatically, storing the notebooks on persistent storage.
- Prepared notebooks with helpers to access the Home Assistant database directly (SQLite, home-assistant_v2.db in your configuration directory) or alternatively a history recorder like MySQL or InfluxDB.
- Some helpers to get you started with getting the data out of the records and into a dataframe. Maybe a little PyPI library like @robmarkcole suggested.
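As a rough idea of what such a helper could look like, here is a minimal sketch using only the standard library. It assumes the recorder's SQLite file has a states table with entity_id, state and last_changed columns; the schema differs between Home Assistant versions, so check yours first. The function name load_numeric_states is hypothetical.

```python
import sqlite3


def load_numeric_states(conn, entity_id):
    """Return (last_changed, value) pairs for one entity, oldest first.

    Assumes a `states` table with `entity_id`, `state` and `last_changed`
    columns; adjust the query if your recorder schema differs.
    """
    rows = conn.execute(
        "SELECT last_changed, state FROM states "
        "WHERE entity_id = ? ORDER BY last_changed",
        (entity_id,),
    ).fetchall()
    readings = []
    for last_changed, state in rows:
        try:
            readings.append((last_changed, float(state)))
        except (TypeError, ValueError):
            # States such as "unknown" or "unavailable" are not numbers.
            continue
    return readings
```

You would open the file with sqlite3.connect("home-assistant_v2.db") and, if pandas is installed, pass the result to pandas.DataFrame(readings, columns=["last_changed", "value"]) to get a dataframe.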
I’ve never written a Hass.io add-on (or a PyPI library for that matter) before, but I don’t judge it to be an extremely big task. Don’t know when I’ll get around to trying it though.
One potential problem for typical Hass.io installations is disk space: they are typically installed on smallish SD cards. If we would like to do data analysis, we have to have as much data as possible. I’m not 100% sure, but I think the standard Home Assistant application regularly purges historical data. I couldn’t find the default configuration settings of the recorder component.
There is an open request for Jupyter notebooks which is on @frenck’s radar.
I’m working on a library to format the db for analysis with pandas, but any help welcome!
Re the final point, I’m guessing most people interested in this kind of analysis would either invest in a suitable SD card or use an external db and the recorder component. I’ve been looking into Google Cloud services and they have a very cheap cloud MySQL db option. I’m sure Amazon etc. offer similar too.
There is already a MariaDB add-on for Hass.io too.
Cheers
Oh, just saw this:
frenck moved this from Idea to In Progress in Add-on Research & Development 11 hours ago
(❇ Jupyter Lab Server · Issue #22 · hassio-addons/repository · GitHub)
Glad to have found this thread.
There are definitely some low hanging fruits that can give us intelligent insights about our homes. I was trying to figure out when to turn on the garage lights and I was hoping to use a bayesian sensor for it. But, given that it only accepts state and numeric_state, I cannot use an and condition, unless I use an input_boolean.
Update: gave the analysis its own repo, linked in the head post.
Getting some nice visualisations of sensor data. See below the correlation between two indoor temperatures (strong corr) and an outdoor sensor (weak corr). This indicates that my home is well insulated/heated. I have temperature sensors in every room, and it is interesting to see the subtle differences in temperature throughout my home. For example, my hall is consistently cooler than the living room, but since I don’t spend much time in the hallway I should remove that as an input to my home thermostat.
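For anyone wanting to reproduce the correlation numbers without extra libraries, a minimal sketch of the Pearson coefficient follows. It assumes the two series have already been aligned on the same timestamps; the function name pearson is just for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two sensors move together, a value near 0 means little linear relationship. With the data already in a pandas dataframe, DataFrame.corr() gives the same coefficient for every pair of columns at once.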
I’ve also plotted activity in the home (detected by PIR) with day and time of day (by category). Over this 20-day period I was much more active at home on Saturday than Sunday. Will continue this analysis with a longer lookback period. Also interesting to see that my general level of morning/evening activity drops off as the week progresses!
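The grouping behind a plot like this can be sketched as follows. This is only an illustration: the time-of-day boundaries are arbitrary assumptions, and the function names are made up for the example.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_of_day(hour):
    """Map an hour (0-23) to a coarse category; the boundaries are arbitrary."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "night"


def activity_counts(timestamps):
    """Count motion events per (weekday name, time-of-day category)."""
    counts = Counter()
    for ts in timestamps:
        counts[(DAY_NAMES[ts.weekday()], time_of_day(ts.hour))] += 1
    return counts
```

Feed it the datetimes at which the PIR sensor switched to on, then plot the counts as a heatmap or grouped bar chart.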
I’ve been looking at machine learning as well.
I want my hass to learn when to switch on the lights in the living room depending on sensor data like presence and light levels. Without having to set a fixed light level threshold I’d like hass to learn from previous behaviour when to switch on or off.
For this I am thinking of using a decision tree or k-nearest-neighbour. But not sure about this. No idea whether bayesian sensor could be used for this too. I’ve also no clear idea yet how to dynamically load hass with learning data (I want it to keep learning from my behaviour and finally “take over”)
Anyone any ideas?
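To make the k-nearest-neighbour idea concrete, here is a minimal sketch in plain Python. It assumes each past observation is stored as (features, label), with features such as presence (0 or 1) and a light level scaled to 0..1; the names and the data layout are illustrative, not anything Home Assistant provides.

```python
import math
from collections import Counter


def knn_predict(samples, query, k=3):
    """Predict a label for `query` by majority vote among the k nearest samples.

    `samples` is a list of (features, label) pairs. Features are equal-length
    tuples of numbers that should be scaled to comparable ranges beforehand.
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With this approach "keep learning" just means appending a new (features, label) pair every time the light is switched manually. scikit-learn's KNeighborsClassifier and DecisionTreeClassifier would be the more robust options once the data is in a dataframe.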
You should be able to use bayesian sensor for this. Use the relevant sensors/entities to come up with a posterior probability.
I did have a quick look at it, but it made me think you need to put a “weight” to sensor values from where another probability is calculated.
I couldn’t really find the “machine learning” part here… But again, maybe I should study it closer.
Indeed…you need to put the weights there. So, it is not really “learning”, but it is a good start until we can figure out something better
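For reference, the calculation behind those weights can be sketched as a repeated Bayes update. This assumes each active observation carries a probability of being seen when the event is true and when it is false (what the component's configuration calls prob_given_true and prob_given_false, as far as I understand it); the function names are my own.

```python
def update_probability(prior, prob_given_true, prob_given_false):
    """Apply Bayes' rule for one observation that is currently active."""
    numerator = prob_given_true * prior
    denominator = numerator + prob_given_false * (1 - prior)
    return numerator / denominator


def posterior(prior, observations):
    """Fold a list of (prob_given_true, prob_given_false) pairs into the prior."""
    probability = prior
    for prob_given_true, prob_given_false in observations:
        probability = update_probability(
            probability, prob_given_true, prob_given_false
        )
    return probability
```

For example, a prior of 0.2 with one active observation weighted (0.9, 0.1) gives 0.18 / 0.26, roughly 0.69. The sensor would then turn on if that exceeds the configured threshold.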
Could you do this with python? If so, I’d imagine you could probably do it with appdaemon… but it will be a lot of work!
Hi, yes you want a bayesian sensor, and the correct way to determine the weights would be analysis of historical data. A future approach might be to manually tag events (e.g. in-bed) and then have a script to determine which inputs (sensors + weights) would have allowed detection of that event via a bayesian sensor. For further reading check out this notebook.
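A minimal sketch of how weights might be estimated from tagged history, assuming you have a list of samples where each one records whether a sensor was on and whether the tagged event (e.g. in-bed) was actually true at that moment. The record layout and function name are hypothetical.

```python
def estimate_weights(records):
    """Estimate (prob_given_true, prob_given_false) for one sensor.

    `records` is a list of (sensor_on, event_true) boolean pairs taken
    from manually tagged history.
    """
    total_true = sum(1 for _, event_true in records if event_true)
    total_false = len(records) - total_true
    if total_true == 0 or total_false == 0:
        raise ValueError("need samples with the event both true and false")
    on_when_true = sum(1 for on, event_true in records if on and event_true)
    on_when_false = sum(1 for on, event_true in records if on and not event_true)
    return on_when_true / total_true, on_when_false / total_false
```

A sensor is only informative when the two numbers differ clearly; if they come out nearly equal, that input can probably be dropped.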
Btw I broke out the code for the bayesian sensor and will start doing analysis to determine optimum weights etc, you can find it in my repo.
Cheers
Clearly, it helps to use data to inform the weights, but in my experience starting with best guesses and fine-tuning over the next couple of days worked well.
I’ve started writing up a tutorial on the bayesian sensor. Please give me any feedback on what would be nice to include. Cheers
OK getting some time-series forecasting going with HA weather data and prophet, more to come! Should allow predictions with seasonality built in.
Cheers
Hi all. Just two examples of my achievements.
How much time is my boiler switched on?
Data were collected over 45 days. The number of hours that the boiler is ON is plotted against the average outside temperature, split by the day of the week (weekday: Monday to Friday; weekend: Saturday and Sunday). The thermostat (powered by HA and appdaemon - kudos to the devs!) has different programs depending on the day of the week. A simple linear regression is also performed.
Conclusions:
- Even on the coldest days, the accumulated switched ON time is lower than the programmed time to achieve the 21.5ºC inside temperature.
- The colder outside, the longer the boiler is switched ON (pretty obvious).
- The boiler is switched ON for longer on weekends than on weekdays (up to one hour more when it’s coldest).
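The simple linear regression mentioned above can be sketched as an ordinary least-squares fit. This is a generic illustration rather than the code used for the plots; fit_line is a made-up name, and you would call it once for the weekday points and once for the weekend points.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Here xs would be the daily average outside temperature and ys the hours the boiler was ON that day. A negative slope corresponds to "the colder outside, the longer the boiler is switched ON".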
Children procrastination ranking
All the calendar maps below share the same color scale so comparison is straightforward. They represent the daily accumulated time the tablets are switched ON.
Conclusions:
- The owner of Tablet 3 deserves a chocolate cake.
I’m happy to share the code if anyone is interested. If only I knew how to properly work with github…
Thanks to @robmarkcole for creating this thread!
Nice! @timseebeck yes please upload your work to GitHub, or alternatively check out kyso like the following:
@robmarkcole, sorry for the long silence. I’ve been learning this GitHub thing (add, commit, push, and things like that). (I think) I succeeded.
The jupyter notebook is here.
@robmarkcole Yes, I save the values of some selected sensors every 15 min in a text file located in a USB memory attached to my raspi (to avoid SD card wear). Then, I can perform long term analysis (months) without having an extremely huge database file. I also created an automation to upload the text file to Dropbox using this.
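A minimal sketch of that kind of logging, assuming a CSV file and a dictionary of sensor names to values. The function name, file layout and sensor names are hypothetical, and the scheduling (every 15 min) would be done elsewhere, e.g. by appdaemon or an automation.

```python
import csv
from datetime import datetime
from pathlib import Path


def append_reading(path, timestamp, values):
    """Append one row (ISO timestamp + sensor values) to a CSV file.

    A header row is written first if the file is new or empty. `values`
    must use the same keys in the same order on every call.
    """
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", *values.keys()])
        writer.writerow([timestamp.isoformat(), *values.values()])
```

Calling append_reading("/path/on/usb/log.csv", datetime.now(), {"living_room": 21.5, "hall": 19.0}) adds one line per call, so the file stays small and can be read straight into a dataframe later.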
However, I’m open to hearing about other workflows since I’m a complete noob in these topics.