Data science with Home-assistant

Hi all, I am very interested in mining my home-assistant database to answer questions such as ‘who is using the most energy at home’ and to find the optimal inputs for the probability-based bayesian sensor. I’ve done some work prepping the home-assistant database data for analysis in Python, and my intention is to publish (via PyPI) a small library to assist in doing data science with the home-assistant database data. If anyone has ideas for analyses they think would be interesting to see, let me know. Likewise, if anyone has already done data science with home-assistant, I would be grateful to hear about it!


In the long term this is something I definitely want to do too. And in the even longer term we should probably get into Machine Learning as well. (Why write automations by yourself if the computer could do it?) :wink:

Just so you know: There’s a Jupyter tutorial here already:

A notebook by to get started:


I am also very interested in making the system learn from its own sensor data, and then maybe fine-tune automations or warn me about anomalies.
With regard to electricity consumption monitoring, there is a little box available called Sense (US only) that tries to predict which appliance or light has been turned on or off. I’d love to have something similar in HA. The more smart switches with built-in electricity monitoring you already have installed on known consumers, the easier it should be to make predictions like the Sense box does.

This is exactly the kind of stuff that could be extremely powerful for HA and set it apart from other alternatives if it can be made simple and generic enough for the average user. I’m taking a machine learning class next semester so hopefully I can pitch in.

I think the way to go would be to create an add-on with the following features:

  • Python installation with Pandas, NumPy, SciPy, the usual suspects…
  • Jupyter installation, running the server automatically, storing the notebooks on persistent storage.
  • Prepared notebooks with helpers to access the Home Assistant database directly (SQLite, home-assistant_v2.db in your configuration directory) or alternatively a history recorder like MySQL or InfluxDB.
  • Some helpers to get you started with getting the data out of the records and into a dataframe. Maybe a little PyPI library like @robmarkcole suggested.
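
A helper of the kind described in the list above might look roughly like this. It is only a sketch using the standard library: it assumes the SQLite file has a `states` table with `entity_id`, `state` and `last_changed` columns, which may not match your Home Assistant version, so check your own schema first. The function names are made up for illustration.

```python
import sqlite3
from datetime import datetime


def load_states(conn, entity_id):
    """Return (timestamp, state) tuples for one entity, oldest first.

    Assumption: a `states` table with `entity_id`, `state` and
    `last_changed` columns, with timestamps stored as ISO-like text.
    """
    rows = conn.execute(
        "SELECT last_changed, state FROM states "
        "WHERE entity_id = ? ORDER BY last_changed",
        (entity_id,),
    ).fetchall()
    return [(datetime.fromisoformat(ts), state) for ts, state in rows]


def numeric_only(records):
    """Keep records whose state parses as a number (drops 'unknown' etc.)."""
    cleaned = []
    for ts, state in records:
        try:
            cleaned.append((ts, float(state)))
        except (TypeError, ValueError):
            continue  # non-numeric or missing state, skip it
    return cleaned
```

You would call it with something like `load_states(sqlite3.connect("home-assistant_v2.db"), "sensor.living_room_temperature")`, where the entity name is hypothetical. The resulting list of tuples can be handed straight to a dataframe constructor if you prefer working in pandas.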

I’ve never written an add-on (or a PyPI library for that matter) before, but I don’t judge it to be an extremely big task. Don’t know when I’ll get around to trying it though.


One potential problem for typical installations is disk space: they are usually installed on smallish SD cards. If we would like to do data analysis, we need as much data as possible. I’m not 100% sure, but I think the standard Home Assistant application regularly purges historical data. I couldn’t find the default configuration settings of the recorder component.

There is an open request for Jupyter notebooks which is on @frenck’s radar :+1:
I’m working on a library to format the db for analysis with pandas, but any help is welcome!
Re the final point, I’m guessing most people interested in this kind of analysis would either invest in a suitable SD card or use an external db with the recorder component. I’ve been looking into Google Cloud services and they have a very cheap cloud MySQL db option. I’m sure Amazon etc. offer something similar too.
There is already a MariaDB add-on for Hassio too.

Oh, just saw this:

frenck moved this from Idea to In Progress in Add-on Research & Development 11 hours ago

(❇ Jupyter Lab Server · Issue #22 · hassio-addons/repository · GitHub)

Glad to have found this thread.

There is definitely some low-hanging fruit that can give us intelligent insights about our homes. I was trying to figure out when to turn on the garage lights and I was hoping to use a bayesian sensor for it. But, given that it only accepts state and numeric_state observations, I cannot use an AND condition unless I go through an input_boolean.

Update: gave the analysis its own repo, linked in the head post.

Getting some nice visualisations of sensor data. See below the correlation between two indoor temperatures (strong corr) and an outdoor sensor (weak corr). This indicates that my home is well insulated/heated :slight_smile: I have temperature sensors in every room, and it is interesting to see the subtle differences in temperature throughout my home. For example, my hall is consistently cooler than the living room, but since I don’t spend much time in the hallway I should remove that as an input to my home thermostat.
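
For anyone wanting to reproduce the strong/weak correlation comparison without any extra libraries, here is a minimal sketch of a Pearson correlation. It assumes the two sensor series have already been resampled onto the same timestamps, which the raw database rows generally are not.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value near 1 means the two rooms warm up and cool down together, and a value near 0 means there is little linear relationship, as with the indoor and outdoor sensors described above.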

I’ve also plotted activity in the home (detected by PIR) by day and time of day (by category). Over this 20-day period I was much more active at home on Saturday than on Sunday. I will continue this analysis with a longer lookback period. It is also interesting to see that my general level of morning/evening activity drops off as the week progresses!
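
The counting behind a plot like this can be sketched as follows. The time-of-day boundaries are my own assumption rather than the ones used for the plot, and the input is assumed to be a list of datetimes at which the PIR sensor reported motion.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_category(hour):
    """Map an hour (0-23) to a coarse category; boundaries are assumptions."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "daytime"
    if 18 <= hour < 23:
        return "evening"
    return "night"


def activity_counts(timestamps):
    """Count motion events per (day name, time-of-day category)."""
    counts = Counter()
    for ts in timestamps:
        counts[(DAY_NAMES[ts.weekday()], time_category(ts.hour))] += 1
    return counts
```

The resulting counter maps pairs such as `("Saturday", "morning")` to the number of motion events, which is easy to turn into a heatmap.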


I’ve been looking at machine learning as well.

I want my hass to learn when to switch on the lights in the living room depending on sensor data like presence and light levels. Without having to set a fixed light level threshold I’d like hass to learn from previous behaviour when to switch on or off.

For this I am thinking of using a decision tree or k-nearest-neighbour. But not sure about this. No idea whether bayesian sensor could be used for this too. I’ve also no clear idea yet how to dynamically load hass with learning data (I want it to keep learning from my behaviour and finally “take over”)

Anyone any ideas ?
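
To make the k-nearest-neighbour idea concrete, here is a bare-bones sketch. It assumes each past observation has been reduced to a feature tuple (for example presence as 0/1 and light level scaled to the 0-1 range) together with the light state you chose at that moment. The feature choice and scaling are assumptions, and nothing here is wired into hass.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(samples, query, k=3):
    """Predict a label by majority vote among the k nearest past samples.

    samples: list of (features, label) pairs, features being numeric tuples
    query:   feature tuple for the current situation
    """
    if not samples:
        raise ValueError("no training samples")
    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up history: (presence, scaled light level) -> what I did with the light
history = [
    ((1, 0.05), "on"), ((1, 0.10), "on"), ((1, 0.15), "on"),
    ((1, 0.80), "off"), ((0, 0.05), "off"), ((0, 0.90), "off"),
]
```

Because new observations are simply appended to `history`, this kind of model keeps "learning" without a separate training step, though it gets slower as the history grows.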

You should be able to use bayesian sensor for this. Use the relevant sensors/entities to come up with a posterior probability.

I did have a quick look at it, but it looked like you need to assign a “weight” to each sensor value, from which another probability is then calculated.

I couldn’t really find the “machine learning” part here… But again, maybe I should study it closer.

Indeed…you need to put the weights there. So, it is not really “learning”, but it is a good start until we can figure out something better :slight_smile:
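
To show what those weights do, here is a small sketch of the Bayes-rule update as I understand it. The argument names follow the `prior`, `prob_given_true` and `prob_given_false` options of the bayesian sensor, but this is an illustration of the maths and not the component's actual code.

```python
def update_probability(prior, prob_given_true, prob_given_false):
    """Apply Bayes' rule for one sensor that is currently in its 'on' state."""
    numerator = prob_given_true * prior
    denominator = numerator + prob_given_false * (1 - prior)
    return numerator / denominator


def posterior(prior, observations):
    """Chain the update over several (prob_given_true, prob_given_false) pairs."""
    probability = prior
    for prob_given_true, prob_given_false in observations:
        probability = update_probability(
            probability, prob_given_true, prob_given_false
        )
    return probability
```

With a prior of 0.2 and one active observation weighted 0.9 / 0.1, the probability rises to 0.18 / 0.26, roughly 0.69. Chaining treats the observations as independent, which real sensors often are not.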

Could you do this with python? If so, I’d imagine you could probably do it with appdaemon… but it will be a lot of work!

Hi, yes, you want a bayesian sensor, and the correct way to determine the weights would be analysis of historical data. A future approach might be to manually tag events (e.g. in-bed) and then have a script determine which inputs (sensors + weights) would have allowed detection of that event via a bayesian sensor. For further reading, check out this notebook.
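
As a rough sketch of how tagged history could be turned into weights: assume you have, for each time slice, a flag for whether the tagged event (e.g. in-bed) was really happening and a flag for whether a given sensor was on. The two conditional frequencies are then natural candidates for that sensor's weights. The data layout and function name are hypothetical.

```python
def estimate_weights(samples):
    """Estimate (prob_given_true, prob_given_false) for one sensor.

    samples: iterable of (event_true, sensor_on) boolean pairs, one per
    time slice of tagged history.
    """
    on_when_true = total_true = 0
    on_when_false = total_false = 0
    for event_true, sensor_on in samples:
        if event_true:
            total_true += 1
            on_when_true += int(sensor_on)
        else:
            total_false += 1
            on_when_false += int(sensor_on)
    if total_true == 0 or total_false == 0:
        raise ValueError("need samples both with and without the event")
    return on_when_true / total_true, on_when_false / total_false
```

A sensor whose two estimates are nearly equal carries little information about the event and could probably be dropped as an input.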

Btw I broke out the code for the bayesian sensor and will start doing analysis to determine optimum weights etc, you can find it in my repo.

Clearly, it helps to use data to inform the weights, but in my experience starting with best guesses and fine-tuning over the next couple of days worked well.

I’ve started writing up a tutorial on the bayesian sensor. Please give me any feedback on what would be nice to include. Cheers


OK getting some time-series forecasting going with HA weather data and prophet, more to come! Should allow predictions with seasonality built in.


Hi all. Just two examples of my achievements.
How much time is my boiler switched on?
Data were collected over 45 days. The number of hours the boiler is ON is plotted against the average outside temperature, split by day type (weekday: Monday to Friday; weekend: Saturday and Sunday). The thermostat (powered by HA and appdaemon - kudos to the devs!) has different programs depending on the day of the week. A simple linear regression is also performed.

  • Even on the coldest days, the accumulated switched-ON time is lower than the time programmed for reaching the 21.5ºC inside temperature.
  • The colder it is outside, the longer the boiler is switched ON (pretty obvious).
  • The boiler is switched ON for longer on weekends than on weekdays (up to one hour more when it’s coldest).
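
The simple linear regression mentioned above can be sketched in a few lines of ordinary least squares. This is a generic illustration and not the code used for the plots; it assumes one (average outside temperature, boiler hours) pair per day.

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Fitting weekdays and weekends separately gives two lines; a negative slope corresponds to the boiler running longer on colder days, and the gap between the two lines at a given temperature is the extra weekend running time.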

Children’s procrastination ranking
All the calendar maps below share the same color scale, so comparison is straightforward. They represent the daily accumulated time the tablets are switched ON.



  • The owner of Tablet 3 deserves a chocolate cake.
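
For anyone wanting to compute the same kind of daily accumulated ON time, here is a sketch. It assumes a chronologically sorted list of (timestamp, state) changes for one device, naive datetimes, and the state string "on". Intervals that cross midnight are split so each day only gets its own share.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_on_hours(changes, end):
    """Sum the hours spent in state 'on' per calendar day.

    changes: sorted list of (timestamp, state) tuples
    end:     datetime that closes the last interval (e.g. time of export)
    """
    totals = defaultdict(float)
    following = changes[1:] + [(end, None)]
    for (start, state), (stop, _) in zip(changes, following):
        if state != "on":
            continue
        while start < stop:
            # Cut the interval at the next midnight if it spans two days
            next_midnight = datetime.combine(
                start.date() + timedelta(days=1), datetime.min.time()
            )
            chunk_end = min(stop, next_midnight)
            totals[start.date()] += (chunk_end - start).total_seconds() / 3600
            start = chunk_end
    return dict(totals)
```

The per-day totals are what a calendar map needs: one number per date, on a shared color scale across devices.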

I’m happy to share the code if anyone is interested. If only I knew how to properly work with GitHub…
Thanks to @robmarkcole for creating this thread!


Nice! @timseebeck yes, please upload your working to GitHub, or alternatively check out kyso, like the following: