Data science with Home-assistant

@robmarkcole, sorry for the long silence. I’ve been learning this GitHub thing (add, commit, push and so on). I think I succeeded :slight_smile:
The Jupyter notebook is here.

@timseebeck Are you exporting your db to a text file on Dropbox?

@robmarkcole Yes, I save the values of some selected sensors every 15 minutes to a text file on a USB stick attached to my Raspberry Pi (to avoid SD card wear). That way I can run long-term analyses (over months) without an extremely large database file. I also created an automation that uploads the text file to Dropbox using this.
However, I’m open to hearing about other workflows, since I’m a complete noob in these topics.
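A minimal sketch of that kind of periodic export in Python, assuming the sensor values have already been read (for example in an AppDaemon callback that runs every 15 minutes). `append_snapshot`, `load_snapshots` and the CSV layout are hypothetical, not the actual setup described above:

```python
import csv
import os
from datetime import datetime


# Hypothetical helpers, not the setup described in the post.
def append_snapshot(path, readings, timestamp=None):
    """Append one row of sensor readings to a CSV file.

    `readings` maps entity_id -> state. The header row is written only when
    the file does not exist yet, so the same file can grow for months.
    Assumes the same set of sensors is logged on every call.
    """
    timestamp = timestamp or datetime.now()
    columns = sorted(readings)
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if new_file:
            writer.writerow(["timestamp"] + columns)
        writer.writerow(
            [timestamp.isoformat(timespec="seconds")] + [readings[c] for c in columns]
        )


def load_snapshots(path):
    """Read the file back as a list of dicts (all values as strings)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Appending one small row per run keeps the file cheap to write on the Pi and easy to load into a notebook later.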

You could either adjust your recorder config and use the default db or, alternatively, check out:

So I’m curious what performance impact, if any, you’ve seen when viewing history in the Home Assistant web GUI with the database in the cloud.

I’m just using Google SQL as a recorder for data science work; history still uses the default local SQLite db.

Per the documentation, the recorder does change the storage location of the data for the history component, unless you’ve manually modified the code.

@keatontaylor thanks for pointing that out. I didn’t realise that, and I’ve just noticed that I don’t have a history tab to view…! (history is configured). I don’t see a config option to use the recorder simply as a backup recorder. A workaround would be to publish event data using MQTT Statestream, consume the stream on a second HA instance and have that instance do the recording.
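If you try the MQTT Statestream route, the consuming side mainly has to map topics back to entities. A minimal sketch, assuming Statestream’s `<base_topic>/<domain>/<object_id>/<field>` topic layout (where field is `state` or an attribute name) and a base topic of `homeassistant`; `parse_statestream_topic` is a hypothetical helper:

```python
def parse_statestream_topic(topic, base_topic="homeassistant"):
    """Split a Statestream topic into (entity_id, field).

    Assumes the layout <base_topic>/<domain>/<object_id>/<field>.
    Returns None for topics that do not match that layout.
    """
    prefix = base_topic.rstrip("/") + "/"
    if not topic.startswith(prefix):
        return None
    parts = topic[len(prefix):].split("/")
    if len(parts) != 3:
        return None
    domain, object_id, field = parts
    return f"{domain}.{object_id}", field
```

The message callback of whatever MQTT client does the recording could pass each incoming topic through this and store `(entity_id, field, payload)` rows.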

Hey guys,
I’m also interested in doing some data mining in Hass to learn two main things:

  • Presence prediction to improve heating/ac automation based on presence sensors historical data
  • Time to reach desired temperature based on actual indoor temperature and outside temperature.

@ejalal please keep us posted on your analytics :slight_smile:

Hello, folks! I’m just catching up on this. My first step is to write an AppDaemon app that accesses the database of my MariaDB add-on. The app is simple:

import appdaemon.plugins.hass.hassapi as hass
import mysql.connector as mariadb


# Run from utilities.yaml
class GetSQLData(hass.Hass):

    def initialize(self):
        fetch_entities = True
        try:
            mariadb_connection = mariadb.connect(
                user='user_name', password='password', host='', database='homeassistant')
            self.log("If we made it here, the mariadb.connect command worked")
            self.cursor = mariadb_connection.cursor()
            self.log("Successfully connected to database {}".format(mariadb_connection))
            if fetch_entities:
                query = 'SELECT * FROM states WHERE entity_id = "sensor.dark_sky_temperature"'
                try:
                    self.log("Performing query {} with cursor {}".format(query, self.cursor))
                    response = self.cursor.execute(query, params=None, multi=False)
                    self.log("Query response {} with cursor {}".format(response, self.cursor))
                except mariadb.Error:
                    self.log("Error with query: {}".format(query))
        except Exception as exc:
            if isinstance(exc, ImportError):
                raise RuntimeError(
                    "The right dependency to connect to your database is "
                    "missing. Please make sure that it is installed.") from exc
            # Any other error: re-raise so it shows up in the AppDaemon log
            raise

As you can see, I borrowed a few lines of your code, @robmarkcole.

I also downloaded a database browser and successfully connected to MariaDB. I can run the one-line SQL query SELECT * FROM states WHERE entity_id = "sensor.dark_sky_temperature" and successfully get data from the database.

However… when I run the above code with the exact same query, I get the following output:

2019-11-14 16:32:45.023301 INFO sql_mariadb: If we made it here, the mariadb.connect command worked
2019-11-14 16:32:45.030726 INFO sql_mariadb: Successfully connected to database <mysql.connector.connection.MySQLConnection object at 0x73a54230>
2019-11-14 16:32:45.037847 INFO sql_mariadb: Performing query SELECT * FROM states WHERE entity_id = "sensor.dark_sky_temperature" with cursor MySQLCursor: (Nothing executed yet)
2019-11-14 16:32:45.050681 INFO sql_mariadb: Query response None with cursor MySQLCursor: SELECT * FROM states WHERE entity_id = "..

Up to the last line, everything is great. But then the query response comes back as None… and the query looks truncated in the cursor. There are no errors in the error log.

Anyone see what I’m doing wrong?

Any help is greatly appreciated!

As a bit more info, if I change the query to:

SELECT * FROM states WHERE entity_id="sensor.dark_sky_temperature" AND DATE(last_changed) = "2019-11-14"

The last line of the log changes to:

2019-11-14 16:49:26.299855 INFO sql_mariadb: Query response None with cursor MySQLCursor: SELECT * FROM states WHERE entity_id="se..

Not sure if that’s relevant or not…
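A likely explanation, assuming the standard `mysql.connector` behaviour: with `multi=False`, `cursor.execute()` returns `None`; it only runs the statement, and the rows have to be read from the cursor afterwards (for example with `fetchall()`). The `..` in the log is most likely just the cursor’s string representation shortening the statement (both logged queries are cut after exactly 40 characters), not a truncated query. A minimal sketch that works with any DB-API 2.0 cursor, assuming the `states` table has `state` and `last_changed` columns; `fetch_states` is a hypothetical helper, and the placeholder is a parameter because `mysql.connector` uses `%s` while `sqlite3` uses `?`:

```python
def fetch_states(cursor, entity_id, placeholder="%s"):
    """Run a parameterised query and return all matching rows.

    execute() only sends the statement; the result set is read with fetchall().
    Passing entity_id as a parameter also avoids quoting problems.
    """
    query = f"SELECT state, last_changed FROM states WHERE entity_id = {placeholder}"
    cursor.execute(query, (entity_id,))
    return cursor.fetchall()
```

In the AppDaemon app above, that would be something like `rows = fetch_states(self.cursor, "sensor.dark_sky_temperature")` followed by logging `len(rows)` or the first few rows.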

I’m curious why you are not using Data Detective? It would make debugging this kind of issue much easier. I’m considering giving it an overhaul if there is interest.

Hi @robmarkcole: I’m not exactly sure how to install it, or how to make it work afterwards. I’m assuming I need to install it on a separate computer with a full Python installation, and I don’t really have that right now. As a first attempt, I just wanted to get a simple query working within AppDaemon (which seemed very straightforward, at first).

Can you offer some guidance on how to actually run Data Detective on my setup (RPi 3B+)? I do have a PC, but it’s not on all the time. I also have a NAS that is on all the time.

Thanks for the response!

There is a Hass.io add-on you can use, and lots of documentation :slight_smile:

Ha! And of course I hadn’t found that yet. Thanks! I’ll give that a go.

I think this topic is the best place for my question, instead of starting a new topic.

I’ve been trying out modAL to support active learning. I think this is a good framework to use together with Home Assistant, and a way to start using more and smarter statistical methods in our automations. To start out, I have an idea to fine-tune the bayesian sensor using this package. My main question is whether people already have experience using it with Home Assistant. I’m also not entirely sure yet how to apply it in a user-friendly way.

To explain my main idea, let’s assume a bayesian sensor that detects if anyone is home. I’ve filled in the observations and have a prior. It is working decently, but not perfectly.
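For reference, the prior and the per-observation `prob_given_true`/`prob_given_false` values combine through Bayes’ rule. A minimal sketch of that update, not Home Assistant’s actual code: it only applies observations that are currently active and treats them as independent given the true state; the function names are hypothetical:

```python
def bayes_update(prior, prob_given_true, prob_given_false):
    """One Bayes step: P(home | observation) from P(home) and the likelihoods."""
    numerator = prob_given_true * prior
    return numerator / (numerator + prob_given_false * (1 - prior))


def posterior(prior, active_observations):
    """Chain the update over the observations that are currently active.

    `active_observations` is a list of (prob_given_true, prob_given_false)
    pairs. Simplification: inactive observations are ignored, and the
    observations are assumed independent given the true state (naive Bayes).
    """
    p = prior
    for prob_given_true, prob_given_false in active_observations:
        p = bayes_update(p, prob_given_true, prob_given_false)
    return p
```

The labelled time slots would then be used to re-estimate each `(prob_given_true, prob_given_false)` pair from data.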
My idea is to use active learning to select combinations of observations for the user to label. In essence, we start labelling data for all sorts of possible applications this way, but I’ll restrict myself to the bayesian sensor here.
The existing posteriors are used (or maybe calculated), and from those that are closest to the threshold (both above and below), distinct time slots are selected and presented to the user. The user then simply tells the system whether anyone was home during that time slot. This labelled data is then used to calculate the exact probabilities, and the model is adjusted accordingly. You can repeat this in a feedback loop, continuously improving the accuracy of the probabilities.
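The selection step described above is essentially uncertainty sampling. A minimal sketch in plain Python rather than modAL’s own query strategies, assuming the posteriors for past time slots are already available as a dict; `select_for_labelling`, `threshold` and `per_side` are hypothetical names:

```python
def select_for_labelling(posteriors, threshold=0.5, per_side=3):
    """Pick the time slots whose posterior is closest to the threshold.

    `posteriors` maps a time-slot label to the sensor's posterior. The most
    uncertain slots are taken separately above and below the threshold, so
    the user sees borderline cases on both sides.
    """
    above = sorted((p - threshold, slot) for slot, p in posteriors.items() if p >= threshold)
    below = sorted((threshold - p, slot) for slot, p in posteriors.items() if p < threshold)
    return [slot for _, slot in above[:per_side]] + [slot for _, slot in below[:per_side]]
```

Taking a fixed number from each side keeps the labelling batch small, which matters if nobody wants to label hundreds of situations.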

To keep the user-provided probabilities relevant in the beginning (and the learning process more stable), I’m thinking of averaging them somehow with the labelled probabilities, weighted by the number of labels. Once an observation has been labelled ‘sufficiently’, the manually entered probabilities should then have little effect, while if there are few or even no labels they will dominate.
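One way to do this weighting (a suggestion, not something the bayesian sensor or modAL provides) is to treat the hand-entered probability as if it were k labels, so that p = (k · p_user + hits) / (k + n), where n is the number of labelled slots of the relevant class and hits is the number of those in which the observation was active. With no labels, p equals p_user; with many labels it approaches the observed frequency. A minimal sketch, where `prior_weight` (k) is a hypothetical tuning parameter:

```python
def blended_probability(user_prob, hits, total, prior_weight=10.0):
    """Blend a hand-entered probability with the labelled frequency.

    user_prob: probability entered in the config (e.g. prob_given_true).
    hits: labelled slots of the relevant class where the observation was on.
    total: number of labelled slots of that class.
    prior_weight: hypothetical pseudo-count; the user's value counts as this
    many labels, so it dominates with few labels and fades with many.
    """
    if not 0 <= hits <= total:
        raise ValueError("need 0 <= hits <= total")
    return (prior_weight * user_prob + hits) / (prior_weight + total)
```

This is the posterior mean of a Beta prior with pseudo-counts k · p_user and k · (1 − p_user); a larger k makes the user’s value harder to override.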

Hopefully the idea is clear. I think, if I manage to represent the situation clearly to the user, this is a much more natural way of fine-tuning a bayesian sensor, or any other statistical sensor. Once you have many labels, you could probably also start removing highly correlated or useless observations, or possibly deduce extra observations.

Another advantage of this approach, I think, is that the labelled data stays relevant even if you add or remove observations, so it doesn’t necessarily have to be retrained. The labelling could also be used for an entirely different approach, such as a logistic regression sensor?

What do you all think? Is this a good plan, and if so, how do you think it would best be implemented?

@sophof thanks for starting the discussion on this topic. The first task here is coming up with a suitable UI for labelling data. I suggest you prototype an MVP using Streamlit and then try it out on the community?

Wow, that looks very similar to R Shiny (I use R for work); that would be exactly what I need!
Do you have any experience labelling data using active learning? I’m a bit worried it wouldn’t converge quickly enough to something usable (no one wants to label hundreds of situations).

I did use a remote control to gather ‘ground truth’ data for my ‘in-bed’ bayesian sensor. I don’t have experience with active learning, although I assume Alexa is doing this…?