Master Thesis - Machine Learning with Home Assistant

Hi Simon

Thank you very much! For proper pattern recognition, the data should cover at least 7-10 days.
PM sent :slight_smile:

Best regards

Anja

Does that work with the MariaDB addon? Then I can also share my data with you.

1 Like

I did not test it with MariaDB, but according to the Home Assistant Data Detective documentation it does work - you may just have to change the way you address the DB via its URL. I linked the relevant part of the documentation in the Jupyter Notebook. If you need further help, please feel free to message me! :slight_smile:
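For anyone trying this, here is a minimal sketch of what the connection URL for the MariaDB add-on might look like. The host, user, password and database names below are placeholders, not values from the thesis setup - adjust them to your own installation:

```python
# Hypothetical sketch: pointing Data Detective at the MariaDB add-on
# instead of the default SQLite recorder database. All credentials and
# host names below are placeholders.

def mariadb_url(user: str, password: str, host: str,
                database: str, port: int = 3306) -> str:
    """Build a SQLAlchemy-style connection URL for a MariaDB recorder DB."""
    return f"mysql://{user}:{password}@{host}:{port}/{database}?charset=utf8"

url = mariadb_url("homeassistant", "secret", "core-mariadb", "homeassistant")

# With detective installed, the URL would then be passed to its database
# class, something like (untested):
#   from detective.core import HassDatabase
#   db = HassDatabase(url)
```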
Thank you!

Would a negative be any help in training?

My house is empty at the moment. Still a bit of automation going on though.

1 Like

Hi!

I’d also appreciate a negative :slight_smile: Maybe you could alter the file name of the .csv so that I can recognize the negative, e.g. with a _neg suffix or something similar.
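If the suffix convention is followed, deriving the label during loading is a one-liner. A tiny sketch (file names are made up):

```python
# Sketch: derive a positive/negative label from the CSV file name,
# assuming negative examples are marked with a "_neg" suffix as
# suggested above. File names here are invented examples.
def label_from_filename(name: str) -> int:
    stem = name.rsplit(".", 1)[0]          # drop the ".csv" extension
    return 0 if stem.endswith("_neg") else 1

labels = [label_from_filename(n) for n in ("home_a.csv", "home_b_neg.csv")]
```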
Thank you!

1 Like

Totally forgot that I have filtered most of my entities in the recorder. I just enabled it for all entities and will let it run for 10 days to share it with you then.

1 Like

I am so jealous! I was going to take a data science mini class (about six months of intensive data science courses) and was going to do this VERY same data analysis - not for all the people, just for my own darn self. I think this is a better idea though, because you will have a much better data set to pull from.

You should look into open-sourcing the data once you have stripped it of anything that can be used to identify people. This could be a small treasure trove of information that could help create some form of machine-learned automation creation within HA!

My end goal was to work with someone to help create something for HA, but life got in the way and I couldn’t attend the class. Plus it was really more than I could afford, so I had to wait on it. I just got my new setup going; I will wait a week, then revisit this and post my data for you to mull over (it won’t be much, I am starting over from pretty much scratch).

2 Likes

I second this; perhaps people here can say whether their data can be anonymized and reused. I think big companies have access to a huge amount of data due to the way they collect everything, but there are not many options left for the little guys.
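To make the anonymization idea concrete, here is a minimal sketch (not from the thesis): entity IDs in the export are replaced by salted hashes, so individual homes can’t be identified but identical entities still map to the same token within one export:

```python
# Sketch: anonymizing a recorder export before sharing it.
# The salt and example rows are placeholders.
import hashlib

SALT = "choose-your-own-random-salt"  # keep this private per household

def anonymize_entity(entity_id: str) -> str:
    """Replace an entity ID with a salted-hash token, keeping the domain."""
    domain, _, _ = entity_id.partition(".")
    digest = hashlib.sha256((SALT + entity_id).encode()).hexdigest()[:8]
    return f"{domain}.anon_{digest}"

rows = [
    ("sensor.temp_bedroom", "21.5"),
    ("binary_sensor.door_front", "on"),
    ("sensor.temp_bedroom", "21.7"),
]
anon = [(anonymize_entity(e), s) for e, s in rows]
```

Keeping the domain (`sensor.`, `binary_sensor.`) visible preserves at least the entity type for analysis while hiding the room and device names.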

IMHO it’s not always about having as much data as possible, but about having the right data. You don’t just need Big Data, you need Thick Data. The quality of the data for answering the question at hand is at least as important as the quantity.

At my employer (also a “big player” in its segment in Europe) I see again and again that people operate on the idea that we just collect everything somehow and then look for meaningful patterns. In reality, however, this rarely if ever works. Usually you have to start from a hypothesis and generate the specific data needed to falsify it.

2 Likes

Hmm. Interesting project. But as far as I see it, big chunks of raw data are pretty much useless without additional contextual metadata. How are you supposed to know what the data from my sensor.temp_54fac6 represents? Is it my bedroom temperature, my living room temperature, or the temperature in my garden shed? It’s even worse for motion and door sensors, because they’re binary. Did I open the door to the bathroom or the door to the basement? Add to that the vastly different layouts of people’s homes. If you want to train a model to correlate human behavior with sensor data, you need meaningful, correlated data to begin with - you need access to that kind of metadata. Otherwise you’ll just train your model on noise.
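One way to bridge that gap would be a small metadata mapping that each data donor supplies alongside their export. A sketch, with made-up entity IDs:

```python
# Sketch: a user-supplied metadata file shipped next to the CSV export,
# so opaque entity IDs become comparable across households.
# All entity IDs and rooms below are invented examples.
METADATA = {
    "sensor.temp_54fac6": {"room": "bedroom", "kind": "temperature"},
    "binary_sensor.door_a1": {"room": "bathroom", "kind": "door"},
}

def describe(entity_id: str) -> dict:
    """Look up the human meaning of an entity, defaulting to 'unknown'."""
    return METADATA.get(entity_id, {"room": "unknown", "kind": "unknown"})

info = describe("sensor.temp_54fac6")
```

Entities without an entry would still be usable for coarse, domain-level features, just not for room-level reasoning.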

Just curious on how you’re planning to manage this :slightly_smiling_face:

3 Likes

I can barely train a model on my own data, let alone on others’ data. But I don’t want to get in Anja’s way; I think I will get the statistical data from other places, as there is plenty of IoT data freely available online, for example:
https://thingspeak.com/channels/public
https://dweet.io/see
But I agree there is a need for quality data that can easily be used to cross-correlate patterns.

1 Like

Thank you for your input!

When you extract the database, it includes the friendly names of the entities. From the data I have received so far, I could see that almost all users renamed their sensors with meaningful names, e.g. sensor.temp_Bedroom, sensor.door_fridge or something similar. There might be some exceptions where people prefer the “complicated” numerical names - I’ll see how it works out :slight_smile:
But you’re right, data preparation before training will probably consume a huge amount of time.
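As an illustration of that kind of name-based preparation, here is a tiny parser for the common `<domain>.<kind>_<room>` pattern. Naming conventions vary per household, so this is only a sketch of one case, not a general solution:

```python
# Sketch: split entity IDs like "sensor.temp_bedroom" into
# domain / measurement kind / room. Only covers the simple
# "<domain>.<kind>_<room>" naming convention.
def parse_entity(entity_id: str) -> dict:
    domain, _, rest = entity_id.partition(".")
    kind, _, room = rest.partition("_")
    return {"domain": domain, "kind": kind, "room": room or None}

parsed = parse_entity("sensor.temp_bedroom")
```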

Just curious if this resulted in anything? Planning on working on a similar project if there isn’t already something to build on.

Thanks.

1 Like

Hi :slight_smile:

I’m still working on the project! I hope I will have some time around Christmas to give a proper, detailed update on my work.
If everything goes according to plan, I will finish the project around February 21. I may publish the repo after the thesis is completed and has passed the whole university process :slight_smile:

3 Likes

Hallo Anja,

Interesting topic. I am curious to know where you are at.

While it is still good to observe the data, I really wonder how you are progressing. I am not a machine learning expert, but I would start by determining the problem I want to solve! Finding patterns without having a problem to solve is utopian.

As an example, and certainly too late for your thesis (I do not have the data yet), I would have a concrete problem to solve: managing the heating system. I have an underfloor heating system (water-based) with a 3-4 hour latency to warm up / cool down, and I am convinced I can collect all the data necessary to train a model that determines what needs to happen to have 21 degrees in the house at 6 pm, for example. Not only the indoor/outdoor temperature sensor data, but also all data related to the heating system (water temperature, valve openings…) and, most importantly, all weather forecasts for the next 24 h (temperature, humidity, wind, sunshine…).
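In its simplest form, this is a regression problem: estimate how many minutes of pre-heating are needed per degree of temperature deficit, then work backwards from the deadline. A toy sketch with invented numbers (a real model would also use the weather forecast, water temperature, valve states, etc.):

```python
# Toy sketch of the heating-latency idea. The history pairs are
# invented (temperature deficit in degrees C, observed warm-up minutes).
history = [(1.0, 55.0), (2.0, 118.0), (3.0, 171.0), (4.0, 236.0)]

# Least-squares slope through the origin: minutes per degree of deficit.
minutes_per_degree = (
    sum(d * m for d, m in history) / sum(d * d for d, _ in history)
)

def preheat_minutes(current: float, target: float) -> float:
    """How many minutes before the deadline the heating should switch on."""
    return max(0.0, (target - current) * minutes_per_degree)

lead = preheat_minutes(current=18.0, target=21.0)
```

For a 3-degree deficit this lands near the 3-4 hour latency described above, which is the kind of sanity check such a model would need against real data.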

Conclusion: on top of asking people for data, I would ask them what kind of automation they need or what problem they want to solve, because then the data could start making sense.

Maybe this helps.

Me too - in my case it is a floor heating system, and our flat overheats quickly. I would have the same boundary conditions.

What I could offer is to create a mathematical model of the flat to calculate heat transfer. I would simulate it in MATLAB/Simulink or one of their open-source counterparts.
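The simplest version of such a model is a lumped RC network: one thermal resistance to outdoors and one heat capacity, stepped forward with explicit Euler. A sketch in Python (the R and C values are placeholders, not measured):

```python
# Minimal lumped RC thermal model of a flat, the kind of thing one
# would otherwise build in MATLAB/Simulink. Parameter values are
# placeholders, not measurements.
def simulate(t_indoor, t_outdoor, heat_w, hours,
             r=0.005, c=2.0e7, dt=60.0):
    """Indoor temperature after `hours`, stepping every `dt` seconds.

    r: thermal resistance to outdoors [K/W]
    c: heat capacity of the flat [J/K]
    heat_w: heating power [W]
    """
    steps = int(hours * 3600 / dt)
    t = t_indoor
    for _ in range(steps):
        dT = (heat_w - (t - t_outdoor) / r) / c  # energy balance
        t += dT * dt
    return t

# With heating off, the flat drifts toward the outdoor temperature.
cooled = simulate(t_indoor=21.0, t_outdoor=5.0, heat_w=0.0, hours=12)
```

With these placeholder values the time constant is R*C ≈ 28 h, so after 12 h the flat has lost roughly a third of its temperature difference to outdoors - fitting R and C to real sensor data would be the actual modelling work.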

When I search for HA and ML I find only:

Maybe Netdata could be a solution to share data anonymously, if they develop a plug-in for HA (all the backends are already supported).

Hi @Anschke
I’d be really interested if you could share the results of this research project.

I’ve started looking into applying ML to HA by creating this add-on: GitHub - lcmchris/thesillyhome-addon-repo: add on repo for thesillyhome, but I would love to see your methods for wrangling the data and which models you used.

Thanks,
Chris :slight_smile:

It’s a fantastic idea. Anyone interested in building it together, please reply - we could create a chat group to take this further.

Initially, I want to start working on a thermostat, learning from data such as:

  • Indoor temperature (DHT22)
  • Outdoor temperature
  • Indoor Humidity
  • Outdoor Humidity
  • Weather API data

The goal: turn on the air conditioning and set the temperature according to usage patterns.
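The sensors listed above map naturally onto a feature vector. A sketch of what the learner could consume, with a trivially simple placeholder rule standing in for the trained usage-pattern model (sensor names and the comfort threshold are assumptions):

```python
# Sketch: feature vector for the thermostat idea above, plus a
# placeholder rule where a trained model would eventually go.
# Sensor sources and the 25-degree threshold are assumptions.
def make_features(indoor_t, outdoor_t, indoor_h, outdoor_h, forecast_t):
    return {
        "indoor_temp": indoor_t,       # e.g. from a DHT22
        "outdoor_temp": outdoor_t,
        "indoor_humidity": indoor_h,
        "outdoor_humidity": outdoor_h,
        "forecast_temp": forecast_t,   # from a weather API
    }

def should_cool(features, comfort_max=25.0):
    """Placeholder policy standing in for a learned usage-pattern model."""
    return features["indoor_temp"] > comfort_max

features = make_features(27.5, 31.0, 60.0, 70.0, 33.0)
decision = should_cool(features)
```

The interesting part would be replacing `should_cool` with a model trained on when the household actually turned the AC on, given those features.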

@Nova @Rahulsharma0810 @Anschke

Hey guys, tbh I am looking for like-minded people, as I have none among my friends and acquaintances :smiley:
The main focus of my studies was ML in general, and I am super enthusiastic about bringing ML to HA.

How about we start a thread or group chat, as @Rahulsharma0810 already suggested, to stay in touch and maybe collaborate?

Hello, I have some questions about where you put the ML files and how you integrated them with HA, because I’m having difficulty. If you can help, please let me know.