Smart Home Dataset for Machine Learning Project

Cao_Hoa · March 2, 2021, 9:19pm

My file cannot be downloaded.
Is there another way?

peanutbutter · March 5, 2021, 10:05am

Do you have Home Assistant installed on a Raspberry Pi or inside a VM (Virtualbox/VMWare) on your PC?

Cao_Hoa · March 5, 2021, 10:52am

I installed it on an Intel NUC running Ubuntu Server,
and I have a backup.

Cao_Hoa · March 5, 2021, 10:56am

I have tried to send it to you.

peanutbutter · March 5, 2021, 11:18am

Hey, I have made some space, 2.25 GB is free now. How large is the file?

Cao_Hoa · March 5, 2021, 2:15pm

Have you received the file I sent you?

AaronCake · March 6, 2021, 3:44pm

If you use GPS tracking via GPSLogger or the HA apps, the database will contain your logged GPS coordinates. The DB could also contain things like API keys, depending on the integration and how it stores attributes. TTS phrases will be in the DB. Point is that the DB could contain significant personal information.
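
For anyone who wants to check their own file before sharing it, here is a minimal sketch of such a scan. It assumes the recorder database has a `states` table with `entity_id` and `attributes` (JSON text) columns, which may not match every HA version; the key list and the demo rows are made up for illustration, and the demo runs against an in-memory database rather than a real `.db` file.

```python
import json
import sqlite3

# Hypothetical list of attribute names worth reviewing before sharing a DB.
SENSITIVE_KEYS = ("latitude", "longitude", "gps_accuracy", "api_key", "token", "password")


def find_sensitive_rows(conn):
    """Return (entity_id, matched_keys) for rows whose JSON attributes use sensitive-looking keys."""
    hits = []
    # Assumed schema: states(entity_id, state, attributes); adjust for your HA version.
    for entity_id, attributes in conn.execute("SELECT entity_id, attributes FROM states"):
        try:
            attrs = json.loads(attributes or "{}")
        except json.JSONDecodeError:
            continue  # skip rows whose attributes are not valid JSON
        if not isinstance(attrs, dict):
            continue
        matched = sorted(key for key in attrs if key.lower() in SENSITIVE_KEYS)
        if matched:
            hits.append((entity_id, matched))
    return hits


def build_demo_db():
    """Create a tiny in-memory stand-in for a recorder database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT, attributes TEXT)")
    rows = [
        ("device_tracker.phone", "home",
         json.dumps({"latitude": 0.0, "longitude": 0.0, "source_type": "gps"})),
        ("light.kitchen", "on", json.dumps({"brightness": 200})),
    ]
    conn.executemany("INSERT INTO states VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for entity_id, keys in find_sensitive_rows(build_demo_db()):
        print(entity_id, keys)
```

To run it on a real file, you would pass `sqlite3.connect("path/to/your.db")` instead of the demo connection. A key-name match only tells you where to look; it does not prove the rest of the DB is free of personal data (TTS phrases, for example, would not be caught).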

peanutbutter · March 6, 2021, 6:21pm

I am not sure about the API keys. I have several configurations (such as SSH and the Google Smart Home integration), and when I checked, all the API keys/secrets were stored in the YAML file, not in the DB. It makes sense that location data might be in the DB if you installed a GPS logger. I had not thought about that in advance, and some people may well use one.

Anyway, thank you, Aaron, for the helpful note.

peanutbutter · March 6, 2021, 6:22pm

Yes, thank you very much, Cao!

Cao_Hoa · April 10, 2021, 2:43am

Do you need more data from the db file?
I have over 4 GB of new data.

Cao_Hoa · August 10, 2021, 4:07am

Have you finished the project yet?
Can I help you?

peanutbutter · August 10, 2021, 5:24am

Hi Cao,

I had a simple HA setup with a thermostat, smart lights, door and window sensors, and a few other things. With HA, everything was stored in the .db file on the Raspberry Pi. After two months of passively collecting data, I tested it over the course of a day and wrote some Python scripts.

What I did was simulate unusual behavior, i.e. things that were not observed in the training data, to see whether a model (e.g. a simple One-class SVM) would pick up on it. This included someone opening the door when I was not at home, flickering lights, lights turning on in the middle of the night (which did not happen in the training data), and making the thermostat's temperature sensor read higher than usual, basically simulating a fire. It also included more subtle things, like opening the window when it is cold outside while the heating is on.

Obviously, you can hard-code everything, like a rule that turns off the heating when it detects that the windows are open, but as I explained, I wanted to see whether a machine learning model could recognize something like this automatically.

It worked partly, but mostly on time-related things; my model was not able to recognize more subtle things like the heating scenario. The FP rate was also fairly high, around 2%. I had hoped that with a larger dataset from diverse users the model would be smarter and generalize better. On the other hand, that was hard to integrate, as your dataset and mine share only a small intersection of devices, and the usage patterns are probably very different as well.

Anyway, those first results showed that it worked quite nicely in some cases but was not robust enough overall, and that it can probably only work very well with lots and lots of data. After that, I got caught up in other things.
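
To make the idea concrete, here is a rough sketch of the "not observed in the training data" check. It is not the model from this project: it swaps in a plain frequency count over (entity, state, hour of day) as a stand-in, next to a hand-written rule for the heating case. The entity names, state values, and sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def fit_counts(events):
    """Count how often each (entity_id, state, hour-of-day) triple occurs in training."""
    return Counter((entity, state, ts.hour) for entity, state, ts in events)


def is_anomalous(counts, event, min_count=1):
    """Flag an event whose (entity_id, state, hour) was seen fewer than min_count times."""
    entity, state, ts = event
    return counts[(entity, state, ts.hour)] < min_count


def heating_with_open_window(current_states):
    """Hand-written rule for the subtle case: heating on while a window is open."""
    # Hypothetical entity names and state values.
    return (current_states.get("climate.thermostat") == "heat"
            and current_states.get("binary_sensor.window") == "on")


# Hypothetical training data: the bedroom light only ever turns on around 19:00.
training = [("light.bedroom", "on", datetime(2021, 1, day, 19, 30)) for day in range(1, 8)]
counts = fit_counts(training)

night_event = ("light.bedroom", "on", datetime(2021, 2, 1, 3, 0))
evening_event = ("light.bedroom", "on", datetime(2021, 2, 1, 19, 5))
print(is_anomalous(counts, night_event))    # True: never seen at 03:00
print(is_anomalous(counts, evening_event))  # False: matches the usual hour
```

A per-entity count like this only catches time-related oddities, which fits the result described above: the window-plus-heating case involves two devices at once, so it needs either an explicit rule such as `heating_with_open_window` or a model that looks at several device states jointly.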

Cao_Hoa · August 10, 2021, 6:10am

I have a new 4 GB data file; would you like to use it?
I am also using Frigate to tell people apart from cars, dogs, cats, and other things; the detections can be written as events.
I am using this with Double Take for face detection and recognition.
When a trusted face is detected, the event triggers an automation.
Can you share your settings with me?
I’ll run it on my system and let you know the results.

Cao_Hoa · November 1, 2021, 8:07am

Have you seen this thread yet?

ShuaiQA · December 20, 2022, 1:47am

Hello, I am a student. My research focuses mainly on interaction threats, and it is similar to the paper “Scalable analysis of interaction threats in IoT systems”. I would like to analyze some datasets. If you can share your dataset, I would be very grateful.

binhnguyen3009 · July 17, 2023, 7:55am

Hello Peanut Butter,

It has been a long time since you all discussed this topic, but I have just come across it and found it interesting. I’m writing a thesis on anomaly detection for smart homes. I use machine learning for it, with many models such as RNN, CNN, and SVM, but I found that a decision tree gives high accuracy and is much faster to train and evaluate. From what you shared, I think it could be good to implement the model in HA to detect anomalous behaviour such as you describe. If possible, could you share a little about your project or any paper you have written about it, and the dataset you used? Thanks, and I hope to see your reply soon.
Kind regards,
B

peanutbutter · July 17, 2023, 8:24am

Hey binhnguyen! The model used was an autoencoder with GRU. I sent some more details via private message. All the best for your thesis.

JasonalLy · October 20, 2023, 1:43am

Hello Binh,
I’m also interested in smart home failure prediction, and I want to use deep learning to predict failure categories and causes. Unfortunately, I don’t have a dataset. Can you share which dataset you used? Thank you very much.
Yours sincerely,
Jason

JasonalLy · October 20, 2023, 1:52am

Hi peanutbutter,
I am interested in smart device failure prediction, and I want to use it to classify failures and predict their causes. I don’t have a dataset for that, and I’m excited that you’re doing experiments on this, too. Your Dropbox link doesn’t seem to be working anymore, so I can’t download it. Can you share the relevant datasets that you use?
Yours sincerely,
Jason