Smart Home Dataset for Machine Learning Project

Hi everyone,

I am working on a programming/machine learning project on smart homes, and I need a lot of data about device statuses and context such as temperature. In the end, I want to be able to detect anomalies to improve the security of the smart home.

For this to work I need a large dataset from many different users. You could really help me out by uploading your Home Assistant .db file for me here on Dropbox, especially if you have already been using Home Assistant for a while.

This .db file contains all the logs that are displayed in the History tab of Home Assistant, e.g. when a light is turned on.

Thank you very much in advance :)

You can download the .db file easily when you have the File Editor plugin installed:

And what will you offer in return?
Is there anything confidential in the DB?

Hi, since this is just a hobby project, I can't pay or anything. But I guess I can give back to the community by providing the combined dataset in the end? Maybe other people want to experiment with similar ideas, for example trying to learn automations from user interaction automatically, without having to program them.

The db file contains the event log of your devices, like when a light got turned on and so on. Normally I would not consider that confidential, but I guess that depends on what kind of devices you have linked to Home Assistant.
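If you want to check what is in your own file before sharing it, here is a minimal Python sketch. It assumes the file is a SQLite database (Home Assistant's default recorder backend) and does not assume any particular table names, since the schema differs between versions. `summarize_db` is just an illustrative helper name, not part of Home Assistant.

```python
import sqlite3
from pathlib import Path


def summarize_db(path):
    """Return {table_name: (column_names, row_count)} for a SQLite file.

    The file is opened read-only so the original database is never modified.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master lists every table stored in the file.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        summary = {}
        for table in tables:
            # PRAGMA table_info returns one row per column; index 1 is the name.
            columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
            count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            summary[table] = (columns, count)
        return summary
    finally:
        conn.close()
```

Calling `summarize_db("home-assistant_v2.db")` (the file name here is an assumption, use whatever your file is called) returns a dictionary you can print to see every table, its columns, and how many rows it holds.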

You can view and query exactly what is inside the db file when you paste the file in here:
Mine looks something like this:

How can I send the file to you?

Hi Cao Hoa,
you can put it here:

My file cannot be downloaded
Do you have any other way?

Do you have Home Assistant installed on a Raspberry Pi or inside a VM (VirtualBox/VMware) on your PC?

I installed it on an Intel NUC, on Ubuntu Server.
And I have a backup.

I have tried to send it to you.

Hey, I have made some space, 2.25 GB is free now. How large is the file?

Have you received the file I sent you yet?

If you use GPS tracking via GPSLogger or the HA apps, the database will contain your logged GPS coordinates. The DB could also contain things like API keys, depending on the integration and how it stores attributes. TTS phrases will be in the DB. The point is that the DB could contain significant personal information.
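One way to get a rough idea of whether a file contains this kind of data before uploading it is to search every column for telltale keywords. The following is only a sketch: it assumes a SQLite file, the keyword list is a made-up starting point rather than a complete one, and `find_sensitive` is a hypothetical helper name. A clean result does not prove the file is safe to share.

```python
import sqlite3
from pathlib import Path

# Illustrative keywords only (an assumption); extend for your own integrations.
KEYWORDS = ("latitude", "longitude", "api_key", "token", "password")


def find_sensitive(path, keywords=KEYWORDS):
    """Return (table, column, keyword, matching_row_count) tuples.

    Every column of every table is cast to text and searched
    case-insensitively for each keyword. The file is opened read-only.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    hits = []
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        for table in tables:
            columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
            for column in columns:
                for keyword in keywords:
                    # instr() > 0 means the keyword occurs somewhere in the value.
                    count = conn.execute(
                        f'SELECT COUNT(*) FROM "{table}" '
                        f'WHERE instr(lower(CAST("{column}" AS TEXT)), ?) > 0',
                        (keyword.lower(),),
                    ).fetchone()[0]
                    if count:
                        hits.append((table, column, keyword, count))
        return hits
    finally:
        conn.close()
```

Each hit tells you which table and column to look at more closely. On a large database this scan can be slow, because it reads every column once per keyword.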

I am not sure about the API keys. I have several configurations (like SSH or the Google Smart Home integration), and when I checked, all the API keys/secrets were stored in the yaml file and not inside the DB. It makes sense that location data could be inside the DB if you installed a GPS logger. I had not thought about that beforehand, but some people probably do use one.

Anyway thank you Aaron for the helpful note.

Yes, thank you very much, Cao!