Data science with Home-assistant

Thanks!
Hmm, interesting. It looks like I might do well to look into a NAS or NUC setup and learn Docker then… The NAS does seem able to do quite a bit, though, since it looks like you can do some ML-related image processing with it as well.

The Google Cloud tutorial looks pretty cool. I think I’ll definitely give that a try for selected data like temperature and humidity, and some other data that I don’t consider exploitable marketing-wise… I’m really interested in the ‘Big 3’ platforms, and besides the fun factor, having some experience with them has the added benefit of building job skills.

(I’ve been drawn to the local effort because, in the U.S. at least, anti-discrimination laws and the concept of an individual’s right to basic privacy are all taken as a big joke by prospective employers in my opinion, and I wouldn’t be at all surprised if bulk purchasing of marketing profiles is a technique used to illegally screen out applicants that algorithms have deemed likely to have, e.g., health problems that can be gleaned from sold ‘smart home’ device data showing how well people sleep, how often they have to get up at night, etc.)

Nice HA + micropython & circuitpython tutorials and others btw :slight_smile:

On my NAS I am running HA & Postgres only, so not really pushing it. I’ve done image processing mostly on an rpi or even in the cloud. Another good reason for keeping it local is no nasty surprise bills.
For the job market, learning AWS probably has the best ROI, although I also like GCP. Overall, learning Python has been my best decision job-wise: I am working for an IoT startup on the back of my HA/Python experience, so I’m very glad I put the time in!


@robmarkcole
So after reviewing some differences between MariaDB and PostgreSQL, I decided on PostgreSQL.
Unfortunately, there isn’t an add-on for it yet in Hass… so I’ve looked at the documentation for:
HA’s recorder component, the PostgreSQL docker image, and docker in general.

I’m still having some trouble understanding how I can install PostgreSQL alongside my Hass installation running in Docker (the alternative Linux installation) such that:

  • snapshots created in the HA front end also back up the database, as they do now with the default home-assistant_v2.db (so whenever I need to restore a snapshot, the database will not contain future events, invalid entity IDs, etc.)
  • it ideally uses Unix sockets instead of TCP for efficiency (not needing to specify a password in the recorder URL to the db is nice too)
  • it is handled by the Hass supervisor, so that it’s available before Home Assistant starts up and gets anything else the supervisor usually does for other containers
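As a rough illustration of the two connection styles in the list above, here is a minimal Python sketch that builds a recorder database URL for either a Unix socket or a TCP connection. The function name `recorder_db_url` and its parameters are hypothetical, and the URL shapes are an assumption based on the general `postgresql://` form; check them against the recorder documentation before relying on them.

```python
from urllib.parse import quote


def recorder_db_url(db_name, user="", password="", host="", port=5432):
    """Build a PostgreSQL URL for the recorder (hypothetical helper).

    With no host, return the Unix-socket form, which carries no password.
    With a host, return the TCP form with user, password, host and port.
    """
    if not host:
        # Socket form: authentication is left to the local PostgreSQL setup.
        return f"postgresql://@/{db_name}"
    # Percent-encode credentials so characters like '@' or '/' stay valid.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgresql://{credentials}@{host}:{port}/{db_name}"


socket_url = recorder_db_url("homeassistant")
tcp_url = recorder_db_url(
    "homeassistant", user="hass", password="p@ss/word", host="192.168.1.10"
)
print(socket_url)
print(tcp_url)
```

The socket form only works if the PostgreSQL socket directory is reachable from inside the Home Assistant container, which this sketch does not address.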

Do you have any advice as to how to make this happen and make it work with the data science work you’ve implemented?

I’ve looked all over the forums, but I can only find bits and pieces…

So far the best I have is:
sudo docker run --name <postgresContainerName> -e POSTGRES_PASSWORD=<postgresRootUsersPassword> -v <PathToVolumeOnHost>:/var/lib/postgresql/data -p 5432:5432 -d postgres

To begin with, I’m not really sure what to put for the host path <PathToVolumeOnHost>.

Would appreciate any advice on this so I can get up and running with data science here. Thanks!

Hi @carver
You are interested in more advanced topics than I have experience with, re backups etc. Specifically regarding mounting existing Postgres data, I don’t do this either.
I will suggest checking out LTSS, as it has some advantages for data science, such as server-side aggregation operations.

I just used the data detective to answer a question for my wife: is our bedroom the correct temperature for our incoming baby? Babies require temperatures between 16 and 20 deg Celsius, with the guidance that it is better to be on the cooler side, as babies can easily overheat. Using the detective I was able to easily calculate our night-time temperature mean as 16.5 deg Celsius, satisfying the temperature requirement.

And the histogram confirms temperatures are mostly in the desired range, but occasionally dip below 16 degrees, so I might increase the heating set point ever so slightly.
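For anyone wanting to repeat this kind of check, here is a minimal sketch in plain Python of a night-time mean and the share of readings inside the 16 to 20 degree band. It does not use the detective package itself; the function name, the 22:00 to 06:00 night window and the sample readings are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def night_time_stats(readings, night_start=22, night_end=6, low=16.0, high=20.0):
    """Return (mean, fraction within [low, high]) for night-time readings.

    readings: iterable of (datetime, temperature_celsius) pairs.
    The night window (22:00 to 06:00) is an assumed default.
    """
    night = [t for ts, t in readings if ts.hour >= night_start or ts.hour < night_end]
    if not night:
        raise ValueError("no night-time readings")
    in_range = sum(low <= t <= high for t in night)
    return mean(night), in_range / len(night)


# Made-up hourly readings from 00:00 to 07:00, for illustration only.
start = datetime(2020, 1, 1, 0, 0)
temps = [16.5, 16.0, 15.5, 16.0, 17.0, 17.5, 18.0, 19.0]
readings = [(start + timedelta(hours=i), t) for i, t in enumerate(temps)]

night_mean, share_ok = night_time_stats(readings)
print(f"night mean {night_mean:.2f} C, {share_ok:.0%} of readings in range")
```

With real data, the same two numbers (mean and share in range) say roughly what the histogram shows visually.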

Hopefully this will reassure my wife…!


You might be interested in this article.

It would be nice to see a “prediction timer” for when a desired temperature will be reached in a room, based on the heat source in the room and the set temperature on the thermostat.


That is an interesting suggestion. I got an email update about my smart thermostat, and it will now ‘detect’ open windows based on whether heating a room is taking longer than normal.


I am no programmer, but I hope someone who knows a bit of coding wants to take the generic_thermostat to the next level.

I use generic_thermostat in all the rooms to set the proper temperature, and distribute the air to the neighbouring rooms with fans.

Right now I use automations and scenes to get the “smart” thermostat functionality.
But this logic should be part of the thermostat.

The thermostat should take into account the outdoor temperature, the thermal resistance of the walls (and the U-value of the windows), and the heat source in the room in order to reach and maintain a desired set temperature.
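To make the U-value idea concrete, here is a minimal sketch of the standard steady-state heat-loss estimate, Q = Σ(U·A)·(T_in − T_out), compared with a heater's rated power. The function name, the surface list and all the numbers are made-up assumptions; the estimate ignores ventilation losses and the thermal mass of the room.

```python
def steady_state_heat_loss(surfaces, indoor_c, outdoor_c):
    """Heat lost through the building envelope, in watts.

    surfaces: list of (u_value_w_per_m2k, area_m2) pairs, e.g. walls and windows.
    Uses Q = sum(U * A) * (T_in - T_out); ventilation and thermal mass are ignored.
    """
    delta_t = indoor_c - outdoor_c
    return sum(u * area for u, area in surfaces) * delta_t


# Illustrative values only: one wall and one window.
room = [
    (0.3, 20.0),  # wall: U = 0.3 W/m2K, 20 m2
    (1.2, 3.0),   # window: U = 1.2 W/m2K, 3 m2
]
loss_w = steady_state_heat_loss(room, indoor_c=20.0, outdoor_c=0.0)
heater_w = 500.0  # assumed rated power of the heat source
can_hold_setpoint = heater_w >= loss_w
print(f"loss {loss_w:.0f} W, heater can hold set point: {can_hold_setpoint}")
```

If the loss exceeds the heater power at a given outdoor temperature, the set point cannot be maintained no matter how the thermostat schedules the heater.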

Currently I set a scene based on the time of day and on who is at home:

  • a desired temperature for when we get out of bed
  • a temperature for when you get to work (it should be possible to raise the desired temperature within 1 hour)
  • a set temperature for sleeping
  • a long-term minimum temperature for vacations

If you want a tip on inexpensive Zigbee sensors, I can recommend the “Xiaomi Mi Aqara Smart Air Pressure Temperature Humidity Environment Sensor”.

When I put them next to each other, the difference in temperature is less than 0.02 degrees.
Air pressure is also very similar.
Humidity can show a bigger difference, but for my application, getting a notification when the laundry is dry, it’s good enough :-)

So the novelty of the algorithm would be the inclusion of the presence and activity of people (e.g. sleeping). Accurately detecting people and their activity has been almost impossible until very recently, and even now remains very challenging IMO. However, including them in a thermostat algorithm would be straightforward. This post is probably more suitable for the thermostat thread, as there is not really any data science element.
Cheers

I think the first challenge is definitely a data science task.

Home Assistant climate needs a “thermal capacity sensor” that can predict the amount of time it will take to raise or lower the temperature in a given room with a given heating/cooling source.

The climate control service should also have a derivative sensor that will detect whether the temperature is climbing or falling.
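A very crude version of both ideas can be sketched in a few lines of Python: fit a straight line to recent readings to get the rate of change (the derivative), then extrapolate to the target. The function names and sample readings are hypothetical, and the linear extrapolation is an assumption; a real room warms more slowly as it approaches the set point, so this will tend to underestimate the time.

```python
def slope_per_minute(samples):
    """Least-squares slope of temperature over time.

    samples: list of (minutes_elapsed, temperature_celsius) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den


def minutes_to_target(samples, target_c):
    """Linear estimate of minutes until target_c, or None if not approaching it."""
    rate = slope_per_minute(samples)
    gap = target_c - samples[-1][1]
    if gap == 0:
        return 0.0
    if rate == 0 or (gap > 0) != (rate > 0):
        return None  # temperature is flat or moving away from the target
    return gap / rate


# Made-up readings: the room warms by 0.5 degrees every 5 minutes.
recent = [(0, 18.0), (5, 18.5), (10, 19.0), (15, 19.5)]
eta = minutes_to_target(recent, target_c=21.0)
print(f"estimated minutes to target: {eta:.1f}")
```

The sign of `slope_per_minute` alone answers the "climbing or falling" question; the time estimate is the part that would need a proper thermal model to be trustworthy.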

Derivative sensor was just added https://www.home-assistant.io/integrations/derivative/

Was thinking about how to tag false positives/negatives/other data categories. Not sure if I understand your approach above, but perhaps it would be possible to use template sensors? You would put the main sensor value/state of interest into the value field and then maybe use an input_select or similar to populate attributes as the tag? It could reset to default after x minutes or when you tell it to. Then all the data is in one entity. I don’t know how this appears in the database (I haven’t tinkered yet), but hopefully it would be on one row, and so well shaped for analysis? Does that make sense?

I’m using a kind of similar approach to gather images for training OpenCV. When I see hot air balloons outside I say “hey google, I see balloons”, and Home Assistant starts taking image snapshots from a camera at intervals for 10 minutes and writes them to my “positive” directory (@robmarkcole, you might be interested in this image collection technique). The light on my Xiaomi hub goes on/off to remind me to check after 10 minutes whether the balloons are still there.

Hi @Mahko_Mahko
I am intrigued by your use case. Are you monitoring weather balloons or something? How does OpenCV fit in?
RE ground truth I was previously using a Hue remote to log (manually) when I was going to bed and getting up.
Cheers

I don’t think the technical part of labelling is the hard part. What you suggest for instance would work fine.
After the original post I got myself an IKEA button and have created input_booleans similar to what you suggest. This works, but for the Bayesian sensors I have it really won’t help much, so I’m still thinking about my original plan.

During this corona-mess I have started on the project to just get the basics in order. I now have a basic python program running that communicates to HASS via REST and can request the history of all sensors. So the basics are there but I still have to start on the hard part.
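For readers who want to try the same starting point, here is a minimal sketch of building a history request against the Home Assistant REST API with only the standard library. The helper name, base URL, token placeholder and entity ID are hypothetical; the `/api/history/period/<timestamp>` path and `filter_entity_id` parameter are written from memory of the REST API documentation and should be checked against it.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request


def build_history_request(base_url, token, entity_id, start):
    """Build (but do not send) a GET request for one entity's history.

    base_url, token and entity_id are placeholders supplied by the caller.
    """
    query = urlencode({"filter_entity_id": entity_id})
    url = f"{base_url.rstrip('/')}/api/history/period/{start.isoformat()}?{query}"
    headers = {
        "Authorization": f"Bearer {token}",  # long-lived access token
        "Content-Type": "application/json",
    }
    return Request(url, headers=headers)


req = build_history_request(
    "http://localhost:8123",
    "YOUR_LONG_LIVED_TOKEN",
    "sensor.bedroom_temperature",
    datetime(2020, 4, 1, tzinfo=timezone.utc),
)
print(req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Keeping the request construction separate from sending makes it easy to inspect the URL before pointing it at a live instance.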

I still don’t know how to effectively get the configuration and potentially provide back a new configuration with adjusted probabilities. Obviously I could just parse the existing configuration, but it seems a bit silly to not somehow use home assistant for that.
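One possible shape for the "adjusted probabilities" step, as a sketch only: estimate how often an observation is active when the labelled ground truth is true versus false, then apply Bayes' rule to see what those numbers do to the prior. The function names and the labelled samples are made up for illustration, and the `prob_given_true` / `prob_given_false` naming simply mirrors the terms used for the bayesian sensor; this is not the integration's own code.

```python
def estimate_probabilities(samples):
    """Estimate (prob_given_true, prob_given_false) from labelled samples.

    samples: list of (observation_active, label_true) boolean pairs, where
    label_true is the manually recorded ground truth.
    """
    when_true = [obs for obs, label in samples if label]
    when_false = [obs for obs, label in samples if not label]
    if not when_true or not when_false:
        raise ValueError("need samples for both true and false labels")
    return sum(when_true) / len(when_true), sum(when_false) / len(when_false)


def bayes_update(prior, prob_given_true, prob_given_false):
    """Posterior probability after seeing one active observation."""
    numerator = prob_given_true * prior
    return numerator / (numerator + prob_given_false * (1 - prior))


# Made-up labels: the observation is active in 3 of 4 true cases, 1 of 4 false.
labelled = [
    (True, True), (True, True), (True, True), (False, True),
    (True, False), (False, False), (False, False), (False, False),
]
p_true, p_false = estimate_probabilities(labelled)
posterior = bayes_update(prior=0.5, prob_given_true=p_true, prob_given_false=p_false)
print(p_true, p_false, posterior)
```

With only a handful of labels these estimates are noisy, so in practice some smoothing or a minimum sample count would be sensible before writing anything back to a configuration.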

I’m also weak on the presentation side. I have a rudimentary Streamlit web interface for now, but as an end product I would really prefer a Home Assistant panel.

Hi @robmarkcole, it’s really a variation of the approach you helped me with. I used the OpenCV component to detect if my blinds were up or down. For the ‘balloons’, I get hot air balloons rising on my skyline in Melbourne and I just want to know when they are out there so I can have a look (they are quite nice). I thought I would train OpenCV with images from the actual context/background rather than off the web. Ideally I could automate iterations of model training based on feedback about false positives/negatives (maybe you have done this in another image processing platform?). That is probably a discussion for other threads I guess. I’m also working on timelapse automation, so I might detect their presence, create a timelapse, and then play it back to me on one of my screens when I rise (they are typically out at sunrise). You could probably mash together something similar for bird watching ;)

Sounds interesting. Please keep us posted. Oh, I get what you mean about re-configuring the configuration and linking it to the observations now, that makes sense :). I guess ideally the Bayesian probabilities could be defined and dynamically set via an input_number or similar (like how the target temp on a thermostat can be adjusted)?

@sophof I am interested to see your Streamlit UI as I am working on something similar. Is it on GitHub?

@Mahko_Mahko for classifying balloons in images, why not use TensorFlow/PyTorch? Also, if you want to dynamically update the Bayesian sensor, it would be an idea to add services to the integration to do this.

Thanks for the suggestions. I took your original suggestion on using OpenCV for the blinds actually. But I’m keen to try other platforms, so might give your suggestions a whirl for the balloons. Quite keen to dabble with tensorflow.

I don’t actually use the bayesian sensor myself anymore. Just still interested/curious about it (nice write up/tutorial by the way). Main previous application was focussed on home level presence detection and I have it working very reliably by combining multiple sensors in a much simpler way (if any of my devices are home, then I’m home - that is working fine for me). If you have any other interesting/useful use cases then I’d like to hear about them.

Not yet, it is not much more advanced than the tutorial at the moment. I’ll definitely share once I’ve achieved an MVP; right now I’m still just playing around to get an idea of what I can possibly do.


On the data tagging theme, I had the thought that having some data pushed at you via actionable notifications and tagged up with your response could be convenient? Like in my computer vision use case - maybe my trained model sends me balloon pictures, then I confirm/negate them for future re-runs? Could maybe do the same with some bayesian sensor changes, then analyse them?

I’ve also started work on inferring indoor lighting levels based on external light sensors for better light automation control. I have one indoor sensor but it gets affected by lights when they turn on! And my external light sensors can give quite different readings (Mi Flora, bh1750).

I’m thinking I train a multiple regression model to predict indoor lighting levels given multiple outdoor readings and the condition that no internal lights are on.
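As a sketch of what that could look like, here is an ordinary least-squares fit written in plain Python (normal equations solved by Gauss-Jordan elimination), trained only on rows where no internal lights are on. In practice numpy or scikit-learn would be the usual choice; the function names, the row layout and every reading below are made-up assumptions for illustration.

```python
def fit_linear(features, targets):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...].

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.
    """
    rows = [[1.0] + list(r) for r in features]  # prepend the intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("features are collinear or there is too little data")
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coefs, reading):
    """Predicted indoor level for one tuple of outdoor readings."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], reading))


# Made-up rows: (mi_flora_lux, bh1750_lux, indoor_lux, internal_lights_on)
history = [
    (120.0, 200.0, 27.0, False),
    (300.0, 150.0, 42.5, False),
    (450.0, 500.0, 75.0, False),
    (80.0, 90.0, 17.5, False),
    (600.0, 350.0, 82.5, False),
    (100.0, 100.0, 250.0, True),  # lights on: excluded from training
]
training = [row for row in history if not row[3]]
coefs = fit_linear([row[:2] for row in training], [row[2] for row in training])
print(coefs, predict(coefs, (200.0, 300.0)))
```

Since the two outdoor sensors are likely to be strongly correlated, the individual coefficients may be unstable even when the predictions are fine, which is worth keeping in mind when interpreting them.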

Any thoughts welcome.