Data science with Home Assistant

I think the first challenge is definitely a data science task.

Home Assistant climate needs a "thermal capacity sensor" that can predict the amount of time it will take to raise or lower the temperature in a given room with a given heating/cooling source.

The climate control service should also have a derivative sensor that detects whether the temperature is climbing or falling.

A derivative sensor was just added: https://www.home-assistant.io/integrations/derivative/

Was thinking about how to tag false positives/negatives/other data categories. Not sure if I understand your approach above, but perhaps it would be possible to use template sensors? You would put the main sensor value/state of interest into the value field and then maybe use an input_select or similar to populate attributes as the tag? It could reset to a default after x minutes or when you tell it to. Then all the data is in one entity. I don't know how this appears in the database (I haven't tinkered yet), but hopefully it would be on one row, and so well shaped for analysis? Does that make sense?

I'm using a somewhat similar approach to gather images for training OpenCV. When I see hot air balloons outside I say "hey google, I see balloons", and Home Assistant starts taking image snapshots from a camera at intervals for 10 minutes and writes them to my "positive" directory (@robmarkcole, you might be interested in this image collection technique). The light on my Xiaomi hub goes on/off to remind me to check after 10 minutes whether the balloons are still there.

Hi @Mahko_Mahko
I am intrigued by your use case - are you monitoring weather balloons or something? How does OpenCV fit in?
RE ground truth I was previously using a Hue remote to log (manually) when I was going to bed and getting up.
Cheers

I don't think the technical part of labelling is the hard part. What you suggest, for instance, would work fine.
After the original post I got myself an IKEA button and created input_booleans similar to what you suggest. This works, but for the Bayesian sensors I have it really won't help much, so I'm still thinking about my original plan.

During this corona-mess I have started on the project to just get the basics in order. I now have a basic Python program running that communicates with HASS via REST and can request the history of all sensors. So the basics are there, but I still have to start on the hard part.
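For anyone wanting to try the same, a minimal sketch of that kind of REST history request could look like the following - not my exact program; the URL, long-lived access token, and entity ID are placeholders you would replace with your own:

```python
import requests

# Placeholders - use your own instance URL, a long-lived access token
# (created under your HA user profile), and the entity you care about.
HASS_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"
ENTITY_ID = "sensor.living_room_temperature"

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

# /api/history/period/<start> returns state changes since <start> (ISO 8601).
start = "2020-04-01T00:00:00+00:00"
resp = requests.get(
    f"{HASS_URL}/api/history/period/{start}",
    headers=headers,
    params={"filter_entity_id": ENTITY_ID},
)
resp.raise_for_status()

# The response is a list of lists: one list of state objects per entity.
for state in resp.json()[0]:
    print(state["last_changed"], state["state"])
```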

I still don't know how to effectively get the configuration and potentially provide back a new configuration with adjusted probabilities. Obviously I could just parse the existing configuration, but it seems a bit silly to not somehow use Home Assistant for that.

I'm also weak on the presentation side. I have a rudimentary Streamlit web interface for now, but as an end product I would really prefer a Home Assistant panel.

Hi @robmarkcole, it's really a variation of the approach you helped me with. I used the OpenCV component to detect if my blinds were up or down. For the 'balloons', I get hot air balloons rising on my skyline in Melbourne and I just want to know when they are out there so I can have a look (they are quite nice). I thought I would train OpenCV with images from the actual context/background rather than off the web. Ideally I could automate iterations of model training based on feedback about false positives/negatives (maybe you have done this in another image processing platform?). That is probably a discussion for other threads, I guess. I'm also working on timelapse automation, so I might detect their presence, create a timelapse, and then play it back on one of my screens when I rise (they typically appear at sunrise). You could probably mash together something similar for bird watching ;)

Sounds interesting. Please keep us posted. Oh, I get what you mean about re-configuring the configuration and linking it to the observations now - that makes sense :). I guess ideally the Bayesian probabilities could be defined and dynamically set via an input_number or similar (like how the target temp on a thermostat can be adjusted)?

@sophof I am interested to see your Streamlit UI as I am working on something similar - is it on GitHub?

@Mahko_Mahko for classifying balloons in images, why not use TensorFlow/PyTorch? Also, if you want to dynamically update the Bayesian sensor, it would be an idea to add services to the integration to do this.

Thanks for the suggestions. I actually took your original suggestion on using OpenCV for the blinds. But I'm keen to try other platforms, so I might give your suggestions a whirl for the balloons. Quite keen to dabble with TensorFlow.

I don't actually use the Bayesian sensor myself anymore. Just still interested/curious about it (nice write-up/tutorial by the way). My main previous application was focused on home-level presence detection, and I have it working very reliably by combining multiple sensors in a much simpler way (if any of my devices are home, then I'm home - that works fine for me). If you have any other interesting/useful use cases then I'd like to hear about them.

Not yet, it is not much more advanced than the tutorial at the moment. I'll definitely share once I've achieved an MVP; right now I'm still just playing around to get an idea of what I can possibly do.


On the data tagging theme, I had the thought that having some data pushed at you via actionable notifications and tagged with your response could be convenient? Like in my computer vision use case - maybe my trained model sends me balloon pictures, then I confirm/negate them for future re-runs? Could maybe do the same with some Bayesian sensor changes, then analyse them?

I've also started work on inferring indoor lighting levels based on external light sensors for better light automation control. I have one indoor sensor, but it gets affected by lights when they turn on! And my external light sensors can give quite different readings (Mi Flora, BH1750).

I'm thinking I'll train a multiple regression model to predict indoor lighting levels from multiple outdoor readings, using only samples where no internal lights are on.
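Something like the following scikit-learn sketch is roughly what I have in mind (the column names and CSV file are made up, and it assumes the sensor history has already been exported and filtered to lights-off periods):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical export: one row per timestamp with aligned sensor readings,
# already filtered to periods where no internal lights were on.
df = pd.read_csv("lux_history.csv")

X = df[["outdoor_lux_miflora", "outdoor_lux_bh1750"]]  # predictors
y = df["indoor_lux"]                                   # target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression()
model.fit(X_train, y_train)

print("R^2 on held-out data:", model.score(X_test, y_test))
print("Coefficients:", dict(zip(X.columns, model.coef_)))

# Predict the "lights off" indoor level from current outdoor readings.
current = pd.DataFrame([[350.0, 420.0]], columns=X.columns)
print(model.predict(current))
```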

Any thoughts welcome.

I am pretty sure you could use a Telegram bot for tagging pictures via notification.


I do that for my front door face recognition system. It's using AppDaemon; it's still in beta and not that well documented, but maybe you can reuse some of it.

Basic workflow is:

  1. When the door opens, take 10 pictures
  2. Try to detect faces (using Facebox, Deepstack, …)
  3. Announce when a known face is detected
  4. For an unknown face, send the image with Telegram buttons for each known face (sketched below)
  5. If I press one of the buttons, the image is moved into the correct directory of known faces
  6. An extra button for a new face lets me type the name of the person
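Not my exact code, but steps 4-5 look roughly like this AppDaemon sketch (the names, paths, telegram_bot service data and callback payload below are from memory, so double-check against the telegram_bot docs):

```python
import shutil

import appdaemon.plugins.hass.hassapi as hass

KNOWN_FACES = ["alice", "bob"]          # placeholder names
FACES_DIR = "/config/www/faces"         # hypothetical training directory


class FaceTagger(hass.Hass):
    def initialize(self):
        self.last_image = None
        # Fired whenever an inline keyboard button is pressed in Telegram.
        self.listen_event(self.on_telegram_callback, "telegram_callback")

    def send_unknown_face(self, image_path):
        """Called by the detection step when a face isn't recognised."""
        # One inline button per known face; pressing it sends back /tag_<name>.
        keyboard = [", ".join(f"{name}:/tag_{name}" for name in KNOWN_FACES)]
        self.call_service(
            "telegram_bot/send_photo",
            file=image_path,
            caption="Who is this?",
            inline_keyboard=keyboard,
        )
        self.last_image = image_path

    def on_telegram_callback(self, event_name, data, kwargs):
        command = data.get("command", "")
        if command.startswith("/tag_") and self.last_image:
            name = command.replace("/tag_", "")
            # Move the image into that person's training folder.
            shutil.move(self.last_image, f"{FACES_DIR}/{name}/")
            self.last_image = None
```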

That’s pretty cool.

Anyone doing data science with Python has probably heard of Streamlit by now. Anyway, I just got the Streamlit demo running as a Hass.io add-on, which opens up the possibility of creating Streamlit data science apps that can be installed like any other add-on. I think I will create one for doing prediction of sensor data using the Prophet library.
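The Prophet part could be as simple as something like this (a sketch, assuming the sensor history has already been exported to a CSV with timestamp and state columns - the file name is a placeholder):

```python
import pandas as pd
from prophet import Prophet  # older releases import from fbprophet

# Placeholder export: timestamps and numeric sensor states.
history = pd.read_csv("sensor_history.csv")

# Prophet expects two columns: ds (timezone-naive datetime) and y (value).
df = pd.DataFrame({
    "ds": pd.to_datetime(history["last_changed"], utc=True).dt.tz_localize(None),
    "y": pd.to_numeric(history["state"], errors="coerce"),
}).dropna()

model = Prophet()
model.fit(df)

# Forecast the next 24 hours at hourly resolution.
future = model.make_future_dataframe(periods=24, freq="H")
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```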


Any more experiences/progress with this? I've just started tinkering with Streamlit (pretty slick) and am keen to try something down the line.

There are issues using Streamlit on the RPi; follow along at https://discuss.streamlit.io/t/raspberry-pi-streamlit/2900
Of course you can use Streamlit anywhere else (Mac, Windows, etc.), but I haven't spent any more time on it.


Hi, finally getting back to this…wondering how you have recorder and history configured?
Do you have auto purge set to never for capturing data in your PostgreSQL database?

I ask because the docs for recorder warn that doing so will slow down HA, but I'm wondering if that is really more relevant for the default SQLite db and/or when running on an RPi…

Trying to determine if I need to capture data via MQTT or create a custom component like someone did for LTSS (I'm still wanting to try PostgreSQL first before getting into time series databases).

Thanks!

LTSS and the TimescaleDB it uses are based on Postgres. TimescaleDB mainly adds the ability to partition the data automatically into time chunks, which makes time-based queries faster.
Basically it is "only" a Postgres DB with some optimizations :wink: So you can use it like any other SQL DB for analysing your data.

For example, I query my data from it directly via the Jupyter integration for the Visual Studio Code editor. Works wonderfully :slight_smile:
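Not my exact notebook, but the queries are essentially of this form - a sketch with pandas/SQLAlchemy, assuming the classic recorder schema where the states table has entity_id, state and last_updated columns; the connection string and entity ID are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string - point it at your recorder/LTSS database.
engine = create_engine("postgresql://user:password@nas.local:5432/homeassistant")

query = """
    SELECT last_updated, state
    FROM states
    WHERE entity_id = 'sensor.living_room_temperature'
      AND state NOT IN ('unknown', 'unavailable')
    ORDER BY last_updated
"""

df = pd.read_sql(query, engine, parse_dates=["last_updated"])
df["state"] = pd.to_numeric(df["state"], errors="coerce")

# Resample to hourly means for plotting/analysis.
hourly = df.set_index("last_updated")["state"].resample("1H").mean()
print(hourly.tail())
```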

@carver I am running Postgres in a Docker container on a Synology NAS. My config is simply:

```yaml
recorder:
  db_url: !secret postgres_url
  auto_purge: False
```
I have relatively few entities in my prod HA, so the db is growing quite slowly.
For analytics, TimescaleDB would be a good choice, and as @CM000n points out it is a Postgres extension with nice features for sampling time series data. The only drawback I found is that it is not supported on the RPi.
