Bayesian Sensors - Best Methods?

I started using the binary sensor platform for Bayesian sensors a few months ago. After working with it for a while, I have arrived at what I think is something akin to a best practice (at least for me).

I’ve seen several posts where people get somewhat tied up in what the parameters should be in the sensor. The sensor configuration is fairly simple (at least until you get into the details and try to make it work as expected).

Below is my sensor that decides whether we’re away for an extended period (we define that as overnight). When it triggers to ON, it blocks some automations and enables others. For example, the morning coffee script doesn’t need to trigger if we’re not home, but I still want some lights to turn on and off in the evening and in the morning. The automations using this sensor are pretty easy to set up: in the conditions section of an automation, I add a condition that the sensor is OFF for normal automations (coffee run, get ready for work, etc.), while the automations that run when we’re not at home have this binary sensor as an ON condition.

After giving this some thought, there are a number of observations that could “indicate” we are away for an extended period (overnight or longer). These are:

  • I’m more than 50-100 miles away
  • My partner is more than 50-100 miles away
  • My google calendar has the word Hotel in it for the day
  • One or both of us is more than 100 miles away (higher probability than the 50-100 mile range)
  • No motion has been sensed by any of the house motion sensors (there are several scattered throughout the house; the odds of us being gone are high if none of them have detected motion in the last 10 hours)

Setting up the Bayesian sensor is a two-step process - complete the top part and then add one or more observations.

The top portion has three parameters: a friendly name (which is turned into the Bayesian sensor’s entity ID by making the words lower case and adding a “_” between them), a prior probability, and a probability threshold. Don’t worry too much about the actual values for these probabilities right now. I always set the prior probability to 0.4 and then adjust it and the threshold until I get the desired outcome (more on that below).

Each observation is based on another sensor (sensor.xxx, binary_sensor.xxx, input_boolean.xxx, etc). Below, I use binary sensors and input_booleans. There are four related to range from home (two for each of us), one that looks for the word “hotel” in my Google calendar, one that looks for no house motion within the last 10 hours, and then two overrides.

- platform: bayesian
  name: Extended Away
  prior: 0.4
  probability_threshold: 0.98
  observations:
    - entity_id: 'binary_sensor.kirby_far_range'
      prob_given_true: 0.7
      prob_given_false: 0.2
      platform: 'state'
      to_state: 'on'
    - entity_id: 'binary_sensor.sandy_far_range'
      prob_given_true: 0.7
      prob_given_false: 0.2
      platform: 'state'
      to_state: 'on'
    - entity_id: 'binary_sensor.kirby_extended_range'
      prob_given_true: 0.9
      prob_given_false: 0.1
      platform: 'state'
      to_state: 'on'
    - entity_id: 'binary_sensor.sandy_extended_range'
      prob_given_true: 0.9
      prob_given_false: 0.1
      platform: 'state'
      to_state: 'on'
    - entity_id: 'binary_sensor.staying_at_hotel'
      prob_given_true: 0.8
      prob_given_false: 0.2
      platform: 'state'
      to_state: 'on'
    - entity_id: 'binary_sensor.no_house_motion_long'
      prob_given_true: 0.95
      prob_given_false: 0.1
      platform: 'state'
      to_state: 'on'
    - entity_id: 'input_boolean.bay_enxtended_away_override_to_true'
      prob_given_true: 1.0
      prob_given_false: 0.0
      platform: 'state'
      to_state: 'on'
    - entity_id: 'input_boolean.bay_enxtended_away_override_to_false'
      prob_given_true: 0.0
      prob_given_false: 1.0
      platform: 'state'
      to_state: 'on'

Each observation has at least five parts to it:

  • entity_id is the entity that is being monitored (observed)
  • prob_given_true is the probability of seeing this observation (the entity in its to_state) given that what the Bayesian sensor represents is true
  • prob_given_false is the probability of seeing this observation given that what the Bayesian sensor represents is false
  • platform is the platform being observed (for other binary sensors and input booleans, this will almost always be ‘state’)
  • to_state is the state that activates the observation. When the entity is in this state, prob_given_true is used to update the overall probability. For example, from above, if binary_sensor.no_house_motion_long turns ON, the probability is updated using the value in prob_given_true (0.95)

That’s all there is to setting up the observations.

Of course, we haven’t started the hard part yet - figuring out the correct values for all the prob_given_true, prob_given_false, prior, and probability_threshold parameters! Let’s tackle that issue. Let’s look first at the two kirby range observations - one is 50-100 miles away (binary_sensor.kirby_far_range) and the other is more than 100 miles away from home (binary_sensor.kirby_extended_range):

    - entity_id: 'binary_sensor.kirby_far_range'
      prob_given_true: 0.7
      prob_given_false: 0.2
      platform: 'state'
      to_state: 'on'
    - entity_id: 'binary_sensor.kirby_extended_range'
      prob_given_true: 0.9
      prob_given_false: 0.1
      platform: 'state'
      to_state: 'on'

If I’m 50-100 miles away (kirby_far_range), then I might be gone for the night. Or maybe I just visited someone and I’ll be back that night. If I am 50-100 miles away, I decided there was a 70% chance I might be gone for the night, so I set prob_given_true: 0.7. I set prob_given_false: 0.2 - why? I’ll explain how I came to those probabilities in a moment. If I’m more than 100 miles from home (kirby_extended_range), then I decided there was a higher chance that I wasn’t coming home for the night. The probabilities I decided on were 0.9 for true and 0.1 for false.

NOTE: You can omit the prob_given_false parameter (and I probably should). If you do, the Bayesian component calculates prob_given_false as:

prob_given_false  = 1.0 - prob_given_true
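To make that concrete, here is a minimal Python sketch of the single-observation update the component performs (the function name is mine, not HA’s):

```python
# One Bayes update step: given the current probability estimate (prior)
# and an observation's tuning parameters, return the new estimate.
def bayes_update(prior, prob_given_true, prob_given_false):
    numerator = prob_given_true * prior
    denominator = numerator + prob_given_false * (1 - prior)
    return numerator / denominator

# With prior 0.4 and the far-range observation (0.7 / 0.2) active:
posterior = bayes_update(0.4, 0.7, 0.2)
print(round(posterior, 3))  # 0.7
```

Each active observation feeds its result back in as the prior for the next one, which is why stacking several observations pushes the probability up quickly.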

Similarly, I set the other parameters as shown in the config - again, don’t stress too much about what these values are. We’ll make this easier toward the end of this post.

With no other tools at our disposal, we now save the configuration file, restart HA, and then wait until some of the observations are met to see if we get the expected results. I didn’t want to wait until I was 100 miles away to see if the Bayesian sensor would trigger, but I didn’t know what else to do, so we went on a road trip one sunny afternoon. It didn’t work as expected. (In hindsight, I realized I could have set up dummy input booleans to trigger the observed binary sensors, but I hadn’t thought of that yet - and that’s still kind of a pain.) Then I had to adjust the tuning parameters (probabilities and prior) and wait to see if those gave the expected results. Validating the Bayesian sensor model this way takes a lot of time. There has to be a better way!

And there is. I dug into the Bayesian sensor module, found the calculation, and built a simple Excel spreadsheet to model the behavior. A screenshot of the spreadsheet is:


How this works:

  • The yellow highlighted cells are values you have to set in the bayesian configuration;
  • The row that starts with “Prior” is only relevant for the highlighted number - 0.400. That’s the prior: 0.4 line in the config. The rest of the numbers in that row are calculated values (shown below);
  • The ON/OFF row is used to simulate the conditions in the sensor. For example, if I want to see what would happen if we’re both 50-100 miles away, then I would set those two columns to 1 (the condition is TRUE), otherwise, it’s 0 (condition is FALSE). They are shown as 1 in the screenshot;
  • The rows with prob_true and prob_false are those tuning parameters that you have to provide numbers for in the config file (prob_given_true and prob_given_false);
  • The row Bayes’ Calc contains calculated numbers (shown below);
  • Current Prob is the value calculated by the Bayesian component module (and shown below in Excel format) that is compared to the threshold value that you set in the config file (for this example, that’s 0.98). If this calculated number exceeds the threshold value, then the Bayesian binary sensor triggers to ON.
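The spreadsheet’s math can be sketched in Python as well (my own model of the calculation, not the component’s source; the observation values come from the config above):

```python
# Fold the Bayes update over every observation that is currently "on"
# (i.e., its column in the ON/OFF row is set to 1).
def bayesian_probability(prior, observations):
    prob = prior
    for p_true, p_false, is_on in observations:
        if is_on:
            prob = (p_true * prob) / (p_true * prob + p_false * (1 - prob))
    return prob

# Simulate both of us being 50-100 miles away (the two far_range columns = 1):
observations = [
    (0.70, 0.2, True),   # kirby_far_range
    (0.70, 0.2, True),   # sandy_far_range
    (0.90, 0.1, False),  # kirby_extended_range
    (0.90, 0.1, False),  # sandy_extended_range
    (0.80, 0.2, False),  # staying_at_hotel
    (0.95, 0.1, False),  # no_house_motion_long
]
prob = bayesian_probability(0.4, observations)
print(round(prob, 3))  # 0.891 - below the 0.98 threshold, so the sensor stays OFF
```

Flipping different observations to True and rerunning is the same exercise as changing the 1s and 0s in the spreadsheet’s ON/OFF row.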

Notice the last two columns that start with Override. I use these in all my Bayesian sensors. The overrides are simply input_boolean toggles that I can use to force the Bayesian sensor to ON or OFF. Forcing it to ON works by setting up the probabilities as shown in the spreadsheet and below:

      prob_given_true: 1.0
      prob_given_false: 0.0

Why do this? It’s a backdoor into the Bayesian calculation. For example, if we’re in a hotel 20 miles away, the Bayesian sensor won’t trip to ON, but I can turn on the input_boolean and it will force the probability to 1.0 (this is an artifact of the math behind the sensor calculation), which does trip the Bayesian sensor to ON. Similarly, I can use the OFF input_boolean to turn it OFF if for some reason it’s ON when it shouldn’t be, by setting the probabilities to:

      prob_given_true: 0.0
      prob_given_false: 1.0
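The reason those values act as hard overrides falls directly out of the update formula. A quick Python sketch (the helper function is mine, not part of HA):

```python
# The single-observation Bayes update used by the sensor.
def bayes_update(prior, prob_given_true, prob_given_false):
    return (prob_given_true * prior) / (
        prob_given_true * prior + prob_given_false * (1 - prior)
    )

# The ON override (1.0 / 0.0) zeroes the false term in the denominator,
# pinning the result to 1.0 no matter how low the running probability is:
forced_on = bayes_update(0.05, 1.0, 0.0)
print(forced_on)  # 1.0

# The OFF override (0.0 / 1.0) zeroes the numerator, pinning it to 0.0:
forced_off = bayes_update(0.95, 0.0, 1.0)
print(forced_off)  # 0.0
```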

I attempted to upload this relatively simple spreadsheet, but the system doesn’t allow that. So the following three views show the calculations in the cells:

Once you have the first two columns (conditions) built, then it’s a matter of copying the second column one or more times to create the rest of the sensor conditions.

The last screen shot shows the override to ON condition:

The scenario modeled in the spreadsheet is one where we were staying in a hotel (from my Google calendar) that was less than 50 miles from home and there hadn’t been any motion in the house for over 10 hours. The calculated probability is 0.962, which is less than the threshold value of 0.98. So I know I’d have to override it to ON in this particular case.
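That 0.962 can be checked directly with the update formula (a sketch using the values from the config above; the helper function is mine):

```python
# Apply the Bayes update once per active observation.
def bayes_update(prior, prob_given_true, prob_given_false):
    return (prob_given_true * prior) / (
        prob_given_true * prior + prob_given_false * (1 - prior)
    )

prob = 0.4                            # prior from the config
prob = bayes_update(prob, 0.80, 0.2)  # staying_at_hotel is ON
prob = bayes_update(prob, 0.95, 0.1)  # no_house_motion_long is ON
print(round(prob, 3))  # 0.962 - below the 0.98 threshold, so the sensor stays OFF
```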

Now that I have an Excel model, I can tune the probabilities and prior without having to restart HA and wait for the conditions to occur. I iterate through the possibilities (simulating different observations being TRUE or FALSE by simply changing that row of 1s and 0s). If I don’t hit the threshold when I think I should, I adjust the prob_given_true values (and maybe the prob_given_false values). Once I’m happy with when the Bayesian sensor trips to ON for the numerous different combinations of observations, I load the values into the config file and restart HA once. This has saved me a ton of time. And the overrides give me a backdoor if the tuned probabilities aren’t quite giving the expected outcome for a scenario I didn’t plan for.

If someone can tell me how to share the spreadsheet on the forum, I’m more than happy to do that!

And of course, I’m looking forward to comments on how others use the Bayesian sensor platform. I also use it to decide whether my HVAC should be in cooling or heating mode, and whether or not we’re sleeping. I’m planning another one to decide what the thermostats should be set to based on presence, holidays, work days, getting ready for work in the morning, etc.


Wowzer!
Spreadsheets and everything! That’s some deep dive on the topic, and if I had something more complex than trying to null out a failed sensor, this would be a step-by-step to run through.

Coughs sticky or guide?


I would suggest that you make application to rewrite the documentation.


Thanks - I struggled with this for a long time. Most likely spent way too much time on this!

I’m not sure what that means (?).

Thank you - how would I get started on that? I haven’t done that before…


He’s suggesting you have it as a guide (get an admin to change the name), or have it changed to a sticky (not gonna happen; there’s only one sticky, Tinkerers).

To change the documentation, post a request on GitHub under the documentation repo, listing how you think it should read. Post it with the idea that this is the documentation, so that very little editing will be required.

Found a post by Tink on how it’s done:
See Climate Control Template
Just for the method as an example

Thanks! I figured it out. I’m assuming the easiest way to do this is to copy the raw markdown to a local file, edit it, and then upload it as a suggested change, as opposed to trying to edit in the GitHub editor. Does that make sense?

Again, appreciate the help.

It does to me.
If there’s a problem, I hope they come back and tell you why, as it would be good to have some guidelines.
To be honest, though, I’d take a copy of what’s there already and merge in your document to fill in the gaps. I don’t know though, as I will be following this case in the hope of learning more!
Good Luck

Got it - that’s exactly what I’ll do.

Maybe you can save it as a “Strict Open XML Spreadsheet” and share it on Pastebin?

You deserve a sticky thanks for the information! For the spreadsheet, perhaps upload it to Google Sheets and make it read-only so people can look at it and perhaps download it if you allow them to? In the meantime, I would love to have a download link in my PM :slight_smile:

A truly awe-inspiring amount of work. Well done. You might be interested in the smart_hass tools by the author of the original Bayesian sensor component for HA:

It will let you introspect your bayesian sensors easily as well.

Excellent idea! I’ll try to get that done today and then post the link here.

Okay, now that’s pretty cool! Thanks!!

Just a friendly link to a post I just wrote about the Bayes Sensors. I explained everything from the math, stats, and programming perspectives. I also have a working Google Sheet that you can copy for playing with and adjust the settings.

Check it out here: How Bayes Sensors work, from a Statistics Professor (with working Google Sheets!)


Very nice explanation! Thanks for sharing. Made me realize I had forgotten about the option of the to_state being false.

I have recreated your spreadsheet and uploaded it to Google Sheets.

Bayesian Excel Sheet
