How Bayes Sensors work, from a Statistics Professor (with working Google Sheets!)

Oh! Time of day sensors would be really helpful here. I would love to see how to get that working with bayes. Anyone have an example?

Also, how can we handle multiple device_trackers, in particular ignoring them when they disagree?

Example: I’ve got two GPS device_trackers, but both can occasionally be thrown off. I would like the bayesian sensor to make them “cancel each other out” when they disagree, and rely on other signals until they agree. Is that doable?

Edit:
Well, I found a way to do that, but the probabilities cannot be defined by hours true/false. Here is a minimal working example:

  - platform: bayesian
    name: home
    prior: 0.8
    probability_threshold: 0.8
    observations:
    - platform: state
      entity_id: device_tracker.gps1
      to_state: "home"
      prob_given_true: 0.6
      prob_given_false: 0.4
    - platform: state
      entity_id: device_tracker.gps1
      to_state: "not_home"
      prob_given_true: 0.4
      prob_given_false: 0.6
    - platform: state
      entity_id: device_tracker.gps2
      to_state: "home"
      prob_given_true: 0.6
      prob_given_false: 0.4
    - platform: state
      entity_id: device_tracker.gps2
      to_state: "not_home"
      prob_given_true: 0.4
      prob_given_false: 0.6
    - platform: state
      entity_id: light.xxx
      to_state: "on"
      prob_given_true: 0.6
      prob_given_false: 0.4
    - platform: state
      entity_id: light.xxx
      to_state: "off"
      prob_given_true: 0.4
      prob_given_false: 0.6

non-contradictory GPS takes priority over light status:
home with light → 0.93 HOME
home without light → 0.86 HOME
not home with light → 0.73 NOT_HOME
not home without light → 0.54 NOT_HOME

light status takes priority over contradictory GPS:
contradictory gps with light → 0.86 HOME
contradictory gps without light → 0.73 NOT_HOME
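
The figures above can be reproduced with a few lines of Python. This is only a sketch, not the integration's code: it assumes the behaviour this example relies on, where only the observations whose to_state matches the entity's current state update the probability, and the helper names (update, posterior) are made up for illustration.

```python
def update(prior, p_true, p_false):
    """One Bayes update for a single matching observation."""
    num = p_true * prior
    return num / (num + p_false * (1 - prior))


def posterior(prior, matched):
    """Apply each matching (prob_given_true, prob_given_false) pair in turn."""
    p = prior
    for p_true, p_false in matched:
        p = update(p, p_true, p_false)
    return p


AGREE = (0.6, 0.4)     # entity is in its "home" / "on" state
DISAGREE = (0.4, 0.6)  # entity is in its "not_home" / "off" state

# Order in each list: gps1, gps2, light
cases = {
    "home with light": [AGREE, AGREE, AGREE],
    "home without light": [AGREE, AGREE, DISAGREE],
    "not home with light": [DISAGREE, DISAGREE, AGREE],
    "not home without light": [DISAGREE, DISAGREE, DISAGREE],
    "contradictory gps with light": [AGREE, DISAGREE, AGREE],
    "contradictory gps without light": [AGREE, DISAGREE, DISAGREE],
}

PRIOR = 0.8
THRESHOLD = 0.8

results = {name: round(posterior(PRIOR, obs), 2) for name, obs in cases.items()}
for name, value in results.items():
    label = "HOME" if value >= THRESHOLD else "NOT_HOME"
    print(f"{name}: {value} {label}")
```

Because each observation has the same strength (0.6 vs 0.4), two disagreeing trackers multiply out to a factor of exactly 1, which is why the light alone decides the contradictory cases.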

I’m going to try adding other observations and see if it holds.

You, sir, are a legend and definitely a great professor, no doubt. That was very clear. I was looking for more information on how this is implemented in HA and your write-up was excellent. I am trying to set up a sensor to estimate whether all of the household is in bed or not. Could you please check whether the code below makes sense for that?

- platform: bayesian
  prior: 0.32
  name: 'Bedtime'
  probability_threshold: 0.9
  observations:
      - entity_id: group.alarmo_device_tracker
        prob_given_true: 0.99  # If I'm in bed then I have to be home - hence 100% probability
        prob_given_false: 0.7 # I could be home and not in bed 70% of the time during any given day
        platform: 'state'
        to_state: 'home'
      - entity_id: group.alarmo_device_tracker
        prob_given_true: 0.009 # If I'm in bed, then I can't be not home
        prob_given_false: 0.5 # I'm not in bed and not home (i.e. at work or school etc.) 
        platform: 'state'
        to_state: 'not_home'        
      - entity_id: 'sensor.sun'
        prob_given_true: 0.97
        prob_given_false: 0.33
        platform: 'state'
        to_state: 'below_horizon'
      - entity_id: 'sensor.sun'
        prob_given_true: 0.03
        prob_given_false: 0.99
        platform: 'state'
        to_state: 'above_horizon'        
      - entity_id: group.interior_lights
        prob_given_true: 0.99 # I never go to bed with lights on
        prob_given_false: 0.8 # Lights off when I'm not in bed, which is true for most of the DAY, but excluding some lights (e.g.  bathroom, garage, pantry) 
        platform: 'state'
        to_state: 'off'
      - entity_id: group.interior_lights
        prob_given_true: 0.001 # reverse of the above
        prob_given_false: 0.2 # reverse of the above
        platform: 'state'
        to_state: 'on'        
      - entity_id: binary_sensor.master_door_contact
        prob_given_true: 0.98
        prob_given_false: 0.3
        platform: 'state'
        to_state: 'off'
      - entity_id: binary_sensor.master_door_contact
        prob_given_true: 0.02
        prob_given_false: 0.7
        platform: 'state'
        to_state: 'on'        
      - entity_id: 'variable.last_motion'
        prob_given_true: 0.95
        prob_given_false: 0.15
        platform: 'state'
        to_state: 'Bedroom Zooz Sensor motion'
      - entity_id: binary_sensor.abhi_pixel_is_charging
        prob_given_true: 0.9
        prob_given_false: 0.4
        platform: 'state'
        to_state: 'on'
      - entity_id: binary_sensor.abhi_pixel_is_charging
        prob_given_true: 0.1
        prob_given_false: 0.6
        platform: 'state'
        to_state: 'off'        
      - entity_id: alarm_control_panel.alarmo
        prob_given_true: 0.99
        prob_given_false: 0.01
        platform: 'state'
        to_state: 'armed_night'
      - entity_id: alarm_control_panel.alarmo
        prob_given_true: 0.01
        prob_given_false: 0.99
        platform: 'state'
        to_state: 'disarmed'  

I have created a pull request to fix this issue: https://github.com/home-assistant/core/pull/67631

@JudoWill if you felt like reviewing my code and/or reasoning that would be amazing and I would be very grateful.

These should work

    - platform: "template" 
      prob_given_true: 0.95
      prob_given_false: 0.1
      value_template: >-
        {% if is_state('device_tracker.gps1', 'home')
           and is_state('device_tracker.gps2', 'home') %}
           true
        {% elif is_state('device_tracker.gps1', 'not_home')
           and is_state('device_tracker.gps2', 'not_home') %}
           false
        {% endif %}

And because you have to give the inverse until my PR is merged:

    - platform: "template" 
      prob_given_true: 0.9
      prob_given_false: 0.05
      value_template: >-
        {% if is_state('device_tracker.gps1', 'home')
           and is_state('device_tracker.gps2', 'home') %}
           false
        {% elif is_state('device_tracker.gps1', 'not_home')
           and is_state('device_tracker.gps2', 'not_home') %}
           true
        {% endif %}

If they disagree they will evaluate to null and so should be ignored by Bayesian.

P.S. Your probabilities look too conservative, so I’ve tweaked them in my example. It assumes they will both accidentally read “home” when you are away 5% of the time and that they will both read “not_home” when you are home 10% of the time, which is probably still too conservative.

Why do the inverse situations in prob_given_false not sum to 1?
13.75/14 = 0.982?

Hey guys,

What do we do if the home occupiers don’t really have a routine?
We work from home and the house MIGHT be empty a couple hours a week

Looks good, but mathematically the inverse probabilities should sum to 1.

      - entity_id: group.alarmo_device_tracker
        prob_given_true: 0.99  # If I'm in bed then I have to be home - hence 100% probability
        prob_given_false: 0.7 # I could be home and not in bed 70% of the time during any given day
        platform: 'state'
        to_state: 'home'
      - entity_id: group.alarmo_device_tracker
        prob_given_true: 0.01 # If I'm in bed, then I can't be not home
        prob_given_false: 0.3 # I'm not in bed and not home (i.e. at work or school etc.) 
        platform: 'state'
        to_state: 'not_home' 
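
A quick way to audit a config for this is to add up the probabilities per entity: when every possible state of an entity is listed, the prob_given_true values should sum to 1 and so should the prob_given_false values. A minimal sketch follows; the observation dicts are copied by hand from the snippets in this thread, and check_sums is a hypothetical helper, not something Home Assistant provides.

```python
import math
from collections import defaultdict


def check_sums(observations, tol=1e-9):
    """Return {entity_id: True/False} depending on whether both the
    prob_given_true and prob_given_false values for that entity sum to 1."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for obs in observations:
        totals[obs["entity_id"]][0] += obs["prob_given_true"]
        totals[obs["entity_id"]][1] += obs["prob_given_false"]
    return {
        entity: math.isclose(t, 1.0, abs_tol=tol) and math.isclose(f, 1.0, abs_tol=tol)
        for entity, (t, f) in totals.items()
    }


# Values from the corrected snippet above
corrected = [
    {"entity_id": "group.alarmo_device_tracker", "to_state": "home",
     "prob_given_true": 0.99, "prob_given_false": 0.7},
    {"entity_id": "group.alarmo_device_tracker", "to_state": "not_home",
     "prob_given_true": 0.01, "prob_given_false": 0.3},
]

# Values from the original question, for comparison
original = [
    {"entity_id": "group.alarmo_device_tracker", "to_state": "home",
     "prob_given_true": 0.99, "prob_given_false": 0.7},
    {"entity_id": "group.alarmo_device_tracker", "to_state": "not_home",
     "prob_given_true": 0.009, "prob_given_false": 0.5},
]

print(check_sums(corrected))
print(check_sums(original))
```

This only makes sense for entities where every state is covered by an observation; an entity with a single listed state will always be flagged.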

I’d love to see some of yours as further examples.

I’m attempting to do room occupancy based on time of day, motion, room power usage, and the occupancy and power usage of other rooms, but I feel I’m missing the mark: I want each observation to move the probability only a few points, rather than the drastic changes I currently see.

that is not actually knowing nothing :slight_smile:

I think I would like to see a bit of groundswell behind this proposal - the negated state should have consequence for the calculation. Implementing this would encourage my Bayes sensors to “turn off” / “dial down” the probability when their contributing inputs are false

I have a PR:

I just need to improve the tests, unfortunately I am a bit pressed for time at the moment

Edit: PR is reviewer approved and awaiting merge.
Edit2: @teskanoo the PR is now merged, not sure what release it will be in. Hopefully 2022.10
Edit3: Released in 2022.10 - this is a breaking change but I also included some repairs which should detect and notify for most broken configs.
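
For anyone wondering what "the negated state should have consequence" means numerically, here is a small sketch. It assumes, as I understand the 2022.10 change, that when a binary observation is not met the complementary probabilities (1 - prob_given_true and 1 - prob_given_false) are used instead of the observation being skipped. The function names and the 0.8/0.2 values are made up for illustration, and this is not the integration's actual code.

```python
def bayes_update(prior, p_true, p_false):
    """Standard Bayes update for one piece of evidence."""
    num = p_true * prior
    return num / (num + p_false * (1 - prior))


def apply_observation(prior, p_true, p_false, observed):
    """Use the configured probabilities when the observation is met,
    and their complements when it is not (assumed post-2022.10 behaviour)."""
    if observed:
        return bayes_update(prior, p_true, p_false)
    return bayes_update(prior, 1 - p_true, 1 - p_false)


prior = 0.5
when_on = apply_observation(prior, 0.8, 0.2, observed=True)
when_off = apply_observation(prior, 0.8, 0.2, observed=False)

print(f"observation met:     {when_on:.2f}")   # probability goes up
print(f"observation not met: {when_off:.2f}")  # probability goes down instead of staying at the prior
```

Under the older behaviour the second case would have stayed at the prior, which is why the mirrored "inverse" observations were needed.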

This explanation is top! I wish all teachers would be like this. Keep up the good work mate.

New spreadsheet for the 2022.10 update
For generating configs (only works for entities that are binary at the moment)

Thank you for creating the updated Bayesian Tester spreadsheet. It’s extremely helpful.

I’ve created a residents-asleep sensor in my setup which works well. It turns on with very high accuracy, but I’m struggling to find a way to make the sensor turn off when residents wake up. At the moment, the sensor turns off at the point when the TOD sensor turns off.

I have a few things that I observe on a regular basis but don’t know how to implement:

  • My phone alarm or Google Lenovo clock alarm triggers once in the morning. Sometimes it gets snoozed. But once the alarm has been stopped I’m up and awake. Both my phone and Google clock are available in HA via the Companion App and Google Assistant integrations.
  • There’s usually motion in the bedroom followed by the kitchen within about 5 minutes of each other.
  • I usually play the radio in the kitchen on a Google speaker whilst making breakfast.

- platform: tod
  name: Night Time Sleeping Hours
  after: "20:00"
  before: "07:00"


- platform: "bayesian"
  name: "Residents Asleep"
  unique_id: "4ff91613-8a74-4500-b00d-4ce4ab85a28a"
  prior: 0.33
  probability_threshold: 0.9
  observations:
    - platform: "state"
      entity_id: media_player.living_room_tv 
      prob_given_true: 0.88
      prob_given_false: 0.69
      to_state: 'off'

    - platform: "state"
      entity_id: group.all_lights
      prob_given_true: 0.97
      prob_given_false: 0.75
      to_state: 'off'

    - platform: "state"
      entity_id: binary_sensor.house_occupied_residents
      prob_given_true: 0.99
      prob_given_false: 0.81
      to_state: 'on'

    - platform: "state"
      entity_id: binary_sensor.night_time_sleeping_hours
      prob_given_true: 0.88
      prob_given_false: 0.01
      to_state: 'on'

For these two I would use templates to detect whether you are past your alarm time (this depends on what happens to the state of the alarm sensor once the alarm has finished, but you could always store that in a helper to stop it changing).

      value_template: >-
        {% if as_timestamp(now()) > as_timestamp(states('sensor.google_speaker_alarms')) %}
           true
        {% else %}
           false
        {% endif %}

I personally use this one quite a lot

    - platform: "template" # is harvsg home with a charging phone
      prob_given_true: 0.7 # when everyone is asleep my phone will be charging and I will be home, but 30% of the time I am away from home.
      prob_given_false: 0.1 # sometimes I do a top-up charge at home.
      value_template: >-
        {% if is_state('person.harvsg', 'home')
           and is_state('sensor.phone_charger_type', 'ac') %}
           true
        {% else %}
           false
        {% endif %}

As a general rule: if you want to use instantaneous moments to affect the state of a bayesian sensor, you need to find a way to make that instant last longer. Options include using an automation to change the state of a helper (e.g. sensor.hallway_then_kitchen_motion_helper) and then another automation that resets it to off when you go to bed.
Or by using the {{as_timestamp(now()) - as_timestamp(states.sensor.hallway_motion.last_changed) < 300}} technique.
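
The idea behind that template, written out in plain Python for clarity: an instantaneous event counts as "observed" for a fixed window after it happened. This is just an illustration of the logic; recently_changed and the timestamps are made up, and the 300-second window matches the template above.

```python
from datetime import datetime, timedelta


def recently_changed(last_changed, now, window_seconds=300):
    """True while `now` is less than `window_seconds` after `last_changed`."""
    elapsed = now - last_changed
    return timedelta(0) <= elapsed < timedelta(seconds=window_seconds)


motion_at = datetime(2023, 1, 15, 7, 0, 0)  # hypothetical hallway motion event

four_minutes_later = recently_changed(motion_at, datetime(2023, 1, 15, 7, 4, 0))
six_minutes_later = recently_changed(motion_at, datetime(2023, 1, 15, 7, 6, 0))

print(four_minutes_later)  # still inside the 5-minute window
print(six_minutes_later)   # window has expired
```

The window length is the thing to tune: too short and the bayesian sensor barely sees the event, too long and a single motion blip keeps influencing the result well after it stopped being relevant.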

Does anyone use Grafana, Influxdb or history to help guide their bayesian sensor setups?

I track a number of sensors in Influxdb and Grafana. And with so much historical data at hand I wonder if there’s a way to make good use of it to inform the probability of certain observations?

I’m not sure how to go about it in an effective way.

For example, I’d like to use motion sensors in the house as an additional observation in an ’asleep’ sensor. I already have an asleep bayesian sensor which works well. But adding the motion sensors would bring another level of accuracy. Usually, there is little or no motion whilst I’m asleep.

My thinking is: is there a way to review historical data (Influx, Grafana or history) between 00:00 and 07:00 for the past 90 days, and then calculate the average number of motion events? Or some other metric that could be used to find a correlation or trend that could be turned into an observation in a bayes sensor?
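
One metric that maps directly onto prob_given_true is the fraction of "asleep" time during which the motion sensor was on, rather than a count of events. A rough sketch of that calculation follows, assuming you have already exported the history as (start, end) intervals; the interval data here is invented, fraction_on is a hypothetical helper, and intervals within each list are assumed not to overlap each other.

```python
from datetime import datetime


def overlap_seconds(a, b):
    """Seconds shared by two (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max((end - start).total_seconds(), 0.0)


def fraction_on(target_on, condition):
    """Share of the `condition` time during which the target was on."""
    total = sum((end - start).total_seconds() for start, end in condition)
    if total == 0:
        return None  # no data for this condition
    shared = sum(overlap_seconds(t, c) for t in target_on for c in condition)
    return shared / total


# Invented example: one night asleep from 00:00 to 07:00
asleep = [(datetime(2023, 1, 15, 0, 0), datetime(2023, 1, 15, 7, 0))]

# Invented example: periods when the motion sensor reported "on"
motion_on = [
    (datetime(2023, 1, 15, 3, 0), datetime(2023, 1, 15, 3, 5)),    # fully inside the night
    (datetime(2023, 1, 15, 6, 58), datetime(2023, 1, 15, 7, 3)),   # only 2 minutes overlap
]

p_motion_given_asleep = fraction_on(motion_on, asleep)
print(f"prob_given_true estimate: {p_motion_given_asleep:.4f}")
```

Running the same function against the "awake" intervals would give the matching prob_given_false estimate.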

I just thought about the same thing. It should be possible to derive the bayes values for any sensor from history. Take a “Home Occupied” sensor, for example. I currently have a simple input_boolean that gets triggered based on device trackers and motion trackers. So I can look for a correlation between this input boolean, which I know to be working reliably, and any other sensor, and a script should be able to calculate the most probable value of that sensor whenever the Home Occupied sensor is true or false.

Sorry, I totally butchered that description :smiley: The point is, I started to write a script that can get history info from Hass. Here’s what I have so far, might be a good starting point for anyone who wants to do the same. At the moment it doesn’t do much, but it shows you how you can get historical data from Hass API. You can run it on any machine that has Python, does not have to be Hass instance. The only dependency is requests library ( pip install requests ).

It’s a low-priority project for me, so I may or may not post any updates for this.

TOKEN = "XXXXXXXXXXXXXX"
ENTITY_ID = "switch.humidifier_plug"
BAYES_REFERENCE_ENTITY_ID = "input_boolean.home_occupied"
HASS_API_URL = "http://192.168.1.20:8123/api"

import requests
from datetime import datetime, timedelta

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + timedelta(days=4)
    return next_month - timedelta(days=next_month.day)


def hass_date_to_datetime(s):
    try:
        return datetime.strptime(s, r"%Y-%m-%dT%H:%M:%S.%f+00:00")
    except ValueError:  # timestamp has no fractional seconds
        return datetime.strptime(s, r"%Y-%m-%dT%H:%M:%S+00:00")

dt_fmt = r"%Y-%m-%d-%H-%M-%S-%f"

month_start = datetime.now()
month_start = datetime(year=month_start.year, month=month_start.month, day=1)
month_end = last_day_of_month(month_start)



headers = {'Authorization': f'Bearer {TOKEN}',
           'Content-Type': 'application/json'}

url = f"{HASS_API_URL}/history/period/"


reference_bayes_states = requests.get(url + f"{month_start.year}-{month_start.month}-1T00:00:00+00:00?end_time={month_end.year}-{month_end.month}-{month_end.day}T00%3A00%3A00%2B00%3A00&filter_entity_id={BAYES_REFERENCE_ENTITY_ID}",
                        headers=headers)

target_entity_states = requests.get(url + f"{month_start.year}-{month_start.month}-1T00:00:00+00:00?end_time={month_end.year}-{month_end.month}-{month_end.day}T00%3A00%3A00%2B00%3A00&filter_entity_id={ENTITY_ID}",
                        headers=headers)


for state in reference_bayes_states.json()[0]:
    if state['state'] != "unknown":
        print(state['state'])

for state in target_entity_states.json()[0]:
    if state['state'] != "unknown":
        print(state['state'])

Ok, that took less time and effort than I thought. I think it kinda works, but I haven’t yet had time to think of a smart algorithm, so it’s just brute-forcing its way through states. It does 2 requests to the Hass API, then iterates over every second between the dates you specify and checks the states of 2 entities: the target one which you want to add to the bayesian sensor, and the reference one which tells it “what state should Bayes be”. Here that is the “Home Occupied” input_boolean based on device_trackers from the example above.

It is SLOW but it seems to work. Working prototype first, optimization later :smiley:

TOKEN = "XXXXX"
ENTITY_ID = "switch.humidifier_plug"
BAYES_REFERENCE_ENTITY_ID = "input_boolean.home_occupied"
HASS_API_URL = "http://192.168.1.20:8123/api"
START_TIME = "2023.01.15 10:00"
END_TIME = "2023.01.15 16:00"
TIMEZONE_OFFSET = 0  # Timezone offset from GMT for your local time. Positive or negative number. For example if your timezone is GMT+2 - use 2 here. If it's GMT-4 then use -4.

from datetime import datetime, timedelta
import requests

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + timedelta(days=4)
    return next_month - timedelta(days=next_month.day)


def hass_date_to_datetime(s):
    try:
        return datetime.strptime(s, r"%Y-%m-%dT%H:%M:%S.%f+00:00") + timedelta(hours=TIMEZONE_OFFSET)
    except ValueError:  # timestamp has no fractional seconds
        return datetime.strptime(s, r"%Y-%m-%dT%H:%M:%S+00:00") + timedelta(hours=TIMEZONE_OFFSET)

def human_time_to_datetime(s):
    return datetime.strptime(s, r"%Y.%m.%d %H:%M")

dt_fmt = r"%Y-%m-%d-%H-%M-%S-%f"

month_start = datetime.now()
month_start = datetime(year=month_start.year, month=month_start.month, day=1)
month_end = last_day_of_month(month_start)

START_TIME = human_time_to_datetime(START_TIME)
END_TIME = human_time_to_datetime(END_TIME)

headers = {'Authorization': f'Bearer {TOKEN}',
           'Content-Type': 'application/json'}

url = f"{HASS_API_URL}/history/period/"


reference_bayes_states = requests.get(url + f"{month_start.year}-{month_start.month}-1T00:00:00+00:00?end_time={month_end.year}-{month_end.month}-{month_end.day}T00%3A00%3A00%2B00%3A00&filter_entity_id={BAYES_REFERENCE_ENTITY_ID}",
                        headers=headers).json()[0]

target_entity_states = requests.get(url + f"{month_start.year}-{month_start.month}-1T00:00:00+00:00?end_time={month_end.year}-{month_end.month}-{month_end.day}T00%3A00%3A00%2B00%3A00&filter_entity_id={ENTITY_ID}",
                        headers=headers).json()[0]


def state_at_time(states, dt):
    # Return the state that was active at `dt`: the last entry whose
    # last_changed is at or before `dt` (states are assumed to be in time order).
    start_state = None
    for _state in states:
        last_changed = hass_date_to_datetime(_state['last_changed'])
        if last_changed <= dt:
            start_state = _state
            continue
        break
    if start_state is None:
        # No history at or before `dt` (or no history at all).
        return "OUT OF RANGE"
    return start_state['state']


# Now we can either go second-by second between some dates and check state data, bruteforcing it... or we can go over target_entity_dates and calculate ranges between these. Bruteforcing is slow but is more true and reliable

data = {}

# This is the SLOOOOOOOOOOOOOOW part
seconds = (END_TIME-START_TIME).total_seconds()
for second in range(int(seconds)):
    dt = START_TIME + timedelta(seconds=second)
    if second % 100 == 0:
        print(dt)

    target_state = state_at_time(target_entity_states, dt)
    reference_state = state_at_time(reference_bayes_states, dt)
    if reference_state not in data:
        data[reference_state] = {"seconds": 0}
    if target_state not in data[reference_state]:
        data[reference_state][target_state] = 0
    data[reference_state]['seconds'] += 1
    data[reference_state][target_state] += 1

print(data)
for reference_state, _d in data.items():
    reference_seconds = _d.pop("seconds")
    for target_state, _dd in _d.items():
        print(f"{target_state} while reference is {reference_state}: {_dd/reference_seconds}")

Example output:

{'on': {'seconds': 18143, 'off': 17077, 'on': 600, 'unavailable': 466}, 'off': {'seconds': 3457, 'off': 3457}}
off while reference is on: 0.9412445571294714
on while reference is on: 0.033070605743261865
unavailable while reference is on: 0.025684837127266713
off while reference is off: 1.0

In my case the “humidifier plug” is actually a coffee maker plug right now; it has been repurposed but I didn’t update its entity_id. So from this we can say that between START_TIME = “2023.01.15 10:00” and END_TIME = “2023.01.15 16:00”, while we were home the coffee maker was on 0.033 of the time and off 0.94 of the time. And when we’re not home the coffee maker is off 1.0 of the time (always off; we never use it while nobody is home). Which seems to make sense. We turn it on for 15-30 minutes a day (1-2 brews, each one on a timer that turns it off after 15 minutes so that it won’t evaporate all the coffee if we forget about it).

So, given this information, the observation in the bayes sensor should be prob_given_true: 0.033 and prob_given_false: 0.0.
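
One caveat when plugging in a measured frequency of exactly 0.0 (or 1.0): by Bayes' rule a zero likelihood pushes the result to exactly 1 or 0, and no later observation can move it back. A small sketch of the effect and of one possible workaround, clamping measured values away from the extremes; the clamp helper and the 0.01 margin are arbitrary choices for illustration, not a Home Assistant feature.

```python
def bayes_update(prior, p_true, p_false):
    """Standard Bayes update for one piece of evidence."""
    num = p_true * prior
    return num / (num + p_false * (1 - prior))


def clamp(p, margin=0.01):
    """Keep a measured frequency inside [margin, 1 - margin]."""
    return min(max(p, margin), 1 - margin)


prior = 0.5

# Coffee maker is on, using the raw measured values 0.033 / 0.0
certain = bayes_update(prior, 0.033, 0.0)

# A later, strongly contradicting observation cannot change the result any more
stuck = bayes_update(certain, 0.1, 0.9)

# Same coffee maker observation with clamped values
softened = bayes_update(prior, clamp(0.033), clamp(0.0))

print(certain)             # fully certain after a single observation
print(stuck)               # unchanged by the contradicting evidence
print(round(softened, 3))  # strong evidence, but still revisable
```

A six-hour sample is also a small basis for claiming "never", so a longer history window would make the measured values more trustworthy before they go into the config.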

This is great. Can’t wait to give it a shot and see what patterns / data insights I can discover. Thanks for sharing.
