How Bayes Sensors work, from a Statistics Professor (with working Google Sheets!)

I think I would like to see a bit of groundswell behind this proposal: the negated state should have consequences for the calculation. Implementing this would let my Bayes sensors "turn off", or at least dial down the probability, when their contributing inputs are false.
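
To make that concrete, here is a minimal Python sketch (my own notation, not the integration's actual code) of the proposed behaviour: when an observation's entity is not in its to_state, the update uses the complement probabilities instead of skipping the observation entirely.

def bayes_update(prior, prob_given_true, prob_given_false, active):
    # Proposed: a negated observation flips to the complement
    # probabilities rather than being ignored.
    if not active:
        prob_given_true = 1 - prob_given_true
        prob_given_false = 1 - prob_given_false
    numerator = prob_given_true * prior
    return numerator / (numerator + prob_given_false * (1 - prior))

print(bayes_update(0.5, 0.9, 0.1, active=True))   # 0.9 -- evidence for
print(bayes_update(0.5, 0.9, 0.1, active=False))  # 0.1 -- evidence against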

I have a PR:

I just need to improve the tests; unfortunately, I am a bit pressed for time at the moment.

Edit: PR is reviewer approved and awaiting merge.
Edit2: @teskanoo the PR is now merged, not sure what release it will be in. Hopefully 2022.10
Edit3: Released in 2022.10 - this is a breaking change but I also included some repairs which should detect and notify for most broken configs.

1 Like

This explanation is top! I wish all teachers were like this. Keep up the good work, mate.

New spreadsheet for the 2022.10 update
For generating configs (it only works with binary entities at the moment)

2 Likes

Thank you for creating the updated Bayesian Tester spreadsheet. It’s extremely helpful.

I’ve created a residents-asleep sensor in my setup which works well. It triggers on with very high accuracy, but I’m struggling to find a way to make the sensor turn off when residents are awake. At the moment, the sensor only turns off at the point when the TOD sensor turns off.

There are a few things that I observe on a regular basis but don’t know how to implement as observations:

  • My phone alarm or Google Lenovo clock alarm triggers once in the morning. Sometimes I snooze it, but once the alarm has been stopped I’m up and awake. Both my phone and the Google clock are available in HA via the companion app and Google Assistant integrations.
  • There’s usually motion in the bedroom followed by motion in the kitchen within about 5 minutes.
  • I usually play the radio on the kitchen Google speaker whilst making breakfast.

My current config:
- platform: tod
  name: Night Time Sleeping Hours
  after: "20:00"
  before: "07:00"


- platform: "bayesian"
  name: "Residents Asleep"
  unique_id: "4ff91613-8a74-4500-b00d-4ce4ab85a28a"
  prior: 0.33
  probability_threshold: 0.9
  observations:
    - platform: "state"
      entity_id: media_player.living_room_tv 
      prob_given_true: 0.88
      prob_given_false: 0.69
      to_state: 'off'

    - platform: "state"
      entity_id: group.all_lights
      prob_given_true: 0.97
      prob_given_false: 0.75
      to_state: 'off'

    - platform: "state"
      entity_id: binary_sensor.house_occupied_residents
      prob_given_true: 0.99
      prob_given_false: 0.81
      to_state: 'on'

    - platform: "state"
      entity_id: binary_sensor.night_time_sleeping_hours
      prob_given_true: 0.88
      prob_given_false: 0.01
      to_state: 'on'
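
(As a side note: here is a minimal sketch of the chained update Home Assistant performs with this config, assuming every observation is in its to_state; the posterior from each observation becomes the prior for the next. It suggests why the sensor tracks the TOD entity: only that last observation carries enough weight to cross the 0.9 threshold.)

def bayes_update(prior, prob_given_true, prob_given_false):
    return prob_given_true * prior / (prob_given_true * prior + prob_given_false * (1 - prior))

p = 0.33  # the configured prior
for prob_given_true, prob_given_false in [(0.88, 0.69), (0.97, 0.75), (0.99, 0.81), (0.88, 0.01)]:
    p = bayes_update(p, prob_given_true, prob_given_false)
    print(round(p, 3))  # 0.386, 0.448, 0.498, 0.989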

For these two I would use templates to detect whether you are past your alarm time (this depends on what happens to the state of the alarm sensor once the alarm has finished, but you could always store the alarm time in a helper to get around that):

      value_template: >-
        {% if as_timestamp(now()) > as_timestamp(states('sensor.google_speaker_alarms')) %}
           true
        {% else %}
           false
        {% endif %}

I personally use this one quite a lot

    - platform: "template" # is harvsg home with a charging phone
      prob_given_true: 0.7 # when everyone is asleep my phone will be charging and I will be home, but 30% of the time I am away from home.
      prob_given_false: 0.1 # sometimes I do a top-up charge at home.
      value_template: >-
        {% if is_state('person.harvsg', 'home')
           and is_state('sensor.phone_charger_type', 'ac') %}
           true
        {% else %}
           false
        {% endif %}

As a general rule: if you want instantaneous moments to affect the state of a Bayesian sensor, you need to find a way to make that instant last longer. Options include using an automation to change the state of a helper (say, sensor.hallway_then_kitchen_motion_helper) and another automation that resets it to off when you go to bed.
Or you can use the {{ as_timestamp(now()) - as_timestamp(states.sensor.hallway_motion.last_changed) < 300 }} technique.

Does anyone use Grafana, InfluxDB, or history to help guide their Bayesian sensor setups?

I track a number of sensors in InfluxDB and Grafana, and with so much historical data at hand I wonder if there’s a way to make good use of it to inform the probabilities of certain observations.

I’m not sure how to go about it in an effective way.

For example, I’d like to use the motion sensors in the house as an additional observation in an ‘asleep’ sensor. I already have an asleep Bayesian sensor which works well, but adding the motion sensors would bring another level of accuracy: usually there is little or no motion whilst I’m asleep.

My thinking is: is there a way to review historical data (Influx, Grafana, or history) between 00:00 and 07:00 for the past 90 days, and then calculate the average number of motion events, or some other metric that could be used to find a correlation or trend to turn into an observation in a Bayes sensor?

1 Like

I just thought about the same thing. It should be possible to derive data values for any sensor to use with Bayes, based on history. For example, a “Home Occupied” sensor: I currently have a simple input_boolean that gets triggered based on device trackers and motion sensors. So I can look for the correlation between this input boolean, which I know to be working reliably, and any other sensor, and a script should be able to calculate the more probable value for any sensor whenever the Home Occupied sensor is true or false.

Sorry, I totally butchered that description :smiley: The point is, I started to write a script that can get history info from Hass. Here’s what I have so far; it might be a good starting point for anyone who wants to do the same. At the moment it doesn’t do much, but it shows how you can get historical data from the Hass API. You can run it on any machine that has Python; it does not have to be the Hass instance. The only dependency is the requests library ( pip install requests ).

It’s a low-priority project for me, so I may or may not post any updates for this.

TOKEN = "XXXXXXXXXXXXXX"
ENTITY_ID = "switch.humidifier_plug"
BAYES_REFERENCE_ENTITY_ID = "input_boolean.home_occupied"
HASS_API_URL = "http://192.168.1.20:8123/api"

import requests
from datetime import datetime, timedelta

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + timedelta(days=4)
    return next_month - timedelta(days=next_month.day)


def hass_date_to_datetime(s):
    try:
        return datetime.strptime(s, r"%Y-%m-%dT%H:%M:%S.%f+00:00")
    except:
        return datetime.strptime(s, r"%Y-%m-%dT%H:%M:%S+00:00")

dt_fmt = r"%Y-%m-%d-%H-%M-%S-%f"

month_start = datetime.now()
month_start = datetime(year=month_start.year, month=month_start.month, day=1)
month_end = last_day_of_month(month_start)



headers = {'Authorization': f'Bearer {TOKEN}',
           'Content-Type': 'application/json'}

url = "{HASS_API_URL}/history/period/"


reference_bayes_states = requests.get(url + f"{month_start.year}-{month_start.month}-1T00:00:00+00:00?end_time={month_end.year}-{month_end.month}-{month_end.day}T00%3A00%3A00%2B00%3A00&filter_entity_id={ENTITY_ID}",
                        headers=headers)

target_entity_states = requests.get(url + f"{month_start.year}-{month_start.month}-1T00:00:00+00:00?end_time={month_end.year}-{month_end.month}-{month_end.day}T00%3A00%3A00%2B00%3A00&filter_entity_id={ENTITY_ID}",
                        headers=headers)


for state in reference_bayes_states.json()[0]:
    if state['state'] != "unknown":
        print(state['state'])

for state in target_entity_states.json()[0]:
    if state['state'] != "unknown":
        print(state['state'])
1 Like

Ok, that took less time and effort than I thought. I think it kinda works, but I haven’t yet had time to think of a smart algorithm, so it’s just brute-forcing its way through states. It makes two requests to the Hass API, then iterates over every second between the dates you specify and checks the states of two entities: the target one, which you want to add to the Bayesian sensor, and the reference one, which tells it what state the Bayes sensor should be in (the “Home Occupied” boolean based on device_trackers from the example above).

It is SLOW but it seems to work. Working prototype first, optimization later :smiley:

TOKEN = "XXXXX"
ENTITY_ID = "switch.humidifier_plug"
BAYES_REFERENCE_ENTITY_ID = "input_boolean.home_occupied"
HASS_API_URL = "http://192.168.1.20:8123/api"
START_TIME = "2023.01.15 10:00"
END_TIME = "2023.01.15 16:00"
TIMEZONE_OFFSET = 0  # Timezone offset from GMT for your local time. Positive or negative number. For example if your timezone is GMT+2 - use 2 here. If it's GMT-4 then use -4.

from datetime import datetime, timedelta
import requests

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + timedelta(days=4)
    return next_month - timedelta(days=next_month.day)


def hass_date_to_datetime(s):
    # Hass timestamps may or may not include microseconds; shift to local time.
    try:
        return datetime.strptime(s, r"%Y-%m-%dT%H:%M:%S.%f+00:00") + timedelta(hours=TIMEZONE_OFFSET)
    except ValueError:
        return datetime.strptime(s, r"%Y-%m-%dT%H:%M:%S+00:00") + timedelta(hours=TIMEZONE_OFFSET)

def human_time_to_datetime(s):
    return datetime.strptime(s, r"%Y.%m.%d %H:%M")

# The two history requests below always cover the current month, so
# START_TIME and END_TIME need to fall inside it.
month_start = datetime.now()
month_start = datetime(year=month_start.year, month=month_start.month, day=1)
month_end = last_day_of_month(month_start)

START_TIME = human_time_to_datetime(START_TIME)
END_TIME = human_time_to_datetime(END_TIME)

headers = {'Authorization': f'Bearer {TOKEN}',
           'Content-Type': 'application/json'}

url = f"{HASS_API_URL}/history/period/"


reference_bayes_states = requests.get(url + f"{month_start.year}-{month_start.month}-1T00:00:00+00:00?end_time={month_end.year}-{month_end.month}-{month_end.day}T00%3A00%3A00%2B00%3A00&filter_entity_id={BAYES_REFERENCE_ENTITY_ID}",
                        headers=headers).json()[0]

target_entity_states = requests.get(url + f"{month_start.year}-{month_start.month}-1T00:00:00+00:00?end_time={month_end.year}-{month_end.month}-{month_end.day}T00%3A00%3A00%2B00%3A00&filter_entity_id={ENTITY_ID}",
                        headers=headers).json()[0]


def state_at_time(states, dt):
    # States are chronological; return the state that was active at dt.
    start_state = None
    for _state in states:
        last_changed = hass_date_to_datetime(_state['last_changed'])
        if last_changed <= dt:
            start_state = _state
        else:
            if start_state is None:
                return "OUT OF RANGE"
            return start_state['state']
    return start_state['state'] if start_state else "OUT OF RANGE"


# Now we can either go second by second between the two dates and check the
# state data (brute force), or walk target_entity_states and compute the
# ranges between state changes. Brute force is slow but simpler and more reliable.

data = {}

# This is the SLOOOOOOOOOOOOOOW part
seconds = (END_TIME-START_TIME).total_seconds()
for second in range(int(seconds)):
    dt = START_TIME + timedelta(seconds=second)
    if second % 100 == 0:
        print(dt)

    target_state = state_at_time(target_entity_states, dt)
    reference_state = state_at_time(reference_bayes_states, dt)
    if reference_state not in data:
        data[reference_state] = {"seconds": 0}
    if target_state not in data[reference_state]:
        data[reference_state][target_state] = 0
    data[reference_state]['seconds'] += 1
    data[reference_state][target_state] += 1

print(data)
for reference_state, _d in data.items():
    reference_seconds = _d.pop("seconds")
    for target_state, _dd in _d.items():
        print(f"{target_state} while reference is {reference_state}: {_dd/reference_seconds}")

Example output:

{'on': {'seconds': 18143, 'off': 17077, 'on': 600, 'unavailable': 466}, 'off': {'seconds': 3457, 'off': 3457}}
off while reference is on: 0.9412445571294714
on while reference is on: 0.033070605743261865
unavailable while reference is on: 0.025684837127266713
off while reference is off: 1.0

In my case the “humidifier plug” is actually a coffee maker plug right now; it was repurposed but I didn’t update its entity_id. So from this we can say that between START_TIME = “2023.01.15 10:00” and END_TIME = “2023.01.15 16:00”, while we were home the coffee maker was on 0.033 of the time and off 0.94 of the time, and when we’re not home the coffee maker is off 1.0 of the time (always off; we never use it while nobody is home). That makes sense: we turn it on for 15-30 minutes a day (1-2 brews, each on a timer that turns it off after 15 minutes so that it won’t evaporate all the coffee if we forget about it).

So, given this information, the Bayes sensor observation should have prob_given_true: 0.033 and prob_given_false: 0.0.
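
As a sanity check, here is a sketch (the helper function is mine, just for illustration) of what those numbers do in the update. Note that an exact prob_given_false of 0.0 is degenerate: a single "on" observation drives the posterior to 1.0 regardless of the prior, so flooring it at something small like 0.01 may be safer in practice.

def bayes_update(prior, prob_given_true, prob_given_false):
    return prob_given_true * prior / (prob_given_true * prior + prob_given_false * (1 - prior))

print(bayes_update(0.5, 0.033, 0.0))   # 1.0 -- absolute certainty from one observation
print(bayes_update(0.5, 0.033, 0.01))  # ~0.767 -- strong but not absolute evidence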

2 Likes

This is great. Can’t wait to give it a shot and see what patterns / data insights I can discover. Thanks for sharing.

1 Like

TLDR: Is statistical independence not fully satisfied between all your sensors, and is that contributing to an overestimation of the final posterior?

I’m trying to understand this diachronic form of Bayes, which I believe is expressed in the code and then the spreadsheet, where the computed posterior is applied to the next stage as the prior, versus a Venn diagram approach.

What I couldn’t figure out is how this would work, in the Venn diagram view, if the sensors produced the exact same readings (given that they have the same probabilities). In that case the enclosed regions for each sensor would perfectly overlap, the second merely confirming everything the first sensor detected, so that P(H | Sensor1) == P(H | Sensor2) == P(H | Sensor1 and Sensor2). The diachronic calculation, however, produces a higher degree of confidence, even though the second sensor adds no information when it reproduces the first exactly. I couldn’t reconcile how these two approaches could produce the same result.
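
Here is a minimal sketch of that double counting, using the numbers from my simulation below (prior 0.1, P+ 0.9, P- 0.1); the helper function is just for illustration.

def bayes_update(prior, p_true, p_false):
    return p_true * prior / (p_true * prior + p_false * (1 - prior))

one = bayes_update(0.1, 0.9, 0.1)  # 0.5 -- all a perfectly duplicated pair should justify
two = bayes_update(one, 0.9, 0.1)  # 0.9 -- what the diachronic chain computes anyway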

I finally realized that the difference from my Venn diagram approach (with the second sensor producing the same results as the first, based on the exact same probabilities) was that I was ignoring statistical independence. When you assert statistical independence, the Venn diagram approach produces the same result as the diachronic calculation. If the sensors are not independent, however, the result should be derated, and might be no better than with just the first sensor; that makes sense, because you are not adding as much information as you might think.

I then considered by how much the result changes as the degree of dependence changes, by writing an empirical simulation that varies the correlation of the two sensors from 100% down to 0%. The output confirms that the conditional probability given both sensors does indeed scale between what it would be with just one sensor (when they produce exactly the same data) and the diachronic value computed with two independent sensors. What was interesting (unless I screwed this up) is that it doesn’t do so linearly, but as a curve; I’ll attach the picture. I was modeling with a prior of 0.1, P+ of 0.90, and P- of 0.1. In that case one sensor should give 0.5 and two independent sensors should give 0.9, and you can see from the graph that it does move between those two values, but non-linearly.
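
A minimal sketch of this kind of simulation (a reconstruction, with "dependence" modeled as the probability that sensor 2 simply copies sensor 1's reading instead of reading independently):

import numpy as np

rng = np.random.default_rng(0)
PRIOR, P_TRUE, P_FALSE = 0.1, 0.9, 0.1  # prior, P+, P-
N = 1_000_000

h = rng.random(N) < PRIOR  # hidden state of the hypothesis
s1 = np.where(h, rng.random(N) < P_TRUE, rng.random(N) < P_FALSE)
for rho in np.linspace(0.0, 1.0, 11):
    fresh = np.where(h, rng.random(N) < P_TRUE, rng.random(N) < P_FALSE)
    s2 = np.where(rng.random(N) < rho, s1, fresh)  # copy s1 with probability rho
    both = s1 & s2
    print(f"dependence {rho:.1f}: P(H | both on) = {(h & both).sum() / both.sum():.3f}")
    # sweeps non-linearly from ~0.9 (independent) down to ~0.5 (perfect copy)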

This is interesting because perhaps this is an area that is contributing to error in this approach for some people. The spreadsheet computes a degree of confidence, yet the real-life confidence may be lower if the sensors don’t satisfy this kind of independence. People were complaining that they had a hard time getting this to work, and I wonder whether this has been considered as an error factor.

1 Like