The Future of the Bayesian Integration - Thoughts wanted

Link to Github Pull Request on the same topic: Improve code quality of Bayesian integration by HarvsG · Pull Request #79098 · home-assistant/core · GitHub

The Bayesian integration has the potential to become a new and very powerful ‘Helper’ which, with minimal configuration, would allow users to create virtual sensors that measure the unmeasurable and save significant money.

The Bayesian integration has a very small user base, appearing in only 700 Home Assistant installations according to analytics. However, it has been called The Most Powerful Home Assistant Sensor and is advocated by some of Home Assistant’s most influential members, e.g. those at the Home Assistant Podcast.

I think this ‘hidden gem’ of Home Assistant has remained the preserve of power users, rather than lighter users, for several reasons:

  • Unfamiliar name unless you know about Bayesian statistics
  • Unclear use - the name is not self-explanatory
  • YAML configuration
  • Cognitive work required to set configuration variables
  • Previous Bugs

I think that by addressing these issues, the Bayesian integration could become a very useful, even important, feature in Home Assistant. It can:

  • Save users money by emulating other sensors (e.g. presence detection)
  • Measure/estimate states that are very hard or impossible to measure directly (e.g. activities like sleeping or cooking)
    • In some cases this could also be solved by complex conditional logic in automations or template YAML.
  • Allow more robust and reliable measurement by combining sensors (e.g. pings to Google, Cloudflare and Microsoft to test that the internet connection is up - a minimal example is sketched below)
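
As a rough sketch of that last point (the entity names and probabilities here are purely illustrative, not recommendations), such a combined internet-connection sensor might look like this; a fuller worked example appears later in this thread:

  - platform: "bayesian"
    name: "internet_working"
    prior: 0.99 # the internet is up the vast majority of the time
    probability_threshold: 0.95
    observations:
    - platform: "state"
      entity_id: "binary_sensor.ping_google" # hypothetical ping binary sensor
      to_state: "on"
      prob_given_true: 0.995 # Google is almost always reachable when the internet is up
      prob_given_false: 0.02 # and almost never reachable when it is down
    - platform: "state"
      entity_id: "binary_sensor.ping_cloudflare" # hypothetical ping binary sensor
      to_state: "on"
      prob_given_true: 0.995
      prob_given_false: 0.02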

Limitations:

  • Requires having enough (more than one) sensors whose states correlate with whatever it is you want to measure

A number of steps would need to be taken to realize its potential, and I would suggest the following:

  1. Code review and optimisation
  2. Better testing, error tolerance
    • More testing of template entities
    • One test that combines several different state types
    • Clarify any errors caused by premature rounding/approximation - number of decimal places can matter a lot in Bayesian probability
    • Tests for new features in (1. and 3.)
    • If kept in, better tests for numeric states with multiple ranges for one entity (negative observations should be ignored), warning if some values are not covered or the probabilities do not sum correctly
  3. Improve current experience
  4. UI configurations
    1. Simple config flow for binary entities (+/- UNKNOWN state - default to ignore) with appropriate verbosity
      • Probably needs to be in percentages
    2. UI for numeric and template configs
    3. Use history stats to suggest values for prob_given_true and prob_given_false. How do we get history data for other sensors? (A rough formula for this is sketched after this list.)
      • User would need to input a time range (using the scheduling UI) during which the sensor was true/false and HA would spit out values (adjusting 0 and 1 to 0.999/0.001)
      • Would not work for attributes
      • Would be harder for numeric and template values
    4. Use history stats to suggest most informative entities
  5. Automate (4.) for users. They simply specify time period(s) that are True for the Bayesian sensor and HA does the rest.
  6. Specialised config-flows for virtual presence, sleep and other ‘popular’ Bayesian sensors.
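
As a rough sketch of how the history-stats suggestion in (4.3) could work - this is just the natural frequency estimate, not an existing implementation - the probabilities would be estimated from recorded history as fractions of time:

$$\texttt{prob\_given\_true} \approx \frac{\text{time the entity spent in its \texttt{to\_state} during the user-marked `true' periods}}{\text{total duration of the `true' periods}}$$

$$\texttt{prob\_given\_false} \approx \frac{\text{time the entity spent in its \texttt{to\_state} during the `false' periods}}{\text{total duration of the `false' periods}}$$

with results of exactly 0 or 1 nudged to 0.001 and 0.999 as described above.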

I would be grateful for your thoughts and how this might conflict or help with how you use the Bayesian integration.

N.B. Config subentries may become useful.

9 Likes

None of those are an issue for me.

The main reason I don’t use it is that I have no need to. State-driven automations based on the myriad of sensors I have available are sufficient, e.g. for your examples:

If my alarm is off, I am home. It does not get much simpler than that. PIRs control lighting in passageways. A lux sensor controls lighting brightness in other rooms, in combination with PIRs and other sensors like FSRs (see below). None of these are expensive.

Ahem: FSR - the best bed occupancy sensor

Cooking? The range, microwave, or oven is on (I personally don’t monitor or need this).

I have two connections, Fibre and 3G. I can tell which is active by the Public IP sensor. It would be extremely rare that both were down. If they were, there would be no way to get a notification to me.

I’m not pooh-poohing your proposal, just pointing out that there is another rather large reason why this integration is not used much: state-driven automations are sufficient in nearly all cases.

1 Like

Thanks for taking the time to reply. Your HA setup seems enviable. I do concur that, in an ideal world, many of these problems can be solved by buying and fitting more sensors and then combining them using if-this-then-that logic in automations (you wouldn’t want the bed sensor to turn the house off if one person was having a nap - bed occupancy is not sleep, after all).

The problem Bayesian aims to solve is creating an imperfect imitation of this with simpler logic, while saving $$$.

What happened here? My whole room-occupancy setup is broken. I’m one of the 700, I guess. The way it works now - is that the desired state? Otherwise I have to fix all the Bayesian sensors I have in my setup, 20+.

1 Like

Bayesian sensor fundamentally changed · Issue #79694 · home-assistant/core (github.com)

2022.10 breaks the Bayesian sensor I have had working for 2-3 years, and the breaking-change notes in the 2022.10 release notes don’t really help.

My sensor was tuned with the help of a spreadsheet (link), so I’m not sure why it’s no longer working.

So I’ve had to disable my bayesian sensor for now, sadly.

Copied from the GitHub issue: Bayesian sensor fundamentally changed · Issue #79694 · home-assistant/core · GitHub

Apologies and efforts

Hi all, and apologies for the inconvenience.
The OP is right. The fundamental logic of the Bayesian sensor has indeed changed. I will explain the rationale and the methods to transition. But first I want to explain that I have put a great deal of effort into trying to make the transition as painless as possible, including the merge of two repair features which detect broken configs rather than silently breaking: Add repair for missing Bayesian `prob_given_false` by HarvsG · Pull Request #79303 · home-assistant/core · GitHub and Add to issue registry if user has mirrored entries for breaking in #67631 by HarvsG · Pull Request #79208 · home-assistant/core · GitHub. I hope that is what has led you here. I made efforts to survey the community to understand how people worked around the old logic. I tried to consult the community on the changes that were coming. And when I realised the breaking-change text was not helpful enough, I tried to change it. I got a statistics professor - and likely the author of the spreadsheet you are using - to have a look at my changes: prob_given_false is soon req, show in examples by HarvsG · Pull Request #24276 · home-assistant/home-assistant.io · GitHub, and the response was positive.

Rationale

The changes were made, in essence, because the previous functionality was incorrect: it was based on incorrect maths, and those of you who do have working configs have them working despite the previous logic, not because of it. For more detail you can read the early discussion on the breaking PR https://github.com/home-assistant/core/pull/67631, but in short there were two issues. Firstly, prob_given_false had a default value of 1 - prob_given_true, which had no mathematical rationale. The second, which is most likely causing the issue you are facing, is that the logic did not update the probabilities if a sensor was not sensing the ‘to_state’; this is simply poor logic. If a motion sensor senses movement, that is evidence you are home, and if it doesn’t, that is (weaker) evidence you are not home. This was pointed out in this popular community post.
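
To make that concrete (this is just standard Bayes’ theorem in my own notation, not a transcript of the code): when a motion sensor is in its to_state, the sensor’s probability should update as

$$P(\mathrm{home}\mid \mathrm{motion}) = \frac{P(\mathrm{motion}\mid \mathrm{home})\,P(\mathrm{home})}{P(\mathrm{motion}\mid \mathrm{home})\,P(\mathrm{home}) + P(\mathrm{motion}\mid \neg\mathrm{home})\,\bigl(1 - P(\mathrm{home})\bigr)}$$

and when it is not in its to_state it should update with the complementary likelihoods:

$$P(\mathrm{home}\mid \neg\mathrm{motion}) = \frac{\bigl(1 - P(\mathrm{motion}\mid \mathrm{home})\bigr)\,P(\mathrm{home})}{\bigl(1 - P(\mathrm{motion}\mid \mathrm{home})\bigr)\,P(\mathrm{home}) + \bigl(1 - P(\mathrm{motion}\mid \neg\mathrm{home})\bigr)\,\bigl(1 - P(\mathrm{home})\bigr)}$$

The old behaviour simply skipped the second update, so negative observations never counted as evidence.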

Fixes:

If your config looks like this - with both opposite states configured, and with prob_given_trues and prob_given_falses that sum to one - then your config was essentially a ‘workaround’ implementing what this change now does, so the fix is to simply remove one of them:

  - platform: "bayesian"
    name: "internet_working" 
    prior: 0.99 # internet has high up time
    probability_threshold: 0.95 
    observations:
    - platform: "state"
      entity_id: "binary_sensor.internet_connection_cloudflared"
      prob_given_true: 0.003 # cloudflare is rarely unavailable when the internet is up
      prob_given_false: 0.99 # it is highly likely that cloudflare will be unreachable when the internet is down - unless it is a DNS issue
      to_state: "off"
    - platform: "state"
      entity_id: "binary_sensor.internet_connection_cloudflared"
      prob_given_true: 0.997 # cloudflare is rarely unavailable when the internet is up
      prob_given_false: 0.01 # it is highly likely that cloudflare will be unreachable when the internet is down - unless it is a DNS issue
      to_state: "on"
    - platform: "template" 
      prob_given_true: 0.0001 
      prob_given_false: 0.10 
      value_template: >-
        {% if is_state('binary_sensor.greenpi', 'off')
           and is_state('binary_sensor.wireguard', 'off') %}
           true
        {% else %}
           false
        {% endif %}
    - platform: "template" # evaluates to true when both the greeni and wireguard are down
      prob_given_true: 0.9999 # if the internet is working then it is very unlikely both are down
      prob_given_false: 0.90 # if the internet is not working then there is 10% chance they are both unreachable
      value_template: >-
        {% if is_state('binary_sensor.greenpi', 'off')
           and is_state('binary_sensor.wireguard', 'off') %}
           false
        {% else %}
           true
        {% endif %}

to this de-duplicated version:

  - platform: "bayesian"
    name: "internet_working" # maybe sensible to switch these in seperate errors - DNS, Switch, Gl-inet
    prior: 0.99 # internet has high up time
    probability_threshold: 0.95 # by defualt assume the internet is up
    observations:
    - platform: "state"
      entity_id: "binary_sensor.internet_connection_cloudflared"
      prob_given_true: 0.003 # cloudflare is rarely unavailable when the internet is up
      prob_given_false: 0.99 # it is highly likely that cloudflare will be unreachable when the internet is down - unless it is a DNS issue
      to_state: "off"
    - platform: "template" # evaluates to true when both the greeni and wireguard are down
      prob_given_true: 0.0001 # if the internet is working then it is very unlikely both are down
      prob_given_false: 0.10 # if the internet is not working then there is 10% chance they are both unreachable
      value_template: >-
        {% if is_state('binary_sensor.greenpi', 'off')
           and is_state('binary_sensor.wireguard', 'off') %}
           true
        {% else %}
           false
        {% endif %}

(This is important for value_template and numeric_state observations like the one above, otherwise the probability calculations will be duplicated. It won’t actually affect anything for state observations - for reasons I can explain later if wanted.)
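
To illustrate why the duplication matters (my own illustration, not a description of the code): in odds form, each observation multiplies the prior odds by its likelihood ratio, so keeping both mirrored template observations applies the same ratio twice and squares its effect:

$$\underbrace{\frac{P(\mathrm{true})}{P(\mathrm{false})} \times \frac{P(\mathrm{obs}\mid\mathrm{true})}{P(\mathrm{obs}\mid\mathrm{false})}}_{\text{single observation}} \qquad \text{vs.} \qquad \underbrace{\frac{P(\mathrm{true})}{P(\mathrm{false})} \times \left(\frac{P(\mathrm{obs}\mid\mathrm{true})}{P(\mathrm{obs}\mid\mathrm{false})}\right)^{2}}_{\text{duplicated observation}}$$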

If your config looks like this (only one to_state configured per entity) then your config is essentially correct; however, when you set it up you probably had to ‘tweak’ the probabilities to get the behaviour you wanted rather than basing them on ‘true’ values, so your sensor will likely behave differently after today’s update.

  - platform: "bayesian"
    name: "someone home"
    prior: 0.6
    probability_threshold: 0.8
    observations:
    - platform: "state"
      entity_id: "binary_sensor.hall_motion"
      prob_given_true: 0.70
      prob_given_false: 0.001
      to_state: "on"
    - platform: "state"
      entity_id: "binary_sensor.kitchen_motion"
      prob_given_true: 0.65
      prob_given_false: 0.001
      to_state: "on"

There are two fixes. The first, and most future-proof, is to head to the newly updated documentation and re-estimate your probabilities - you will likely find that accuracy improves. There are worked examples in the documentation and in the GitHub issue linked above.
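
As a toy example of that re-estimation (the numbers here are made up for illustration, not taken from the documentation): if your history shows the hall motion sensor is ‘on’ for roughly 40% of the time someone is home and roughly 1% of the time nobody is, its observation becomes:

    - platform: "state"
      entity_id: "binary_sensor.hall_motion"
      prob_given_true: 0.40 # fraction of 'someone home' time the sensor spends on
      prob_given_false: 0.01 # fraction of 'nobody home' time the sensor spends on
      to_state: "on"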

The hacky solution (not recommended) is to effectively disable the new functionality by making the other states uninformative, i.e. setting prob_given_false equal to prob_given_true. This will not work for numeric_state and template observations.

  - platform: "bayesian"
    name: "someone home"
    prior: 0.6
    probability_threshold: 0.8
    observations:
    - platform: "state"
      entity_id: "binary_sensor.hall_motion"
      prob_given_true: 0.70
      prob_given_false: 0.001
      to_state: "on"
    - platform: "state"
      entity_id: "binary_sensor.hall_motion"
      prob_given_true: 0.5
      prob_given_false: 0.5
      to_state: "off"
    - platform: "state"
      entity_id: "binary_sensor.kitchen_motion"
      prob_given_true: 0.65
      prob_given_false: 0.001
      to_state: "on"
    - platform: "state"
      entity_id: "binary_sensor.kitchen_motion"
      prob_given_true: 0.5
      prob_given_false: 0.5
      to_state: "off"

The hacky workaround for template observations would be to re-write the template to return None in place of False - to again induce Bayesian to ignore a negative observation as it used to.

Again - apologies for the breaking change. I hope you can appreciate that I have made efforts to try and reduce the pain this change brings.

If you are confused by the above then please post your config here, DM* me on the community forum if you want to keep it private, or send me a link to a public gist, and I will fix it for you. @ikbensuper @stigvi

*ideally best to do it in public so others can learn from the changes.

4 Likes

new spreadsheet for generating configs (only works for entities that are binary at the moment)

2 Likes

The Bayesian sensor could benefit from being able to set a unique ID so we can manage some settings in the UI. Other than that, great work on the new spreadsheet; it makes configuring the Bayesian sensor a lot easier.

1 Like

Still awaiting approval. The HA team are (understandably) not keen on people continuing to develop YAML configs in the age of the UI.

1 Like

Are you getting on ok with the 2022.10 updates?

1 Like

I really like the journey you are on, improving the Bayesian integration. I think it has a lot to offer but, as you say, it’s a bit hard to use. I am currently using it to guess if anyone is showering (used to ramp up the ventilation), if I am working from home (adds heat and does not turn off the light in my office even when the motion sensor does not trigger), to turn the home into day mode, and to recognise if the cleaning lady is in the house (she should not trip the alarm).

I find it fun to set up a new Bayesian sensor and see how it behaves, but tuning it is a bit hard (and almost always necessary). I envision the ability to plot the posterior probability and threshold over time along with all the observations’ state changes. This would be an awesome feature for tuning (and understanding) the sensor. It could possibly be done in the attributes section of the info dialogue box, but in the history section would probably be even cooler. What do you think?

PS: another small thing… would it be possible to fix Home Assistant to not round the posterior probability to 0 and 1?

1 Like

I think this is a good idea. I think the best way would be to turn each Bayesian configuration into a ‘device’, so that both the on/off state and the probability can be entities. However, I’m not sure the HA team approve of virtual objects being devices in their own right, so we may need to find another solution, like this: Add helper - Attribute as Sensor by gjohansson-ST · Pull Request #79835 · home-assistant/core · GitHub

This would be cool, but hard to achieve; I’m not aware of any integrations that allow overlaying events on a history plot. That being said, if you open up a Bayesian sensor then under the state history it will show a logbook of the events that changed the state.

Have a go with this spreadsheet (it would obviously be great to not need the spreadsheet).

I’ll take a look at this and why it happens. It is probably just rounding once it gets past 0.999.

I created a posterior template sensor to test what I would want to show when analyzing Bayesian sensors:

Template:

sensor:
  - platform: template
    sensors:
       bayesian_home_office_posterior:
         friendly_name: 'Home office posterior'
         unit_of_measurement: '%'
         value_template: "{{ state_attr('binary_sensor.bayesian_home_office', 'probability') | float * 100 }}"

2 Likes

I for one just came across the sensor… I’m digging it!

I would like to be able to put a template in the “prob_given_true:” field because I want to use an equation that relates to outside temperature: the lower the temperature, the higher the probability that my house’s HVAC system needs to switch to heating. I had a workaround dividing the temperature range into groups and using the numeric_state platform, but that working solution was broken by the update. The “hacky” workaround (posted in Bayesian sensor fundamentally changed #79694) doesn’t work:

“The hacky work around for template observations would be to re-write the template to return None instead in place of when it would return False - to again induce Bayesian to ignore a negative observation as it used to.”

I tried this as below:

    - platform: template # Needs to have 'None' in else clause so Bayesian ignores if not true
      value_template: > # Checks to see if outside hourly weather is greater than or equal to 72 degrees
        {% if states('sensor.test_hourly_weather') | float >= 72 %}
          True
        {%- else -%}
          None
        {%- endif %}
      prob_given_true: .05
      prob_given_false: 0.95

But this seems to just read as False when the temperature is < 72 (even though I used “None” as suggested), and rather than being ignored, it significantly affects the Bayesian probability outcome.

1 Like

FWIW, I recently got into Home Assistant and I was happy to see there’s a Bayesian thingy. I’ll be playing with it soon.

I was using the Bayesian sensor for over 3 years for detecting presence at home.
I can’t get it working as it did before 2022.10, so I created a custom component for it.
If you want to use it, add GitHub - gieljnssns/bayesian: The bayesian binary sensor for Home-Assistant to HACS and search for bayesian.

It should work again as before…

I am working on a pull request to enable exactly this functionality using the numeric state entity:
Accept more than 1 state for numeric entities in Bayesian by HarvsG · Pull Request #80268 · home-assistant/core · GitHub

This is a great sensor. I use it for presence in rooms and I’m thinking about using it for presence in the house. I see all the changes that have been made and they are great. I’m loving it.

1 Like