The Future of the Bayesian Integration - Thoughts wanted

I am working on a pull request to enable exactly this functionality using the numeric state entity:
Accept more than 1 state for numeric entities in Bayesian by HarvsG · Pull Request #80268 · home-assistant/core · GitHub

This is a great sensor. I use it for presence in rooms and I’m thinking about using it for presence in the house. I see all the changes that have been made and they are great. I’m loving it.


Hi,
100% agree with you. I'm using Bayesian for things like "should the home heat?", "should it cool?", "is the home asleep?", presence, etc. Yes, as mentioned somewhere here, we have a lot of sensors … but … making the decision whether to heat or cool based only on exterior and interior temperatures is not as good (or as sophisticated) as feeding them into Bayesian together with some other data like season, month, presence, etc. It is a kind of AI :slight_smile: and works much better than a simple if/then.
I vote for it - for me it is an essential part of any automation.
M.


Hi,
so, here are my off-the-top-of-my-head remarks to think about:

  1. Rename “Bayesian” to something like “Probability” as I mentioned here: Bayesian rename

  2. Add some form of linear observation, or at least a templatable threshold: the higher/lower an input value, the higher/lower the threshold of the whole sensor, or the higher/lower the given_true/given_false values.

Nothing else is crucial to me; I have covered my needs so far :slight_smile:

Generally, I wish for more integrations like this: more "soft" decision-making capabilities in HA, a kind of machine learning, easy access to history values, AI, etc.

Thanks, have a nice day,
M.

Can you expand on this? I am not quite sure what you mean. Allow the probability threshold to be variable/templatable?

If so, can you explain in what situations you would want this?

Hi @HarvsG,
OK.

I mean controlling the probability sensor state not only by binary input states, but also by int/float values:

  • The further the Sun has set, the higher the probability of turning the lights on, especially when decreasing indoor illuminance is also observed.
  • The faster I'm approaching home (different speeds by car, on foot, etc.), the higher the probability that heating should start, so that it is comfortable at the right time (maybe a bit of an overkill example).
  • The farther the household members are from home, the higher the probability that my Vacation state sensor should switch on and trigger the relevant actions.
  • The closer night comes, the higher the probability that the home goes to sleep.

So it would be nice to integrate non-binary observations too. Or at least to have an option to control values which are currently fixed:

I can see it being useful if we could control the "probability_threshold" programmatically. So instead of using just one "hard" float value in the sensor definition, it would be nice to have the option to use a template for this value.

probability_threshold: "{{ template returning the value }}"

E.g. the probability threshold of a heating switch could decrease slightly as outdoor and indoor temperatures fall. While 19°C indoors feels fine when it is 25°C outside, it feels quite different when it is -5°C outside.

The same goes for the "prob_given_true" and "prob_given_false" properties, to control particular observed sensors.

Best,
M.


This might be a neat idea. I could see some logic behind it.

What you are proposing is similar to what I'm trying to achieve in Bayesian - Accept more than 1 state for numeric entities by HarvsG · Pull Request #80268 · home-assistant/core · GitHub

Except in this case it is organised into buckets rather than a continuous function.

If I understood correctly, negative observations have been taken into account when determining the probability since 2022.10? And the prob_given_true when the observation is False is 1 minus the prob_given_true when the observation is True?

In that case I have some doubts about whether this is correct. Of course this makes sense statistically, but in the physical world it can cause some faults.

I noticed this with the Bayesian sleeping sensor I use. To predict whether everyone in my house is sleeping, one of the observations I use is a template that checks the time delta since a motion sensor last updated. When this delta is less than 30 min it is very unlikely that everyone is sleeping (prob_given_true = 0.05). But when the delta becomes 31 min the observation becomes False and results in a very high probability that everyone is sleeping, which is not correct.
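
For reference, a template observation of that kind might look roughly like this (the motion entity name is a placeholder):

value_template: >
  {{ (now() - states.binary_sensor.hall_motion.last_updated).total_seconds() < 1800 }}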

Another generic example:
If I see a man with an umbrella walking down the street, it is very likely that it is raining; P is, for instance, 0.99. But if I don't see a man with an umbrella I can't say that the probability of rain is very low, because not every man carries an umbrella when it's raining. Of course it says something, but definitely not 1 - 0.99 = 0.01.

Maybe it is an idea to let the user set all 4 probabilities (true positive, true negative, false positive and false negative)?

If I understood correctly, negative observations have been taken into account when determining the probability since 2022.10?

This is correct

And the prob_given_true when the observation is False is 1 minus the prob_given_true when the observation is True?

This is also correct. prob_given_true is the 'probability of {entity state} == {to_state} given {bayesian sensor == 'on'}', i.e. 'the probability of motion in the last 30 mins given everyone is sleeping'. This is different from 'the probability of everyone sleeping given motion in the last 30 mins' (which seems to be the number you are using). This may seem like a subtle and irrelevant distinction, but it is actually what defines Bayesian statistics and gives it its magic sauce.

As such, it is a mathematical truth (for binary observations) that P(observation | outcome) + P(no observation | outcome) = 1, because for any given outcome the observation can only be observed or not observed.

For completeness, prob_given_false is the 'probability of {entity state} == {to_state} given {bayesian sensor == 'off'}'.

Both the instances you describe can be corrected by configuring the probabilities correctly.

binary_sensor:
  - platform: bayesian
    prior: 0.35
    name: "everyone_sleeping"
    observations:
      - entity_id: binary_sensor.motion_in_last_30min
        prob_given_true: 0.05 # When everyone is sleeping I am very unlikely to detect motion, so when motion is detected it will have a significant impact on the probability
        prob_given_false: 0.5 # When not everyone is sleeping I expect to detect motion half the time. This means that the absence of motion won't make much difference to the probability, as you desire.
        platform: "state"
        to_state: "on"
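
To see the effect numerically, here is a minimal Python sketch of the single-observation update (my illustration of Bayes' rule with the numbers above, not the integration's actual code):

def bayes_update(prior, prob_given_true, prob_given_false):
    # posterior = P(obs | true) * prior / P(obs)
    numerator = prob_given_true * prior
    return numerator / (numerator + prob_given_false * (1 - prior))

prior = 0.35

# Motion seen in the last 30 mins: strong evidence against sleeping
print(bayes_update(prior, 0.05, 0.5))          # ~0.05

# No motion seen: the complement probabilities are used, a gentle nudge up
print(bayes_update(prior, 1 - 0.05, 1 - 0.5))  # ~0.51

So the absence of motion only moves the probability from 0.35 to about 0.51, rather than forcing it high.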

When this delta is less than 30 min it is very unlikely that everyone is sleeping (prob_given_true = 0.05)

As described above, here you are confusing the probability of the data given the outcome with the probability of the outcome given the data.

If I see a man with an umbrella walking down the street, it is very likely that it is raining; P is, for instance, 0.99. But if I don't see a man with an umbrella I can't say that the probability of rain is very low, because not every man carries an umbrella when it's raining.

You are exactly right. However, we are not interested in 'if I see a man with an umbrella walking on the street it is very likely that it is raining'; we are interested in 'if it is raining, some people will have an umbrella' and 'if it is not raining I am very unlikely to see an umbrella'.
In Bayesian stats it is the chance of the observation/entity-state given the event/situation/context/bayesian-sensor, AKA the chance of an umbrella given that it is raining. From that information, and a prior, we work out the probability that it is raining.

prob_umbrella_given_raining = 0.6 # When it is raining, a slight majority (60%) of people will have umbrellas
prob_umbrella_given_not_raining = 0.01 # When it is not raining it would be very odd (1%) to see an umbrella
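
To put numbers on it (my illustration, with a made-up prior of 0.3 that it is raining): seeing an umbrella gives P(raining) = (0.6 × 0.3) / (0.6 × 0.3 + 0.01 × 0.7) ≈ 0.96, while seeing no umbrella gives P(raining) = (0.4 × 0.3) / (0.4 × 0.3 + 0.99 × 0.7) ≈ 0.15, a moderate drop rather than a collapse to 0.01.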

You can read more here: https://www.home-assistant.io/integrations/bayesian/#theory

Of course it says something, but definitely not 1 - 0.99 = 0.01

This is exactly one of the behaviours that was fixed in 2022.10, and why prob_given_false is now required.

Maybe it is an idea to let the user set all 4 probabilities (true positive, true negative, false positive and false negative)?

These are taken into account when prob_given_true and prob_given_false are set correctly: prob_given_true is the true-positive rate (so 1 - prob_given_true is the false-negative rate) and prob_given_false is the false-positive rate (so 1 - prob_given_false is the true-negative rate). In fact, the 'in_bed' example in the official documentation walks through a motion sensor that gives a false negative half the time someone is in bed.


Thanks for your extensive reply, appreciate it :+1: I had indeed misinterpreted the probabilities of the data. I am going to rework my sensors!


Yes, I saw that. It is actually still binary, even with buckets. It is easy to achieve with a template returning a bool if the value is in a specified range, which I already do in some cases. I'm talking about analog (linear, exponential, …) control of the values.

OK, to do that the user would need to create some probability function or distribution over the possible range of values. I'm not entirely sure how that would be achieved.

If you can think of (1) some reasonably common use cases where this would have a definite benefit over buckets, (2) a way it could be done by a user without a background in stats, and (3) how to implement it in code, then I might consider it.

But I have to say that implementing continuous probability distributions in Bayesian updates stretches the limits of my understanding of Bayesian stats.
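
For what it's worth, one textbook-style approach would be to let prob_given_true/prob_given_false become continuous likelihoods, e.g. Gaussians with a typical value and spread per hypothesis. A minimal Python sketch (my illustration only, all names and numbers invented, not a proposal for the integration's API):

import math

def gaussian_pdf(x, mu, sigma):
    """Likelihood of value x under a Normal(mu, sigma) distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def continuous_update(prior, x, mu_true, sigma_true, mu_false, sigma_false):
    """Bayes update where the two likelihoods are continuous in x."""
    l_true = gaussian_pdf(x, mu_true, sigma_true)
    l_false = gaussian_pdf(x, mu_false, sigma_false)
    return l_true * prior / (l_true * prior + l_false * (1 - prior))

# e.g. an outdoor temperature of 2°C, where "heating needed" days centre on 0°C
# and "no heating needed" days centre on 20°C:
print(continuous_update(0.5, 2.0, 0.0, 8.0, 20.0, 8.0))  # ~0.92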

In Bayesian - Accept more than 1 state for numeric entities by HarvsG · Pull Request #80268 · home-assistant/core · GitHub (not yet implemented) there is a series of buckets. The prior will be updated by whichever bucket the observation is in; if there are no observations then the prior won't be updated. So this isn't binary.

From my perspective, Mr. Bayes, in his time, didn't expect his theorem to be used continuously for automations, calculating new results in milliseconds :slight_smile: My proposal is actually like that: using his "static" formula with new input values whenever they change, even slightly, even every millisecond. It does not break his theorem, IMHO.

The bucket solution evaluates some range to true/false, if I understand it well. If I'm outside that range, it is not observed at all. If I'd like to evaluate, e.g., a range of outdoor temperatures for heating/cooling (in my country it can go from -20°C to +35°C), I'd expect a linear probability change. Of course it can be achieved in a lot of other ways, but I see this as clean and simple: just use a template for the threshold which returns a float between, e.g., 0.3 and 0.8. Leave the template output up to the HA user; just accept a template for threshold, given_true and given_false. Out-of-range values are exceptions raising an error/unknown state.
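
As an illustration of the idea (the templated probability_threshold itself is the proposed feature and does not exist today; sensor.outdoor_temperature is a placeholder entity):

probability_threshold: >
  {% set t = states('sensor.outdoor_temperature') | float(10) %}
  {# map -20..35 °C linearly onto a threshold of 0.8..0.3 #}
  {{ (0.8 - (t + 20) / 55 * 0.5) | round(3) }}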

Anyway, it is definitely up to you, you are the “pro” and author :slight_smile:

P.S. Just an example of one of my Bayes sensors using "buckets" (alongside other observed sensors):

      - platform: template
        value_template: "{{ now().time() <= strptime(['0845', '0655'][states('binary_sensor.workday_today') | bool(true)], '%H%M').time() }}"
        prob_given_true: 0.960
        prob_given_false: 0.050
      - platform: template
        value_template: "{{ strptime(['0845', '0655'][states('binary_sensor.workday_today') | bool(true)], '%H%M').time() < now().time() <= strptime('2050', '%H%M').time() }}"
        prob_given_true: 0.250
        prob_given_false: 0.700
      - platform: template
        value_template: "{{ strptime('2050', '%H%M').time() < now().time() <= strptime(['2300', '2150'][states('binary_sensor.workday_tomorrow') | bool(true)], '%H%M').time() }}"
        prob_given_true: 0.600
        prob_given_false: 0.250
      - platform: template
        value_template: "{{ strptime(['2300', '2150'][states('binary_sensor.workday_tomorrow') | bool(true)], '%H%M').time() < now().time() }}"
        prob_given_true: 0.860
        prob_given_false: 0.120

This I’d like to avoid by just adjusting the threshold value.


I was inspired by @HarvsG's spreadsheet and created a Python notebook that helps jumpstart my template automatically.

Problem

My cats set my motion sensors off.

Well, not all of them. A handful of my motion sensors can be placed so that they only pick up human beings.

The trouble is, there are places where I don't want motion sensors to look like eyesores (the kitchen) or feel creepy (the bathroom). Low placement under cabinets is great for discreet positioning, but picks up my wandering kittens.

Solution

I intuitively set up a Bayesian sensor that:

  • watches well-placed motion sensors in rooms adjacent to the kitchen
  • watches for light patterns that I believe indicate humans and not felines

It worked OK but needed some tuning. @HarvsG's spreadsheet is a lifesaver, but gathering the hours per day was a chore.

Over Engineering

So I wrote this Python notebook, which will spit out a starting point for your own Bayesian sensor.

Here are the steps:

  • Make sure you have History Explorer installed

  • Create a permanent or temporary dashboard with all of the sensors you believe could help you. Something like this (the yellow highlights are my cats in the kitchen)

  • Add one more thing to that dashboard: a "truth sensor" (or manually edit the export, see later) that reflects the real events (like "real" motion vs "kitty" motion)

  • Export 4-5 days of statistics in the History Explorer using its “Export as CSV”

  • If you don't have a "truth sensor", edit that CSV in some tool and add a new fake sensor, manually putting in on/off events that mark the start and end of the real activity. For example, I copied my motion sensor and deleted the rows corresponding to "kitty motion".

  • Save the Jupyter notebook code at the end here with the file extension .ipynb.

  • Upload this notebook to the free Jupyter Lab.

  • Upload the csv you exported

  • Edit the "CONFIGURATION" section at the top:

    • set your csv filename
    • set the name of your “truth” column in the csv
    • set how many correlated feature (sensor + sensor state) combinations you would like it to discover
  • Run the notebook by clicking on the triple arrow in the top toolbar

  • See example output at the bottom

The future
It would be great if I could turn this into a plugin (it would require numpy and some Bayesian libraries like scikit-learn running on the Home Assistant host) which would let you:

  • select some schedule representing ground truth for your goal (i.e. I was asleep during these hours, or this is REAL motion)
  • select a list of sensors you think may be interesting
  • click 'go' and have it give you a suggestion.

I am not an expert in this ecosystem so this is all I’ll have for now.

Hopefully this is useful to someone.

Example Output (fully automated)

Now this output needs some hand tweaks. For example, while the kitchen motion “on” is a great indicator of motion, there is no need to ALSO have the kitchen motion “off” observation, so I’ll just delete that.

# binary_sensor.kitchen_occupancy is on 23.93% of the time
- platform: bayesian
  name: Probably binary_sensor.kitchen_occupancy # CHANGE ME
  prior: 0.24
  probability_threshold: 0.8 # EXPERIMENT WITH ME
  observations:
    # When 'binary_sensor.den_occupancy' is 'on' ...
    - platform: state
      entity_id: binary_sensor.den_occupancy
      to_state: on
      prob_given_true: 0.30 # (1.7 hours per day)
      prob_given_false: 0.18 # (3.3 hours per day)
    # When 'binary_sensor.front_yard_occupancy' is 'on' ...
    - platform: state
      entity_id: binary_sensor.front_yard_occupancy
      to_state: on
      prob_given_true: 0.20 # (1.1 hours per day)
      prob_given_false: 0.06 # (1.1 hours per day)
    # When 'binary_sensor.hall_occupancy' is 'off' ...
    - platform: state
      entity_id: binary_sensor.hall_occupancy
      to_state: off
      prob_given_true: 0.59 # (3.4 hours per day)
      prob_given_false: 0.93 # (16.9 hours per day)
    # When 'binary_sensor.hall_occupancy' is 'on' ...
    - platform: state
      entity_id: binary_sensor.hall_occupancy
      to_state: on
      prob_given_true: 0.41 # (2.4 hours per day)
      prob_given_false: 0.07 # (1.3 hours per day)
    # When 'binary_sensor.kitchen_occupancy' is 'off' ...
    - platform: state
      entity_id: binary_sensor.kitchen_occupancy
      to_state: off
      prob_given_true: 0.01 # (0.0 hours per day)
      prob_given_false: 0.99 # (18.3 hours per day)
    # When 'binary_sensor.kitchen_occupancy' is 'on' ...
    - platform: state
      entity_id: binary_sensor.kitchen_occupancy
      to_state: on
      prob_given_true: 0.99 # (5.7 hours per day)
      prob_given_false: 0.01 # (0.0 hours per day)
    # When 'binary_sensor.laundry_occupancy' is 'on' ...
    - platform: state
      entity_id: binary_sensor.laundry_occupancy
      to_state: on
      prob_given_true: 0.18 # (1.0 hours per day)
      prob_given_false: 0.06 # (1.1 hours per day)
    # When 'light.dining_room_main_lights' is 'on' ...
    - platform: state
      entity_id: light.dining_room_main_lights
      to_state: on
      prob_given_true: 0.14 # (0.8 hours per day)
      prob_given_false: 0.03 # (0.5 hours per day)

Jupyter code

{
  "metadata": {
    "language_info": {
      "codemirror_mode": {
        "name": "python",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8"
    },
    "kernelspec": {
      "name": "python",
      "display_name": "Python (Pyodide)",
      "language": "python"
    }
  },
  "nbformat_minor": 4,
  "nbformat": 4,
  "cells": [
    {
      "cell_type": "code",
      "source": "import dateutil.parser as dparser\nimport numpy as np\nimport os\nimport pandas\nfrom sklearn.feature_selection import SelectKBest\nfrom sklearn.feature_selection import chi2\nfrom sklearn.naive_bayes import MultinomialNB\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score",
      "metadata": {
        "trusted": true
      },
      "execution_count": 1,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# CONFIGURATION\n###########################################\nHISTORY_EXPLORER_EXPORT_FILE='entities-exported.csv'\nTRUTH_COLUMN = 'binary_sensor.kitchen_occupancy'\nINCLUDE_TRUTH_COLUMN_AS_OBSERVATION = True\nIGNORE_COLUMNS_WHEN_CHOOSING_FEATURES = [\n    'datetime',\n    'Unnamed: 0'\n]\nTOP_N_FEATURES = 8\nSAVE_DEBUG_OUTPUT = True",
      "metadata": {
        "trusted": true
      },
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# read history explorer export\ndf = pandas.read_csv(HISTORY_EXPLORER_EXPORT_FILE)",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# remove junk entries\n###########################################\ndf = df[df.State != \"unavailable\"]\ndf = df[df.State != \"unknown\"]",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# Take the sensor name and pivot it into a\n# column with it's state underneath it\n###########################################\nsensor = None\nunrolled = []\nfor idx, row in df.iterrows():\n  try:\n    dt = dparser.parse(row[0])\n    value = row[1]\n    unrolled.append([sensor, dt, value])\n  except dparser.ParserError:\n    sensor = row[0]\n    continue\n    \nsorted_df = pandas.DataFrame(unrolled, columns=['sensor', 'datetime', 'value']).sort_values(by=['datetime']).reset_index()\npivoted_df = pandas.pivot_table(sorted_df, index='datetime', columns='sensor', values='value', aggfunc='last')",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# Sanity Check the truth column\n###########################################\nif TRUTH_COLUMN not in pivoted_df:\n    raise SystemExit(f\"TRUTH_COLUMN '{TRUTH_COLUMN}' does not exist in your CSV file. Here are the columns:{pivoted_df.columns}\")",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# fill in the data to make sure there\n# is one data point every minute\n###########################################\nprevious_values = [np.nan for x in pivoted_df.columns]\nnew_data = []\nfor dt, row in pivoted_df.iterrows():\n    cur_and_prev = list(zip(list(row), previous_values))\n    filled_row = [x[0] if not pandas.isnull(x[0]) else x[1] for x in cur_and_prev]\n    previous_values = list(filled_row)\n    \n    new_row = [dt]\n    new_row.extend(filled_row)\n    \n    new_data.append(new_row)\n    \ncolumns = ['datetime']\ncolumns.extend(pivoted_df.columns)\nfilled_state_df = pandas.DataFrame(new_data, columns=columns)\n\n# truncate data to the minute\nfilled_state_df['datetime'] = filled_state_df['datetime'].apply(lambda x: x.floor('min'))\nfilled_state_df = filled_state_df.groupby('datetime', group_keys=False).tail(1)\n\n# fill in missing records for every minute of the day\nprevious_row = []\nprevious_dt = None\nnew_data = []\nfor idx, row in filled_state_df.iterrows():\n    dt = row[0]\n    \n    if previous_dt is not None:\n        # create missing times between last row and this one\n        next_dt = previous_dt + pandas.Timedelta(seconds=60)\n        while dt > next_dt:\n            previous_row[0] = next_dt\n            new_data.append(list(previous_row))\n            next_dt = next_dt + pandas.Timedelta(seconds=60)\n    new_data.append(list(row))\n    \n    previous_row = list(row)\n    previous_dt = dt\n    \ndf = pandas.DataFrame(new_data, columns=filled_state_df.columns)",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# drop ignored columns\n###########################################\nif len(IGNORE_COLUMNS_WHEN_CHOOSING_FEATURES) > 0:\n    for ignore in IGNORE_COLUMNS_WHEN_CHOOSING_FEATURES:\n        if ignore in df.columns:\n            df = df.drop(ignore, axis=1)\n        else:\n            print(f\"Ignored column '{ignore}' did not exist in the data\")",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# create columns for each unique state\n###########################################\norig_columns = df.columns\nfor col in orig_columns:\n    \n    if not INCLUDE_TRUTH_COLUMN_AS_OBSERVATION:\n        if col == TRUTH_COLUMN:\n            continue\n\n    if col.startswith(\"sensor.\"):\n        # numeric\n        continue\n        \n    unique_states = df[col].unique()\n    if len(unique_states) > 0:\n        for state in unique_states:\n            new_col = f\"{col}/{state}\"\n            df[new_col] = df[col].copy().apply(lambda x: x == state)\n        if col != TRUTH_COLUMN:\n            df.drop(col, axis=1, inplace=True)\n        \ndf.replace('off', False, inplace=True)\ndf.replace('on', True, inplace=True)",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# create debug file\n###########################################\nif SAVE_DEBUG_OUTPUT:\n    name = \"debug.csv\"\n    if os.path.exists(name):\n      os.remove(name)\n    df.to_csv(name, sep=',')",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# train and find features\n###########################################\nX = df.drop([TRUTH_COLUMN], axis=1)    \ny = df[TRUTH_COLUMN]\n\n# Split the data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Feature selection using chi-squared\nchi2_selector = SelectKBest(chi2, k=TOP_N_FEATURES)\nX_train_chi2 = chi2_selector.fit_transform(X_train, y_train)\nX_test_chi2 = chi2_selector.transform(X_test)\n\n\n# Train a Naive Bayes classifier on the selected features\nclf = MultinomialNB()\nclf.fit(X_train_chi2, y_train)\n\n# Make predictions on the test set\ny_pred = clf.predict(X_test_chi2)\n\n# Calculate and print the accuracy of the classifier\naccuracy = accuracy_score(y_test, y_pred)\nprint(f\"# Accuracy with {TOP_N_FEATURES} selected features: {accuracy * 100:.2f}%\")\n\n# Get the indices of the selected features\nselected_feature_indices = chi2_selector.get_support(indices=True)\n\n# Get the names of the selected features\nselected_features = X.columns[selected_feature_indices]",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "###########################################\n# print useful information for you to \n# create your baysian settings\n###########################################\ntotal_hours = len(X)/60\ntarget_on_hours = len(df[df[TRUTH_COLUMN] == 1])/60\ntarget_off_hours = len(df[df[TRUTH_COLUMN] == 0])/60\nprior = target_on_hours/total_hours\n\nprint(f\"# {TRUTH_COLUMN} is on {target_on_hours/total_hours*100:.02f}% of the time\")\nprint(f\"- platform: bayesian\")\nprint(f\"  name: Probably {TRUTH_COLUMN} # CHANGE ME\")\nprint(f\"  prior: {prior:.02f}\")\nprint(f\"  probability_threshold: 0.8 # EXPERIMENT WITH ME\")\nprint(f\"  observations:\")\n\nfor feature in selected_features:\n    hours_given_true = (len(df[(df[TRUTH_COLUMN] == 1) & (df[feature] == 1)])/60)\n    hours_per_day_given_true = hours_given_true/total_hours*24\n    \n    hours_given_false = (len(df[(df[TRUTH_COLUMN] == 0) & (df[feature] == 1)])/60)\n    hours_per_day_given_false = hours_given_false/total_hours*24\n    \n    prob_given_true = hours_given_true/target_on_hours\n    prob_given_false = hours_given_false/target_off_hours\n    \n    # make sure it is never 0 or 100%\n    prob_given_true = 0.01 if prob_given_true < 0.01 else 0.99 if prob_given_true > 0.99 else prob_given_true\n    prob_given_false = 0.01 if prob_given_false < 0.01 else 0.99 if prob_given_false > 0.99 else prob_given_false\n    \n    entity_id, to_state = feature.split('/')\n    \n    if to_state is None:\n        print(f\"# skipping feature: {feature} could not be parsed\")\n        continue\n    print(f\"    # When '{entity_id}' is '{to_state}' ...\")\n    print(f\"    - platform: state\")\n    print(f\"      entity_id: {entity_id}\")\n    print(f\"      to_state: {to_state}\")\n    print(f\"      prob_given_true: {prob_given_true:.02f} # ({hours_per_day_given_true:.01f} hours per day)\")\n    print(f\"      prob_given_false: {prob_given_false:.02f} # ({hours_per_day_given_false:.01f} hours per day)\")",
      "metadata": {
        "trusted": true
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}

This looks great; if you look at my goals for the Bayesian component, this overlaps significantly with points 3, 4 and 5. As you have shown, as long as you know the time periods of 'ground truth' and the sensors you think are correlated, you can automate the production of the config.
Instead of building a plug-in, it would be great to do this within core Home Assistant's config flow. We could even take it a step further and run a search on the history/state machine to pick/suggest sensors that are correlated with the ground-truth time periods.


That sounds awesome. I know a lot about programming, but not the Home Assistant development ecosystem.

I will begin reading about config flow as you suggested.

Do you have a recommended list of Bayesian modules that are known to work well on Home Assistant hardware (I'm using scikit-learn and pandas)? After reading some code for custom components I see there is a place to list requirements; just curious whether there are some to avoid, to ensure maximum compatibility for others.

I am personally excited about having cousin “probably is” sensors for all of my physical sensors!

I've made a few notable changes to the Python code in the last day or so to further improve its output.

  • support for automatic numeric state experimentation: calculate the min, max, median, 25th percentile and 75th percentile, then generate an above/below hypothesis for each of those (see the sketch after this list)
  • support for automatic "state has been X for over N minutes" experiments (5, 15, 30, 60, 180 for now), generating template observations
  • make sure to quote values (bare on and off were doing bad things)
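
Roughly, the numeric threshold generation works like this (a simplified sketch, not the notebook's exact code; the column naming is illustrative):

import pandas as pd

def add_threshold_features(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Turn a numeric sensor column into boolean above-threshold feature columns."""
    s = pd.to_numeric(df[col], errors="coerce")
    # candidate split points: min, 25th percentile, median, 75th percentile, max
    for q in (0.0, 0.25, 0.5, 0.75, 1.0):
        threshold = s.quantile(q)
        df[f"{col}/above_{threshold:g}"] = s > threshold
    return df

# degenerate columns (e.g. "above max", which is never True) come out
# uninformative and can be filtered like any other constant feature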

I'm not enormously familiar with config flow either, and in particular, to have a nice UI we are going to want to do things like show device history graphs and have time selectors so users can input the 'ground truth' in a friendly way. I am not sure how to do this, or even if it is possible. It's probably worth being familiar with Options Flow as well. This part of the docs is also helpful. I've asked on the Discord for some idea of feasibility. Edit: I forgot I had asked this question before; apparently you can get state history in the config flow. Edit 2: and here is an example of accessing hass in a config flow. And here is an example of how you can use the hass object to get state history.

I don't, I'm afraid. The actual code only uses Bayes' rule here! I think for most cases pulling in a whole package/dependency is overkill. Perhaps it will become necessary when we want to work backwards to help the user decide which devices would be best to include in/add to the config to improve model performance, as you have just shown in your automated experimentation examples - I would be very grateful for your experience and advice on this. It would be amazing to open up the config/options flow and have Home Assistant tell you "Adding hallway motion off for 60 mins will take your 'everyone_is_asleep' sensor from 90% to 96% accuracy."

I would love to have a look at those, do you have a git(hub) link?

Edit: Just had a thought looking at your code. I suspect we will find we can't do everything we want to in the config flow (I hope we can), but I wonder if we could fork the history-explorer card into a Bayesian config generator card…

Edit: We should use the template sensor/binary sensor as a template for our config flow. And we should copy the idea of giving the option of either a binary sensor (with a threshold) or a percentage probability sensor: core/homeassistant/components/template/config_flow.py at dev · home-assistant/core · GitHub
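
To make that concrete, a minimal config flow skeleton might look like this (a sketch only: the Bayesian integration has no config flow today, and the fields shown are just the existing YAML options):

import voluptuous as vol

from homeassistant import config_entries

DOMAIN = "bayesian"

class BayesianConfigFlow(config_entries.ConfigFlow, domain=DOMAIN):
    """Sketch of a first step collecting the top-level Bayesian options."""

    async def async_step_user(self, user_input=None):
        if user_input is not None:
            # a real flow would continue to an observations step here
            return self.async_create_entry(title=user_input["name"], data=user_input)
        return self.async_show_form(
            step_id="user",
            data_schema=vol.Schema(
                {
                    vol.Required("name"): str,
                    vol.Required("prior", default=0.5): vol.Coerce(float),
                    vol.Required("probability_threshold", default=0.5): vol.Coerce(float),
                }
            ),
        )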

OK, I just ran your code and found it gives outputs like this:

    # When 'light.study_light' is 'off' ...
    - platform: state
      entity_id: light.study_light
      to_state: off
      prob_given_true: 0.46 # (2.5 hours per day)
      prob_given_false: 0.88 # (16.5 hours per day)
    # When 'light.study_light' is 'on' ...
    - platform: state
      entity_id: light.study_light
      to_state: on
      prob_given_true: 0.54 # (2.8 hours per day)
      prob_given_false: 0.12 # (2.2 hours per day)

This is a tautology; the single entry

    # When 'light.study_light' is 'off' ...
    - platform: state
      entity_id: light.study_light
      to_state: off
      prob_given_true: 0.46 # (2.5 hours per day)
      prob_given_false: 0.88 # (16.5 hours per day)

is perfectly adequate.


You don't need to generate mirror-image entries like this, where the prob_given_trues sum to 1 and the prob_given_falses sum to 1. The Bayesian integration has inferred them automatically since https://github.com/home-assistant/core/pull/67631, and you should get a repair warning for a config that does, since Add to issue registry if user has mirrored entries for breaking in #67631 by HarvsG · Pull Request #79208 · home-assistant/core · GitHub


I edited your code to ‘fix’ the problem:

###########################################
# create columns for each unique state
###########################################
orig_columns = df.columns
for col in orig_columns:
    
    #if col == TRUTH_COLUMN:
    #    if not INCLUDE_TRUTH_COLUMN_AS_OBSERVATION:
    #        continue

    if col.startswith("sensor."):
        # numeric
        continue
        
    unique_states = df[col].unique()
    if len(unique_states) < 2:
        print(f"{col} does not vary so it is uninformative so drop it")
        df.drop(col, axis=1, inplace=True)
    if len(unique_states) == 2: #This is binary so we don't need duplicate work
        print(f"{col} is binary")
        new_col = f"{col}/{unique_states[0]}"
        df[new_col] = df[col].copy().apply(lambda x: x == unique_states[0])
        if col == TRUTH_COLUMN:
            TRUTH_COLUMN = new_col
        df.drop(col, axis=1, inplace=True)
    if len(unique_states) > 2:
        print(f"{col} has > 2 states")
        for state in unique_states:
            new_col = f"{col}/{state}"
            df[new_col] = df[col].copy().apply(lambda x: x == state)
        if col != TRUTH_COLUMN:
            df.drop(col, axis=1, inplace=True)

I like that you're working on this. I think the Bayesian sensor is vastly under-appreciated. After the useless "year of voice" that we've had in 2023, I would love to see 2024 become a "year of automation" that focuses on things like making the Bayesian integration much easier to use, e.g. by making it UI-configurable, deriving probabilities automatically from historical data, and improving the documentation.
