RFC: Easier handling of unknown/unavailable entities in alerting (Alert2)

Hi All, I’m the Alert2 author and had some ideas for making it more convenient to handle unknown/unavailable entities in alerting. Let me know if you have thoughts on the following:

The problem

Currently, Alert2 condition templates have to handle unknown/unavailable states themselves, so templates that reference entity states have to specify default values:

condition: {{ states('sensor.a')|float(3.14) }}

and maybe include logic testing availability:

condition: {{ states('sensor.a') not in ['unknown', 'unavailable' ]
              and ... }}

And maybe you have a big alert to detect which entities are unavailable (which leads to lots of false positives, at least for me):

condition: {{ states | selectattr('state','in', ['unavailable', 'unknown']) |
              map(attribute='entity_id') | list | length | bool }}

Or sometimes I get lazy and use default values to intentionally fire the alert when the underlying sensor is unavailable so I notice something needs attention:

condition: {{ states('sensor.temperature_f')|float(1000) > 90 }}

A few observations

  1. When a sensor is unknown/unavailable, it seems like often you don’t want to change the alert firing state. E.g., if an alert fires when a temperature sensor is too hot, and the sensor then becomes unavailable, you no longer know if the temp is still too hot or not. The alert itself continues to be available. It just has no new information to justify a potential state change.

  2. Often, but not always, if a sensor that feeds an alert becomes unavailable, it is a problem, separate from whatever the alert may be detecting. An interesting exception is a recent discussion, in which a user describes a pool water level sensor that is expected to usually be unavailable except periodically when it exposes the water level measurement. There I imagine you’d want a separate alert to detect if the sensor stays unavailable too long.

  3. It’d be nice to get rid of the unknown/unavailable handling that gets repeated across alert configs.
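
For the pool sensor case in observation 2, a separate alert along these lines might do the job with today's Alert2. This is only a sketch: the entity name sensor.pool_water_level, the alert domain/name and the six-hour threshold are made up for illustration, and it assumes delay_on_secs works as I understand it from the Alert2 docs (the condition has to stay true that long before the alert fires).

```yaml
alert2:
  alerts:
    - domain: pool
      name: water_level_sensor_stale
      # True while the (hypothetical) sensor is reporting no value.
      condition: "{{ states('sensor.pool_water_level') in ['unknown', 'unavailable'] }}"
      # Assumption: only fire once the condition has held for 6 hours (21600 s),
      # so the sensor's normal unavailable periods don't trigger it.
      delay_on_secs: 21600
```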

Proposal 1

I’m considering adding a new alert config field, condition_filter (name TBD). condition_filter specifies when to ignore any changes to or errors in the condition field. condition_filter can be:

  • A template. The alert ignores condition changes/errors while the template is true.
  • A list of entity names. Equivalent to a template testing whether each entity state is unavailable/unknown.
  • A magic variable “dynamic_detect”, which resolves to the list of whatever entities are referred to in the condition field.

condition_filter can have a default value that applies to all alerts. So you could imagine setting the default to “dynamic_detect” and overriding if necessary for specific alerts.

The benefit is you’d no longer need to put default values / unavailable-handling-logic in your alert config.
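
To make the three forms concrete, here is a rough sketch of what the config might look like. None of this exists yet: condition_filter and dynamic_detect are just the proposed names from above, and the exact syntax, the placement under defaults, and all entity and alert names are assumptions for illustration.

```yaml
alert2:
  defaults:
    # Proposed: by default, filter on whatever entities the condition refers to.
    condition_filter: dynamic_detect
  alerts:
    # Form 1: a template. Condition changes/errors are ignored while it is true.
    - domain: thermostat
      name: too_hot
      condition: "{{ states('sensor.temperature_f')|float > 90 }}"
      condition_filter: "{{ states('sensor.temperature_f') in ['unknown', 'unavailable'] }}"
    # Form 2: a list of entity names.
    - domain: freezer
      name: too_warm
      condition: "{{ states('sensor.freezer_temp_f')|float > 10 }}"
      condition_filter:
        - sensor.freezer_temp_f
    # Form 3: no condition_filter given, so the default (dynamic_detect) applies.
    - domain: battery
      name: low
      condition: "{{ states('sensor.phone_battery')|float < 20 }}"
```

Note that the conditions carry no default values. Today float without a default raises a template error when the sensor is unavailable; under the proposal that error would be ignored while the filter is active.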

Proposal 2

To complement condition_filter, I’m also considering adding the ability to automatically create alerts to detect when any entity relied on in an alert becomes unavailable/unknown. Maybe it collects together all entities referenced in condition_filter fields and creates alerts based on them. It might use the Generator feature with some special variables defined. So you could say something like:

alert2:
  alerts:
    - domain: alert2
      generator_name: __unavailable_detector__
      generator: __all_entities_in_condition_filters__
      name: "unavailable_{{ genElem }}"
      notifier: ...
      priority: ...

One question is whether these auto-generated alerts should inherit the priority, notifier, or other settings from the alert that referenced the unavailable entity. And if so, what should happen when an entity becomes unavailable and is referenced by two alerts that specify different priorities or notifiers?

Lastly, I’ll add that I think condition alerts themselves should always be “available” and in either the on or off state. I think trying to propagate an “unavailable” status to the alert itself is problematic.

I think there’s room to improve things, whether it’s these proposals or something else. cc: @cerebrate, @tman98
Thoughts?

Josh

Hi, thanks for helping me on the GitHub discussion.

For Proposal 2, here is the code I am currently using. It would be nice if you included this feature in Alert2 (then I could simplify my code further).

  - binary_sensor:
      - name: "Unavailable Entities"
        state: >-
            {% if 'entities' in this.attributes %}
                {{ this.attributes.entities | count > 0 }}
            {% else %}
                False
            {% endif %}

        attributes:
          entities: >-
            {% set skip_entity_list = expand("group.skip_availability_detection_for_entities") | map(attribute='entity_id') | list %}
            
            {% set skip_device_id_list = namespace(result=[]) %}
            {% set entity_id_for_devices = expand("group.skip_availability_detection_for_devices") | map(attribute='entity_id') | list %}
            {% for entity_id in entity_id_for_devices %}
              {% set skip_device_id_list.result = skip_device_id_list.result + [device_id(entity_id)] %}
            {% endfor %}
            
            {% set ns = namespace(result=[]) %}
            
            {% set x = states.sensor | selectattr('state','in',['unavailable','none']) | list %}
            {% for s in x %}
              {% if (s.entity_id not in skip_entity_list) and (device_id(s.entity_id) not in skip_device_id_list.result) %}
                {% set ns.result = ns.result + [ {'name':s.name, 'entity_id':s.entity_id, 'state':s.state, 'html_name':s.name + '</br>' + s.entity_id} ] %}
              {% endif %}
            {% endfor %}
            
            {% set x = states.binary_sensor | selectattr('state','in',['unavailable','none']) | list %}
            {% for s in x %}
              {% if (s.entity_id not in skip_entity_list) and (device_id(s.entity_id) not in skip_device_id_list.result) %}
                {% set ns.result = ns.result + [ {'name':s.name, 'entity_id':s.entity_id, 'state':s.state, 'html_name':s.name + '</br>' + s.entity_id} ] %}
              {% endif %}
            {% endfor %}

            {% set x = states.switch | selectattr('state','in',['unavailable','none']) | list %}
            {% for s in x %}
              {% if (s.entity_id not in skip_entity_list) and (device_id(s.entity_id) not in skip_device_id_list.result) %}
                {% set ns.result = ns.result + [ {'name':s.name, 'entity_id':s.entity_id, 'state':s.state, 'html_name':s.name + '</br>' + s.entity_id} ] %}
              {% endif %}
            {% endfor %}

            {% set x = states.light | selectattr('state','in',['unavailable','none']) | list %}
            {% for s in x %}
              {% if (s.entity_id not in skip_entity_list) and (device_id(s.entity_id) not in skip_device_id_list.result) %}
                {% set ns.result = ns.result + [ {'name':s.name, 'entity_id':s.entity_id, 'state':s.state, 'html_name':s.name + '</br>' + s.entity_id} ] %}
              {% endif %}
            {% endfor %}
            
            {{ ns.result }}