Unavailable / Unknown Entity Monitoring - Template Sensor

I have taken the work by jazzyisj and messed about with it a bit.

I wanted to be able to include ‘child’ entities without having to hard code them and I took an approach which works but I suspect now relies on a bug in the group services.

The way I do it is to include in the ‘Ignored Entities’ group a dummy ‘pattern’ entity such as:

pattern.heating

So that all entities containing the string ‘heating’ get added to the Child Group by an automation like this:

      - service: group.set
        data:
          object_id: unavailable_entities_children
          entities: >

            {% set ns = namespace(child_entities = '') %}
            
            {% set ignored = state_attr('group.unavailable_entities_ignored', 'entity_id') | list %}
            {%- for item in ignored %}
              {% if item.split('.')[0] == 'pattern' %}
                {%- set child_pattern = item.split('.')[1] %}
                {%- set ns.child_entities = ns.child_entities ~ states | selectattr('entity_id', 'search', child_pattern) | map(attribute='entity_id') | join(', ') %}
                {%- if not loop.last %}
                  {%- set ns.child_entities = ns.child_entities ~ ', ' %}
                {%- endif %}
              {% endif %}
            {% endfor %}
            
            {{ ns.child_entities }}

So far so good, but then I wanted to be able to dynamically add entities to the ‘Ignored Entities’ group based on the state of another entity.

For example, if I had a master control input_boolean for my irrigation and that boolean is turned to ‘off’ then all entities with the string ‘irrigation’ should be added to the ‘Ignored Entities’ group. Again, so far so good, I do that this way:

            - service: group.set
              data:
                object_id: unavailable_entities_ignored
                add_entities: >
                  {% if is_state('input_boolean.irrigation_master_control_switch', 'off') and 
                        'esphome_irrigation_controller' not in state_attr('group.unavailable_entities_ignored', 'entity_id') %}
                      {{ states | selectattr('entity_id', 'search', 'esphome_irrigation_controller') | map(attribute='entity_id') | join(', ') }}
                  {% endif %}

If you’re still with me, here comes the interesting bit.

I then wanted to remove those entities if the boolean is turned back on. The following should work (and kind of does):

  {%- set entities = state_attr('group.unavailable_entities_ignored', 'entity_id') | list %}
  {%- for entity in entities if 'esphome_irrigation_controller' not in entity %}
    {{ entity }}{%- if not loop.last%},{% endif %}
  {%- endfor %}

It returns exactly the list I am looking for.
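
For comparison, the same filtering can be sketched without the loop by using the `reject` filter with the `search` test and passing the result straight to `group.set`. This is only a sketch that reuses the group name and search string from this post. It does not address the invalid entity_id problem described next, and an empty result may be rejected by the service.

```yaml
# Sketch only: rebuild the ignored group without the irrigation entities.
- service: group.set
  data:
    object_id: unavailable_entities_ignored
    entities: >
      {# "or []" covers the moment when the group attribute is missing #}
      {{ (state_attr('group.unavailable_entities_ignored', 'entity_id') or [])
          | reject('search', 'esphome_irrigation_controller')
          | join(', ') }}
```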

However, whilst the dummy pattern.some_string entity_ids are added to the group, whenever HA looks at that group it seems to throw an error for invalid entity_ids, even though the dummy patterns which are hard coded do not cause a problem.

I’m not sure I expect anyone to actually have read this far, as I realise this is all very esoteric and probably not at all easy to follow, but I thought it might interest some of the usual suspects (e.g. petro, 123, Mariusthvdb and of course jazzyisj :wink: ) when it comes to (more advanced?) templating.

And yeah… in all likelihood what I am doing could be done better… I am all ears if that is the case.

I think I understand what you’ve done (although maybe not why) and, for the moment, my only question is:

Does this template really need to start with all entities in your system?

{{ states | selectattr('entity_id', 'search', 'esphome_irrigation_controller') ...
   ^^^^^^
 All entities

Do you have fans, lights, locks, input_selects, automations, etc whose entity_id contains the search string?

If you don’t, consider narrowing it to just the domains that do. For example (constrained to three domains):

{{ expand(states.sensor, states.binary_sensor, states.switch) | selectattr('entity_id', 'search', 'esphome_irrigation_controller') ...

It won’t make a huge difference in performance but why do more work than necessary …

No, it doesn’t; thanks for the tip.

If you’ll indulge me…

Just to recap, there are two groups of entities to ignore if they are unavailable:

group.ignored_entities
and
group.unavailable_entities_children

Entities are hard coded into group.ignored_entities either explicitly as actual entities which exist or I can add ‘dummy’ sensors which provide a string for pattern matching e.g.

pattern.heating

This saves me having to add all the heating entities manually into the code, and it works fine so long as they are hard coded and therefore added to the group when HA starts.

Now, if I try to change the group dynamically using a service and add a dummy pattern.some_string entity, everything is ok at first, i.e. it gets added to the group correctly. However, if HA references the group at some point, it complains because the dummy entity is not actually an entity.

I was just hoping to be able to add and remove ‘pattern’ entities to the group dynamically based on the state of another entity.

All this can be reproduced using the Dev tools Services page if one were so inclined.

It’s not a huge deal, more of an interesting anomaly which gives me a small annoyance and which I suspect is actually a bug somewhere. A bug that I am making use of.

I also suspect that what I am trying to achieve could probably be done a lot more easily :wink:

If I understood you correctly, group.ignored_entities contains hard-coded entities plus ‘placeholder’ entities, like pattern.heating, that are used to dynamically populate a separate group called group.unavailable_entities_children. Effectively, the ‘placeholder’ supplies a string that’s used to generate a group containing entities whose object_id contains the string.

Instead of a ‘placeholder’, in a group containing hard-coded entities, why not create a script that accepts a variable that it uses to create a group?

For example, I can pass it:

heating, esphome_irrigation_controller

and it would create a group whose entities contain one of the two supplied sub-strings. If I want to remove entities containing esphome_irrigation_controller from the group, I simply pass it heating. If I want to get fancy, I can pass two variables, one containing the sub-strings and the other indicating if the sub-strings are for adding or removing matching entities.
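
A minimal sketch of that suggestion, under some assumptions: the script name `set_unavailable_children` and the field name `patterns` are hypothetical, the target group is the `unavailable_entities_children` group from earlier in the thread, and the sub-strings are assumed to contain no regex metacharacters. It covers only the "replace the whole group" variant, not the add/remove flag.

```yaml
script:
  set_unavailable_children:        # hypothetical script name
    alias: 'Set unavailable children group'
    fields:
      patterns:                    # hypothetical field name
        description: 'Comma-separated sub-strings to match in entity_id'
        example: 'heating, esphome_irrigation_controller'
    sequence:
      # Turn "a, b" into the regex "a|b", dropping empty items.
      - variables:
          regex: "{{ (patterns | default('')).split(',') | map('trim') | select | join('|') }}"
      # Stop here if nothing was passed; an empty regex would match every entity.
      - condition: template
        value_template: "{{ regex | length > 0 }}"
      - service: group.set
        data:
          object_id: unavailable_entities_children
          entities: >
            {{ states | selectattr('entity_id', 'search', regex)
                | map(attribute='entity_id') | join(', ') }}
```

Calling it with both sub-strings, and later with only `heating`, would look like this:

```yaml
- service: script.set_unavailable_children
  data:
    patterns: 'heating, esphome_irrigation_controller'
```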

I’ve updated the template sensor with some documentation on how to use a search filter to exclude groups of entities based on a partial string. I’m curious whether using the search filter would solve your use case here with a little less jinja magic.

The example on your GitHub page rejects entities containing several kinds of strings. It does so by using a separate rejectattr('entity_id', 'search', 'string goes here') for each string to be rejected.

search accepts a regex pattern thereby providing a lot of flexibility in defining the matching string(s). For example, this matches several strings:

rejectattr('entity_id', 'search', '(limes|oranges|lemons)')

If you wish, you could consolidate multiple instances of rejectattr into one by simply composing an appropriate regex pattern.
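
As a rough illustration of where the consolidated pattern would sit, here is a sketch of a template sensor with a single `rejectattr`. The sensor name is hypothetical and the fruit strings are just the placeholders from the example above, not anything from the actual repository.

```yaml
template:
  - sensor:
      - name: 'Unavailable Entities Example'   # hypothetical name
        unit_of_measurement: entities
        state: >
          {# One regex replaces three separate rejectattr filters #}
          {{ states
            | rejectattr('entity_id', 'search', '(limes|oranges|lemons)')
            | selectattr('state', 'in', ['unavailable', 'unknown'])
            | list | count }}
```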

Awesome! That could really simplify things. I’ll add an example or two to the readme. Thanks for the tip!

Hi all and jazzyisj, I use your sensor definitions and they have worked great up to now.

For another reason I created two new entities:

  • sensor.statistics_entities_groups
  • sensor.statistics_entities_groups_percentage

When I reload groups (not template sensors, which the above two are based on), and therefore the group used by my/your problems_any and problems_any_ignored sensors, this leads to the following two things:

  1. Your template sensor sensor.problems_any detects my two above listed sensors as “problematic” (unknown, unavailable or whatever) - just for a minute after the sensor is recreated
  2. The following error is stored in HA log:
Logger: homeassistant.components.template.template_entity
Source: components/template/template_entity.py:140
Integration: Template (documentation, issues)
First occurred: 23:29:12 (8 occurrences)
Last logged: 23:46:36

TemplateError('TypeError: argument of type 'NoneType' is not iterable') while processing template 'Template("{{ states|selectattr('state','in',['unavailable','unknown','none'])|rejectattr('domain','eq','group') |rejectattr('entity_id','in',state_attr('group.problems_any_ignored','entity_id'))|list|count }}")' for attribute '_attr_native_value' in entity 'sensor.problems_any'
TemplateError('TypeError: argument of type 'NoneType' is not iterable') while processing template 'Template("{{ states|selectattr('state','in',['unavailable','unknown','none'])|rejectattr('domain','eq','group') |rejectattr('entity_id','in',state_attr('group.problems_any_ignored','entity_id'))|map(attribute='entity_id')|list }}")' for attribute 'Entitäten' in entity 'sensor.problems_any'

I have no idea why and how to fix that.
:question: Is it because your template sensors want to do something because my entities are named “sensor.*group*” (not “group.something”!)?

Maybe I could simply put those two sensors on the ignore list, but it feels like this is some error in the template sensor definition so I wanted to inform you. Maybe you have an idea.

Full disclosure: this is my template sensor definition which I never changed (it simply worked):

  - platform: template
    sensors:
      problems_any:
        friendly_name: Problematische Entitäten
        unit_of_measurement: Entitäten
        icon_template: "{{ 'mdi:check-circle' if is_state('sensor.problems_any','0') else 'mdi:alert' }}"
        value_template: >
          {{ states|selectattr('state','in',['unavailable','unknown','none'])|rejectattr('domain','eq','group')
            |rejectattr('entity_id','in',state_attr('group.problems_any_ignored','entity_id'))|list|count }}
        attribute_templates:
          Entitäten: >
            {{ states|selectattr('state','in',['unavailable','unknown','none'])|rejectattr('domain','eq','group')
                |rejectattr('entity_id','in',state_attr('group.problems_any_ignored','entity_id'))|map(attribute='entity_id')|list }}

I’ve actually already added a check to prevent those errors; I’ve just been testing it out on my config.

Here’s the most recent version, I’ll be updating the repository soon.
Let me know if it resolves your issue. You’ll have to change the ignored entity group name also (or change the template).

Unavailable Entities Template
template:
  - sensor:
      - name: 'Unavailable Entities'
        unique_id: unavailable_entities
        icon: "{{ 'mdi:alert-circle' if states('sensor.unavailable_entities')|int(0) > 0 else 'mdi:check-circle' }}"
        unit_of_measurement: entities
        state: >
          {% if state_attr('sensor.unavailable_entities','entities') != none %}
            {{ state_attr('sensor.unavailable_entities','entities')|count }}
          {% endif %}
        attributes:
          entities: >
            {% if state_attr('group.ignored_unavailable_entities','entity_id') != none %}
              {% set ignore_seconds = 60 %}
              {% set ignore_ts = (now().timestamp() - ignore_seconds)|as_datetime %}
              {{ states
                |rejectattr('domain','eq','group')
                |rejectattr('entity_id','in',state_attr('group.ignored_unavailable_entities','entity_id'))
                |rejectattr('last_changed','ge',ignore_ts)
                |selectattr('state','in',['unavailable','unknown','none'])|map(attribute='entity_id')|list }}
            {% endif %}
  1. I can confirm this worked (even though I a) kept my old template sensor definition style and b) don’t know what change made it work), meaning: using your new definition did not trigger HA log entries when reloading groups, so that behavior is gone.

  2. Great to see you made such huge progress (discovered your GitHub repository for this)! :+1::+1::+1:

  3. Some time ago I created an “all good!” automation which notifies me when there are no unavailable entities anymore:
    Unfortunately, based on the current definition, this automation is also triggered (and notification sent) every time template entities are reloaded (and therefore the unavailable_entities sensor), as the sensor’s state is 0 (no unavailable entities).
    :question: Do you have an idea for an improved version of that automation (trigger and/or condition definition) which avoids this?
    For example by checking

  • if the sensor has been reloaded within the last 60 seconds or
  • if the sensor’s state is the same as 60 seconds before or
  • something like that.

The automation currently looks like this (sensor.problems_any = sensor.unavailable_entities):

alias: Notify_System_Problematische Entitäten (Entwarnung)
trigger:
  - platform: state
    entity_id: sensor.problems_any
    for: '00:00:05'
    to: '0'
condition:
  - condition: template
    value_template: >-
      {{
      (states.automation.notify_system_home_assistant_start.attributes.last_triggered==None)
      or (as_timestamp(now()) -
      as_timestamp(state_attr('automation.notify_system_home_assistant_start',
      'last_triggered')) | float > 120 )}}
action:
  - service: notify.all_devices
    data:
      title: '✅ Entwarnung: Problematische Entitäten'
      message: Keine problematische(n) Entität(en) mehr erkannt. Alles i. O. soweit.
      data:
        subtitle: ''
        push:
          thread-id: system-notification-group
mode: single

Currently it only/already supports the “keep silent/don’t fire if HA has been started recently (within last 2 minutes)” situation.

don’t know what change made it work

When groups are reloaded, their entity_id attribute is momentarily none (null), which meant that the filter in the sensor that iterates over the group

|rejectattr('entity_id','in',state_attr('group.ignored_unavailable_entities','entity_id'))

failed because it doesn’t understand null, and that caused the error you saw in the logs. The error did no harm, because as soon as the attribute had a value again everything was hunky dory. But the error was easily resolved by preventing the sensor from trying to render, using this statement:

{% if state_attr('group.ignored_unavailable_entities','entity_id') != none %}
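
An alternative way to avoid the same TypeError, shown only as a sketch and not as what the package does, is to fall back to an empty list while the attribute is missing. The trade-off is that during the reload nothing is ignored, so ignored entities could briefly show up in the list, which the guard above avoids.

```yaml
# Sketch only: treat a momentarily missing entity_id attribute as an empty list.
entities: >
  {% set ignored = state_attr('group.ignored_unavailable_entities', 'entity_id') or [] %}
  {{ states
    | rejectattr('domain', 'eq', 'group')
    | rejectattr('entity_id', 'in', ignored)
    | selectattr('state', 'in', ['unavailable', 'unknown', 'none'])
    | map(attribute='entity_id') | list }}
```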

As for your automation - your issue made me realize I could make the sample automation just a little more robust so I’ve updated it.

Here is the sample automation redone with your requirements. This will stop it from firing on a template reload. It also delays the notification for 5 seconds and checks again to make sure there are still no unavailable entities before sending.

I also would look at the Uptime Sensor. If you have that enabled your conditions for startup could be a little more intuitive (see automation).

Obviously you’ll have to change the notifications to suit your needs.

Sample Automation
- id: unavailable_entities_notification
  alias: 'Unavailable Entities Notification'
  description: 'Create persistent notification if there are unavailable entities, dismiss if none.'
  mode: restart
  trigger:
    - platform: state
      entity_id: sensor.unavailable_entities
      to: ~
  condition:
    - condition: template
      value_template: >
        {{ is_number(trigger.from_state.state)
            and is_number(trigger.to_state.state) }}
  action:
    - choose:
        conditions:
          - condition: numeric_state
            entity_id: sensor.unavailable_entities
            below: 1

          - condition: template
            value_template: >
              {{ states('sensor.uptime')|as_datetime < now() - timedelta(minutes=2)
                  if states('sensor.uptime') != 'unknown' else false }}
        sequence:
          - delay: 5

          - condition: numeric_state
            entity_id: sensor.unavailable_entities
            below: 1

          - service: persistent_notification.create
            data:
              title: 'Unavailable Entities'
              message: 'TEST NONE!'
              notification_id: unavailable_entities
      default:
        - service: persistent_notification.create
          data:
            title: 'Unavailable Entities'
            message: >
              - {{ expand(state_attr('sensor.unavailable_entities','entities'))|map(attribute='entity_id')|join('\n- ') }}
            notification_id: unavailable_entities

Thanks for the explanations. For the notification part:
I did not talk about the actual “Unavailable Entities Notification” but

Therefore: Can you explain which part(s) of your sample automation “fix”/work around this initial problem (see below), so I can transfer it to my “No unavailable entities anymore” automation?

this automation is also triggered (and notification sent) every time template entities are reloaded (and therefore the unavailable_entities sensor)

It’s this part, right?

  condition:
    - condition: template
      value_template: >
        {{ is_number(trigger.from_state.state)
            and is_number(trigger.to_state.state) }}

Update: Tested, yes this is the section which “does the trick” so my “all clear!” automation gets triggered but is not running the action section (equals silence). Great!
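
For anyone wanting to do the same, here is a sketch of how that condition might be combined with the “all clear” trigger shown earlier. The entity name is the one from that automation. The extra `is not none` check is an added assumption to cover a sensor that has no previous state at all.

```yaml
# Sketch only: "all clear" trigger plus the numeric-state check.
trigger:
  - platform: state
    entity_id: sensor.problems_any
    to: '0'
    for: '00:00:05'
condition:
  # Only continue when both the old and the new state are numbers,
  # i.e. skip transitions from a non-numeric state after a reload.
  - condition: template
    value_template: >
      {{ trigger.from_state is not none
          and is_number(trigger.from_state.state)
          and is_number(trigger.to_state.state) }}
```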

I can and will strongly recommend your whole GitHub repo to every HA user I personally know. When starting from scratch it should be quite easy to just stick to the default and simply use it “as delivered”.
Personally, I’ll try to stay with my custom version (sensor name, new problems automation, all clear automation etc.).

The package has been updated to accommodate the new button entity introduced in 2021.12, which is stateless and always has a state of unknown. The sensor will now ignore all button entities unless they have a state of unavailable.
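
One way such a rule could be expressed in a template is sketched below. This is an illustration of the idea only, not the package’s actual implementation: button entities are collected separately and only kept when they are `unavailable`.

```yaml
# Sketch only: count button entities only when they are 'unavailable'.
entities: >
  {% set buttons = states.button
      | selectattr('state', 'eq', 'unavailable')
      | map(attribute='entity_id') | list %}
  {% set others = states
      | rejectattr('domain', 'in', ['group', 'button'])
      | selectattr('state', 'in', ['unavailable', 'unknown', 'none'])
      | map(attribute='entity_id') | list %}
  {{ others + buttons }}
```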

Hi @jazzyisj. I saw that your GitHub repo and the whole project have made some huge steps, and I want to thank you for this. In my opinion this should be some kind of “default feature” of HA, so that it can be easily activated in the UI with some default settings.

Today I want to share an idea I’ve been thinking about for a few months.
Reason/initial issue: some entities of devices switch to “unavailable” only after a very, very long time of 24 hours (very likely a default set by that integration/firmware, in this case a ZigBee coordinator). Quite unacceptable, right? It takes 24 hours until the unavailable entity monitoring notifies me about a problem which started a whole day ago. I tend to call it the “flat line issue”:

That’s clearly the fault of the device integration, but we could improve the monitoring of this like this:

How about a second, separate sensor (or, alternatively, an option on the existing one) which works exactly like the existing one, except that the set of defined “problem states” is a bit different:

  • instead of checking for states like unavailable or unknown…
  • we check whether entities changed/updated their values within a certain period of time (a real sensor value change, meaning a higher or lower value than before; monitoring the last update time might not be sufficient because it is touched by an HA restart etc.).

Simple example: if my sensors XYZ did not update once within the last 3 hours, turn on the “name to be defined sensor” (I’d propose something like “entities to be checked” or sth. like that).

Needed:

  1. variable for acceptable time slot like 3 hours
  2. detection algorithm “did (not) update within [value] time”
  3. maybe a mapping combination like “only alert if all entities belonging to one device did not provide any value update” (in the end, this way we could monitor whether a certain device might have a problem and we wouldn’t need to ‘monitor’ many entities). That could transform this into an “Unavailable / Unknown Device Monitoring - Template Sensor”, which doesn’t exist at all currently, because devices can’t be directly monitored, right?!

That’s basically the same as what this project is doing today, but it is based on a kind of “dynamic” problem detection and could be considered an additional or advanced approach. In the end it’s all about monitoring our system’s components: entities and devices.
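
A rough sketch of what such a second sensor might look like, purely as an illustration of the proposal. The sensor name and the group `group.stale_watchlist` are hypothetical, and the 3-hour window is the value from the example above. It relies on `last_changed`, which only moves when the state value changes but is also reset by an HA restart, so it inherits the limitation mentioned in the proposal.

```yaml
template:
  - sensor:
      - name: 'Stale Entities'            # hypothetical name
        unique_id: stale_entities         # hypothetical id
        unit_of_measurement: entities
        state: "{{ (state_attr('sensor.stale_entities', 'entities') or []) | count }}"
        attributes:
          entities: >
            {# Acceptable time without a value change (item 1 of the list) #}
            {% set max_age = timedelta(hours=3) %}
            {# Entities in the watch group whose value has not changed recently #}
            {{ expand('group.stale_watchlist')
              | selectattr('last_changed', 'lt', now() - max_age)
              | map(attribute='entity_id') | list }}
```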

/Small pitch over - opinions, thoughts?

I have created a group of my shelly switches.
Then I created an automation as follows:

- id: '1645905893866'
  alias: Shelly unavailable notification
  description: ''
  trigger:
  - platform: template
    value_template: '{{ states(''group.all_shelly_switches'') == ''unavailable'' }} '
  condition: []
  action:
  - service: notify.spitimobile_app_sm_g980f
    data:
      title: ATTENTION
      message: shelly are unavailable..again
  mode: single

I was wondering if it is possible to require the template to be true for 30 seconds?
For example, if the group is unavailable for more than 30 seconds, then return true and continue with the action.

Is that possible?

You can put a delay of 30 sec followed by a condition under the actions. The caveat is that this won’t ensure the condition was true throughout, but it should be good enough.
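
As a sketch of that approach, using the group and notify service from the automation above, the action section might look like this. As noted, it only checks the state again after the delay and cannot tell whether the group recovered in between.

```yaml
# Sketch only: wait 30 seconds, then re-check before notifying.
action:
  - delay: 30
  # Stops the action sequence here if the group is no longer unavailable.
  - condition: state
    entity_id: group.all_shelly_switches
    state: 'unavailable'
  - service: notify.spitimobile_app_sm_g980f
    data:
      title: ATTENTION
      message: shelly are unavailable..again
```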

Template Trigger supports the for option.

  trigger:
  - platform: template
    value_template: '{{ states(''group.all_shelly_switches'') == ''unavailable'' }} '
    for:
      seconds: 30

That’s the better suggestion. I somehow thought there would be an action, then a delay, followed by another action.

This is a great idea, I would also need something like this!

Hi @jazzyisj ,

thanks so much for your work! It is really a wonderful feature, and implementation was super easy with your tutorials. It should be a standard feature in Home Assistant.
