Unavailable / Unknown Entity Monitoring - Template Sensor

The example on your GitHub page rejects entities containing several kinds of strings. It does so by using a separate rejectattr('entity_id', 'search', 'string goes here') for each string to be rejected.

search accepts a regex pattern, which provides a lot of flexibility in defining the matching string(s). For example, this matches several strings:

rejectattr('entity_id', 'search', '(limes|oranges|lemons)')

If you wish, you could consolidate multiple instances of rejectattr into one by simply composing an appropriate regex pattern.
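For instance, building on the fruit example above (the entity name fragments are purely hypothetical), instead of chaining one rejectattr per string:

{{ states|selectattr('state','eq','unavailable')
  |rejectattr('entity_id','search','limes')
  |rejectattr('entity_id','search','oranges')
  |rejectattr('entity_id','search','lemons')
  |list|count }}

you can collapse the three filters into a single alternation:

{{ states|selectattr('state','eq','unavailable')
  |rejectattr('entity_id','search','(limes|oranges|lemons)')
  |list|count }}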


Awesome! That could really simplify things. I’ll add an example or two to the readme. Thanks for the tip!


Hi all and @jazzyisj, I use your sensor definitions and they have worked great. Up to now, that is:

For another reason I created two new entities:

  • sensor.statistics_entities_groups
  • sensor.statistics_entities_groups_percentage

When I reload groups (not the template sensors the above two entities are based on) and therefore also my/your problems_any and problems_any_ignored entities, the following two things happen:

  1. Your template sensor sensor.problems_any detects my two sensors listed above as “problematic” (unknown, unavailable or whatever) - just for a minute after each one is recreated
  2. The following error is stored in HA log:
Logger: homeassistant.components.template.template_entity
Source: components/template/template_entity.py:140
Integration: Template (documentation, issues)
First occurred: 23:29:12 (8 occurrences)
Last logged: 23:46:36

TemplateError('TypeError: argument of type 'NoneType' is not iterable') while processing template 'Template("{{ states|selectattr('state','in',['unavailable','unknown','none'])|rejectattr('domain','eq','group') |rejectattr('entity_id','in',state_attr('group.problems_any_ignored','entity_id'))|list|count }}")' for attribute '_attr_native_value' in entity 'sensor.problems_any'
TemplateError('TypeError: argument of type 'NoneType' is not iterable') while processing template 'Template("{{ states|selectattr('state','in',['unavailable','unknown','none'])|rejectattr('domain','eq','group') |rejectattr('entity_id','in',state_attr('group.problems_any_ignored','entity_id'))|map(attribute='entity_id')|list }}")' for attribute 'Entitäten' in entity 'sensor.problems_any'

I have no idea why and how to fix that.
:question: Is it because your template sensors try to do something with my entities because they are named “sensor.*group*” (not “group.something”!)?

Maybe I could simply put those two sensors on the ignore list, but it feels like there is some error in the template sensor definition, so I wanted to inform you. Maybe you have an idea.

Full disclosure: this is my template sensor definition which I never changed (it simply worked):

  - platform: template
    sensors:
      problems_any:
        friendly_name: Problematische Entitäten
        unit_of_measurement: Entitäten
        icon_template: "{{ 'mdi:check-circle' if is_state('sensor.problems_any','0') else 'mdi:alert' }}"
        value_template: >
          {{ states|selectattr('state','in',['unavailable','unknown','none'])|rejectattr('domain','eq','group')
            |rejectattr('entity_id','in',state_attr('group.problems_any_ignored','entity_id'))|list|count }}
        attribute_templates:
          Entitäten: >
            {{ states|selectattr('state','in',['unavailable','unknown','none'])|rejectattr('domain','eq','group')
                |rejectattr('entity_id','in',state_attr('group.problems_any_ignored','entity_id'))|map(attribute='entity_id')|list }}

I actually already have a check in place to prevent those errors; I’ve just been testing it out on my config.

Here’s the most recent version, I’ll be updating the repository soon.
Let me know if it resolves your issue. You’ll have to change the ignored entity group name as well (or change the template).

Unavailable Entities Template
template:
  - sensor:
      - name: 'Unavailable Entities'
        unique_id: unavailable_entities
        icon: "{{ 'mdi:alert-circle' if states('sensor.unavailable_entities')|int(0) > 0 else 'mdi:check-circle' }}"
        unit_of_measurement: entities
        state: >
          {% if state_attr('sensor.unavailable_entities','entities') != none %}
            {{ state_attr('sensor.unavailable_entities','entities')|count }}
          {% endif %}
        attributes:
          entities: >
            {% if state_attr('group.ignored_unavailable_entities','entity_id') != none %}
              {% set ignore_seconds = 60 %}
              {% set ignore_ts = (now().timestamp() - ignore_seconds)|as_datetime %}
              {{ states
                |rejectattr('domain','eq','group')
                |rejectattr('entity_id','in',state_attr('group.ignored_unavailable_entities','entity_id'))
                |rejectattr('last_changed','ge',ignore_ts)
                |selectattr('state','in',['unavailable','unknown','none'])|map(attribute='entity_id')|list }}
            {% endif %}
  1. I can confirm this works (even though I a) kept my old template sensor definition style and b) don’t know what change made it work), meaning: using your new definition does not trigger HA log entries when reloading groups, so that behavior is gone.

  2. Great to see you’ve made such huge progress (I discovered your GitHub repository for this)! :+1::+1::+1:

  3. Some time ago I created an “all good!” automation which notifies me when there are no unavailable entities anymore:
    Unfortunately, based on the current definition, this automation is also triggered (and a notification sent) every time I reload template entities (and therefore the unavailable_entities sensor), as the sensor’s state is 0 (no unavailable entities).
    :question: Do you have an idea for an improved version of that automation (trigger and/or condition definition) which avoids this?
    For example by checking

  • if the sensor has been reloaded within the last 60 seconds or
  • the sensor’s state is the same as it was 60 seconds before, or
  • something like that.

The automation currently looks like this (sensor.problems_any = sensor.unavailable_entities):

alias: Notify_System_Problematische Entitäten (Entwarnung)
trigger:
  - platform: state
    entity_id: sensor.problems_any
    for: '00:00:05'
    to: '0'
condition:
  - condition: template
    value_template: >-
      {{ (states.automation.notify_system_home_assistant_start.attributes.last_triggered == None)
         or (as_timestamp(now())
             - as_timestamp(state_attr('automation.notify_system_home_assistant_start', 'last_triggered')) | float > 120) }}
action:
  - service: notify.all_devices
    data:
      title: '✅ Entwarnung: Problematische Entitäten'
      message: Keine problematische(n) Entität(en) mehr erkannt. Alles i. O. soweit.
      data:
        subtitle: ''
        push:
          thread-id: system-notification-group
mode: single

Currently it only handles the “keep silent / don’t fire if HA has been started recently (within the last 2 minutes)” situation.

don’t know what change made it work

When groups are reloaded, their entity_id attribute is momentarily none (null), which meant that this filter in the sensor that iterates the group

|rejectattr('entity_id','in',state_attr('group.ignored_unavailable_entities','entity_id'))

failed because it doesn’t understand null, causing the error you saw in the logs. The error did no harm because as soon as the attribute had a value again everything was hunky dory. But it was easily resolved by preventing the sensor from trying to render, using this statement:

{% if state_attr('group.ignored_unavailable_entities','entity_id') != none %}
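For the curious: the root cause is simply the in operator hitting a null container. This minimal (hypothetical) template reproduces the exact TypeError from the logs:

{{ 'sensor.kitchen' in none }}  {# TypeError: argument of type 'NoneType' is not iterable #}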

As for your automation - your issue made me realize I could make the sample automation just a little more robust so I’ve updated it.

Here is the sample automation redone with your requirements. This will stop it from firing on a template reload. It also delays the notification for 5 seconds and checks again to make sure there are still no unavailable entities before sending.

I would also look at the Uptime Sensor. If you have that enabled, your conditions for startup could be a little more intuitive (see automation).

Obviously you’ll have to change the notifications to suit your needs.

Sample Automation
- id: unavailable_entities_notification
  alias: 'Unavailable Entities Notification'
  description: 'Create persistent notification if there are unavailable entities, dismiss if none.'
  mode: restart
  trigger:
    - platform: state
      entity_id: sensor.unavailable_entities
      to: ~
  condition:
    - condition: template
      value_template: >
        {{ is_number(trigger.from_state.state)
            and is_number(trigger.to_state.state) }}
  action:
    - choose:
        - conditions:
            - condition: numeric_state
              entity_id: sensor.unavailable_entities
              below: 1

            - condition: template
              value_template: >
                {{ states('sensor.uptime')|as_datetime < now() - timedelta(minutes=2)
                    if states('sensor.uptime') != 'unknown' else false }}
          sequence:
            - delay: 5

            - condition: numeric_state
              entity_id: sensor.unavailable_entities
              below: 1

            - service: persistent_notification.create
              data:
                title: 'Unavailable Entities'
                message: 'TEST NONE!'
                notification_id: unavailable_entities
      default:
        - service: persistent_notification.create
          data:
            title: 'Unavailable Entities'
            message: >
              - {{ expand(state_attr('sensor.unavailable_entities','entities'))|map(attribute='entity_id')|join('\n- ') }}
            notification_id: unavailable_entities

Thanks for the explanations. For the notification part:
I wasn’t talking about the actual “Unavailable Entities Notification” but about my “all good!” automation.

Therefore: Can you explain which part(s) of your sample automation “fix”/work around this initial problem (see below), so I can transfer it to my “No unavailable entities anymore” automation?

this automation is also triggered (and a notification sent) every time I reload template entities (and therefore the unavailable_entities sensor)

It’s this part, right?

  condition:
    - condition: template
      value_template: >
        {{ is_number(trigger.from_state.state)
            and is_number(trigger.to_state.state) }}

Update: tested, and yes, this is the section which “does the trick”: my “all clear!” automation still gets triggered but no longer runs the action section (equals silence). Great!

I can and will strongly recommend your whole GitHub repo to every HA user I personally know. When starting from scratch it should be quite easy to just stick to the defaults and simply use it “as delivered”.
I personally will try to stay with my custom version (sensor name, new problems automation, all clear automation etc.).


The package has been updated to accommodate the new button entity introduced in 2021.12, which is stateless and always has a state of unknown. The sensor will now ignore all button entities unless they have a state of unavailable.
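For reference, a minimal sketch of one way such a filter can be expressed in the entities template - an illustration using the group name from earlier in this thread, not necessarily the exact code in the repository:

{% set found = states
  |rejectattr('domain','eq','group')
  |rejectattr('entity_id','in',state_attr('group.ignored_unavailable_entities','entity_id'))
  |selectattr('state','in',['unavailable','unknown','none'])|list %}
{# keep button entities only when they are truly unavailable - unknown is their normal state #}
{{ (found|rejectattr('domain','eq','button')|list
    + found|selectattr('domain','eq','button')|selectattr('state','eq','unavailable')|list)
  |map(attribute='entity_id')|list }}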


Hi @jazzyisj. I saw your GitHub / the whole project has made some huge steps and I want to thank you for this. In my opinion this should be some kind of “default feature” of HA, in a way that it can be easily activated in the UI with some default settings.

Today I’d like to share an idea I’ve been thinking about for a few months.
Reason/initial issue: some entities of devices switch to “unavailable” only after a very, very, veeeeeeery long time (very likely a default set by that integration/firmware - in this case a ZigBee coordinator) of 24 hours. Quite unacceptable, right? It takes 24 hours until the unavailable entity monitoring notifies me about a problem which started a whole day ago. I tend to call it the “flat line issue”.

That’s clearly the fault of the device integration, but we could improve the monitoring of it like this:

How about a second sensor - either as an option of the existing one or, better, separate - which works exactly the same as the existing sensor, only with a slightly different set of defined “problem states”:

  • instead of checking for states like unavailable or unknown…
  • we check whether entities changed/updated their values (a real sensor value change, meaning a higher/lower value than before - monitoring the last update time alone might not be sufficient because it is touched by an HA restart etc.) within a certain period of time.

Simple example: if my sensors XYZ did not update once within the last 3 hours, turn on the “name to be defined” sensor (I’d propose something like “entities to be checked” or sth. like that).

Needed:

  1. a variable for the acceptable time slot, like 3 hours
  2. a detection algorithm: “did (not) update within [value] time” (a sketch combining 1 and 2 follows this list)
  3. maybe a mapping combination like “only alert if all entities belonging to one device did not provide any value update” (in the end, this way we could monitor whether a certain device might have a problem and we wouldn’t need to ‘monitor’ many entities; that could transform this into an “Unavailable / Unknown Device Monitoring - Template Sensor”… which doesn’t exist at all currently, because devices can’t be directly monitored, right?!)
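A minimal sketch of items 1 and 2, with a hypothetical watch list and sensor name (and note the caveat from above applies: last_changed is reset by an HA restart, so this only approximates real value changes):

template:
  - sensor:
      - name: 'Stale Entities'
        unique_id: stale_entities
        unit_of_measurement: entities
        state: >
          {# item 1: the acceptable time slot #}
          {% set stale_after = timedelta(hours=3) %}
          {# hypothetical watch list - replace with your own entities #}
          {% set watched = ['sensor.zigbee_temperature_xyz',
                            'sensor.zigbee_humidity_xyz'] %}
          {# item 2: count watched entities whose state last changed too long ago #}
          {{ expand(watched)
            |selectattr('last_changed','lt', now() - stale_after)
            |list|count }}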

That’s basically the same as what this project is doing today - but based on a kind of “dynamic” problem detection, so it could be considered an additional or advanced approach. In the end it’s all about monitoring our system’s components - entities and devices.

/Small pitch over - opinions, thoughts?


I have created a group of my shelly switches.
Then I created an automation as follows:

- id: '1645905893866'
  alias: Shelly unavailable notification
  description: ''
  trigger:
  - platform: template
    value_template: '{{ states(''group.all_shelly_switches'') == ''unavailable'' }} '
  condition: []
  action:
  - service: notify.spitimobile_app_sm_g980f
    data:
      title: ATTENTION
      message: shelly are unavailable..again
  mode: single

I was wondering if it is possible to test the template for 30 seconds?
For example, if it is unavailable for more than 30 seconds, then return true and continue with the action.

Is it possible?

You can put a delay of 30 seconds followed by a condition under the actions. The caveat is that this won’t ensure the condition was true throughout, but it should be good enough.
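A minimal sketch of that approach, reusing the group and notify service from your automation above:

  action:
  - delay: '00:00:30'
  # only notify if the group is still unavailable after the delay
  - condition: template
    value_template: "{{ states('group.all_shelly_switches') == 'unavailable' }}"
  - service: notify.spitimobile_app_sm_g980f
    data:
      title: ATTENTION
      message: shelly are unavailable..again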

Template Trigger supports the for option.

  trigger:
  - platform: template
    value_template: '{{ states(''group.all_shelly_switches'') == ''unavailable'' }} '
    for:
      seconds: 30

That’s the better suggestion. I somehow thought there would be an action, then a delay, followed by another action.

This is a great idea, I would also need something like this!


Hi @jazzyisj ,

thanks so much for your work! It is really a wonderful feature and the implementation went super easily with your tutorials. It should be a standard feature in Home Assistant.


Next to one positive user response, unfortunately not much feedback. This is now also listed at Improvement: expand or split project to an "Unavailable / Unknown *DEVICE* Monitoring - Template Sensor" · Issue #11 · jazzyisj/unavailable-entities-sensor · GitHub.

I replied with an example on my git. Sorry it took me a while to get to it, life gets in the way :slight_smile:

@jazzyisj Hello Jason. Could you or someone here help me figure out how to get this working? Essentially, I copied everything from the GitHub link at the top of this post and I don’t trust that it is working properly. The screenshot below shows what has populated on my “overview” dashboard. I don’t really know why that populated there. (I guess because it has to, since that dashboard is automatic and shows every sensor that I have within Home Assistant.) Is this supposed to happen? Also, after restarting my HA instance, I didn’t get a persistent notification; that either means everything is fine in my setup or it isn’t working properly. Can you give me some guidance?

One last thing I can add: from the GitHub you posted, when I copy-pasted to the YAML file within the packages directory, I deleted “- binary_sensor.updater # always unknown after restart” (which was under the area below) because I had read somewhere else that that sensor is deprecated now.

So I guess the two questions I have are: first, why did the group populate on my overview dashboard -- is this normal (screenshot below)? I also tried to hide it on the entities page but it is read only -- I don’t know if this would’ve hidden it from the automated overview dashboard but I was going to try it at least. And secondly, is this thing working? I figured for sure something would show up as unavailable but maybe my setup is cleaner than I thought. (I am wanting to use this for leak sensors and door sensors essentially.) Thank you @jazzyisj or anyone that is willing to chime in and help me understand how this stuff functions!

group:
  ignored_unavailable_entities:
    entities:
      - sensor.unavailable_entities # prevent template loop warnings?
      - binary_sensor.updater # always unknown after restart


It shows up on your dashboard because that new group is created as part of the package. Groups are shown in the auto-generated dashboard. What you see in your UI there is the ignored entities group, not the sensor.

Did you set it up as a package or manually in your config? Did you try setting it up a couple of times? I see in your devtools that the sensor was named sensor.unavailable_entities_2. Click on that entity and change the entity_id to sensor.unavailable_entities (drop the “_2”).

If it is set up correctly it should be working now, but just for giggles, restart. What happens?

Thank you for responding Jason and thank you for helping in general.

Ahh, I follow you on the group situation. I didn’t know that groups essentially always show up on the overview dashboard. So there’s no way on the automated dashboard to have that not show?

I set it up as a package. I did have it set up previously and it would post a notification, but it was only showing a single “.” in it for some reason, so I ended up deleting everything and tried it all again. That’s when I decided to post here because I wasn’t trusting myself/it. haha. Thank you for catching the naming of the entity within my entities list. I am restarting now and will report back shortly. Thank you Jason!!