List devices being offline for too long (based on newly introduced last_reported state attribute)

Do you get any error message or simply an empty table?

If an error message, can you share that?

If it is just an empty table: did you label at least one device with the custom label ‘Availability check: daily’? The label name needs to match exactly. Or maybe all your labelled devices already reported back within the last 24 hours? To verify this, you can lower the cutoff from 24 hours to maybe just one hour by adjusting the hours and minutes in this line: {%- set timestampCutoff = now() - timedelta( hours = 24, minutes = 0 ) %}

Maybe this is obvious to others, but you can also paste the template into the ‘Template’ tab of the ‘Developer tools’ for faster turn-around/testing.

Agreed. Devices shouldn’t go offline. But it looks like, at least for some (including me), reality is different every now and then. And I admit that this may be related to my ‘zoo’ of different devices: Zigbee, Z-Wave, Wi-Fi, cloud integrations, HomeKit, local gateways, …

Just a few examples from the last couple of years or so in my setup:

  1. My Sonoff temp sensor at times decides to require a re-pairing (for Zigbee2MQTT there are multiple reports that certain devices / manufacturers have similar problems)
  2. My Daikin cloud integration had an API change, and with this the HA integration required a re-authentication
  3. The access point in my basement, used just to provide connectivity to my custom reader device for my natural gas meter, had a fault, and therefore the reader device (connected via Wi-Fi) went offline
  4. My wife unknowingly unplugged a local gateway used to control our windows
  5. A Hue / Zigbee bulb was faulty and as such didn’t respond anymore
  6. The water leak sensor in the washing room ran out of battery

None of this happens on a weekly (or even monthly) basis. And for frequently used devices a problem is obvious (e.g. the living room light doesn’t work anymore). But for other convenience or security devices, problems might be less obvious, or even critical, as with a water leak sensor that might only be needed every other year.

To sum it up: even in a perfectly functioning setup I would still like the reassurance that everything is up and alive. And if not, I would like to be notified, similar to an ‘oil low’ light in my car.


Hello, I’m testing your card, but it’s not working.
I have 74 unavailable entities, which belong to 2 devices that are turned off, but they are not listed on the card.

Any idea?

I have to admit that I don’t really know what ‘unavailable device’ means for HA.

Maybe as a start for debugging, use the template below (for simplicity it might be best to paste it into the Template tab of the Developer tools). It runs through every state and lists the most recent last_reported timestamp for every device. Disclaimer: I really don’t know how well this works for installations with many devices.

Check if you can find your unavailable devices in the output. If this doesn’t produce any result (and also doesn’t crash), there might be differences between our HA versions. I’m currently on 2024.4.2.

{# Collect the most recent last_reported per device #}
{# namespace hack to allow growing a dictionary inside the loop #}
{%- set ns = namespace(d1={} ) %}
 
{# {{ relative_time(now()) }} #}
  
{# get all states #}
{%- for state in states | sort(attribute='entity_id') %}
  {# get all devices for label #}
    {# check if current state belongs to labelled device #}
    {%- set d = device_id(state.entity_id) %}
      {# relevant / labelled device found, determine its proper name #}
      {%- set device_name = device_attr(d, "name_by_user") %}
      {%- if device_name is none %}
        {%- set device_name = device_attr(d, "name") %}
      {%- endif %}
 
      {%- if device_name not in ns.d1 %} 
        {# first time device encountered #}
        {%- set d2 = { device_name : state.last_reported } %}
        {%- set ns.d1 = dict(ns.d1, **d2) %}
      {%- elif ns.d1[device_name] < state.last_reported %}
        {# more recent last_reported found #}
        {%- set d2 = { device_name : state.last_reported } %}
        {%- set ns.d1 = dict(ns.d1, **d2) %}
      {%- endif %}
{%- endfor %}
 
{# now we have a dictionary for every labelled device with last_reported timestamp #}
{{ ns.d1 }}

Got this error in Developer tools:

TemplateError: Must provide a device or entity ID

Core: 2024.4.3
Supervisor: 2024.04.0
Operating System: 12.2
Frontend: 20240404.2

Maybe HA allows entities that are not linked to any device.

I have added an additional ‘is not none’ check, and this template now puts out an ugly list of the most recent last_reported timestamp per device (at least for me).

{# Collect the most recent last_reported per device #}
{# namespace hack to allow growing a dictionary inside the loop #}
{%- set ns = namespace(d1={} ) %}
 
{# {{ relative_time(now()) }} #}
  
{# get all states #}
{%- for state in states | sort(attribute='entity_id') %}
  {# get all devices for label #}
    {# check if current state belongs to labelled device #}
    {%- set d = device_id(state.entity_id) %}
    {%- if d is not none %}
      {# relevant / labelled device found, determine its proper name #}
      {%- set device_name = device_attr(d, "name_by_user") %}
      {%- if device_name is none %}
        {%- set device_name = device_attr(d, "name") %}
      {%- endif %}
 
      {%- if device_name not in ns.d1 %} 
        {# first time device encountered #}
        {%- set d2 = { device_name : state.last_reported } %}
        {%- set ns.d1 = dict(ns.d1, **d2) %}
      {%- elif ns.d1[device_name] < state.last_reported %}
        {# more recent last_reported found #}
        {%- set d2 = { device_name : state.last_reported } %}
        {%- set ns.d1 = dict(ns.d1, **d2) %}
      {%- endif %}
    {%- endif %}
{%- endfor %}
 
{# now we have a dictionary for every labelled device with last_reported timestamp #}
{{ ns.d1 }}
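For anyone who finds the Jinja loop hard to follow, here is roughly the same aggregation as a plain Python sketch. It is not Home Assistant code: the tuples stand in for state objects, and the entity IDs are made up for illustration. It only shows the two ideas of the template, skipping entities without a device and keeping the newest timestamp per device.

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for HA state objects: (entity_id, device_name, last_reported).
# device_name is None for an entity that is not linked to any device.
states = [
    ("sensor.energia_power", "Energia", datetime(2024, 4, 25, 11, 30, 20, tzinfo=timezone.utc)),
    ("sensor.energia_voltage", "Energia", datetime(2024, 4, 25, 11, 29, 50, tzinfo=timezone.utc)),
    ("input_boolean.guest_mode", None, datetime(2024, 4, 25, 12, 0, 0, tzinfo=timezone.utc)),
    ("switch.one_mmw", "1mmw", datetime(2024, 4, 25, 11, 30, 33, tzinfo=timezone.utc)),
]


def latest_report_per_device(states):
    """Return {device_name: most recent last_reported}, skipping device-less entities."""
    latest = {}
    for _entity_id, device_name, last_reported in states:
        if device_name is None:
            continue  # mirrors the 'd is not none' guard in the template
        if device_name not in latest or latest[device_name] < last_reported:
            latest[device_name] = last_reported
    return latest


latest = latest_report_per_device(states)
for name, ts in sorted(latest.items()):
    print(name, ts.isoformat())
```

Without the `None` guard, the device-less entity is exactly the case that made the first version of the template fail.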

This generated the list with several devices.

And now?
What is the code to generate the card?

Do you see any of your previously mentioned unavailable devices in that list? If so, which timestamp do they list?

And just for clarification: earlier you mentioned ~70 unavailable entities, but this template checks the ‘overarching’ devices. So you would need to identify the actual devices to which these unavailable entities belong and search for those device names in the list.

And then of course these devices also need to be labelled with the exact label name.

These are the devices:

'1mmw': datetime.datetime(2024, 4, 25, 11, 30, 33, 978550, tzinfo=datetime.timezone.utc), 

and

'Energia': datetime.datetime(2024, 4, 25, 11, 30, 20, 549987, tzinfo=datetime.timezone.utc), 

Both seem to have pushed some data today before noon (2024, 4, 25, 11, 30, 20, i.e. 11:30:20 UTC). According to this, they have not been offline for more than 24 hours; hence they wouldn’t show up in the dashboard card.

Is it possible that these devices actively report on some entities while others of their entities aren’t available anymore?

For testing purposes you could also try to reduce the cutoff of my original template from 24 hours down to maybe just 30 mins. In this case the devices you have listed (again assuming they have also been properly labelled) should then appear in the markdown card.
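To make the arithmetic concrete, here is a small Python sketch of the cutoff comparison using the two timestamps listed above (microseconds dropped). The evaluation time of 18:00 UTC is an assumption for illustration, and `overdue` is a made-up helper, not part of HA.

```python
from datetime import datetime, timedelta, timezone

# Timestamps taken from the device list above (microseconds dropped).
last_reported = {
    "1mmw": datetime(2024, 4, 25, 11, 30, 33, tzinfo=timezone.utc),
    "Energia": datetime(2024, 4, 25, 11, 30, 20, tzinfo=timezone.utc),
}

# Assumption: the template is evaluated at 18:00 UTC on the same day.
now = datetime(2024, 4, 25, 18, 0, 0, tzinfo=timezone.utc)


def overdue(last_reported, now, **cutoff):
    """Names of devices whose last report is older than now - timedelta(**cutoff)."""
    timestamp_cutoff = now - timedelta(**cutoff)
    return sorted(name for name, ts in last_reported.items() if ts < timestamp_cutoff)


print(overdue(last_reported, now, hours=24))    # → []
print(overdue(last_reported, now, minutes=30))  # → ['1mmw', 'Energia']
```

With the 24 hour cutoff neither device counts as overdue; with 30 minutes both do.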

I lost this information.
I restarted HA in the morning, so I changed it to 6h.

This is my code now:

{# Collect the most recent last_reported per device #}
{# namespace hack to allow growing a dictionary inside the loop #}
{%- set ns = namespace(d1={} ) %}
 
{# {{ relative_time(now()) }} #}
  
{# get all states #}
{%- for state in states | sort(attribute='entity_id') %}
  {# get all devices for label #}
    {# check if current state belongs to labelled device #}
    {%- set d = device_id(state.entity_id) %}
    {%- if d is not none %}
      {# relevant / labelled device found, determine its proper name #}
      {%- set device_name = device_attr(d, "name_by_user") %}
      {%- if device_name is none %}
        {%- set device_name = device_attr(d, "name") %}
      {%- endif %}
 
      {%- if device_name not in ns.d1 %} 
        {# first time device encountered #}
        {%- set d2 = { device_name : state.last_reported } %}
        {%- set ns.d1 = dict(ns.d1, **d2) %}
      {%- elif ns.d1[device_name] < state.last_reported %}
        {# more recent last_reported found #}
        {%- set d2 = { device_name : state.last_reported } %}
        {%- set ns.d1 = dict(ns.d1, **d2) %}
      {%- endif %}
    {%- endif %}
{%- endfor %}
 
{# now we have a dictionary for every labelled device with last_reported timestamp #}
{# {{ ns.d1 }} #}

{# define cut off timestamp to be considered 'too old' #}
{%- set timestampCutoff = now() - timedelta( hours = 2, minutes = 00 ) %}

<table border="1">
<tr><th>Device</th><th>Last known contact</th></tr>
{# loop over dictionary and only list entities that reported before timestampCutoff #}
{%- for key, value in ns.d1.items() %}
  {%- if as_timestamp(value|e) < as_timestamp(timestampCutoff) %}
<tr><td>{{ key|e }}</td><td>{{ ((now() - as_datetime(value|e))) }} ago</td></tr>
  {%- endif %}
{%- endfor %}
</table>

How do I show just the table?
I edited the code and commented out the line {{ ns.d1 }}

And I think these shouldn’t appear on the list:

Edit 2:

After removing some test devices:
It would be interesting to have an exclusion list

Cool. You are making some progress.

My initial template contained the line {%- for d in label_devices('Availability check: daily') %}

This checks whether a device has that specific label; only then are the entities of that device analyzed. So you can see this as an inclusion list (kind of the opposite of your exclusion list, but it should lead to the same result).
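A rough Python sketch of the two approaches, just to compare them. The device names and the dictionary are hypothetical; in HA the label lookup is done by label_devices, which is not modelled here.

```python
# Hypothetical device registry: device name -> set of labels.
devices = {
    "Water leak sensor": {"Availability check: daily"},
    "Gas meter reader": {"Availability check: daily"},
    "Christmas lights": set(),
    "Test plug": set(),
}

LABEL = "Availability check: daily"


def included_by_label(devices, label):
    """Inclusion list: only devices carrying the label are checked."""
    return sorted(name for name, labels in devices.items() if label in labels)


def all_except(devices, excluded):
    """Exclusion list: every device is checked unless explicitly named."""
    return sorted(name for name in devices if name not in excluded)


by_label = included_by_label(devices, LABEL)
by_exclusion = all_except(devices, {"Christmas lights", "Test plug"})
print(by_label)
print(by_exclusion)
```

Both calls select the same two devices here. The practical difference shows up when a new device is added: the inclusion list ignores it until it is labelled, while the exclusion list checks it straight away.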

Did you label all 6 devices from your list with that specific custom label (‘Availability check: daily’) in HA?

Some thoughts on this:

  • I agree: devices DO go offline silently. In reality it happens. :+1:
  • Why monitor only a couple of devices and not all? Are there devices in your smart home where you do not care whether they are online or not? :wink:
  • Why look at devices just to find out that they are fine? If a device is okay, checking it is a waste of time. I guess you are interested in the offline devices, not the ones that are online.
  • The only entities that give you real knowledge about the device status are sensors and binary sensors, since these are the entities that are updated by the device itself. Think about it. :wink:

This is my solution (just for info):
https://community.home-assistant.io/t/detecting-unresponsive-devices/658030/16

Good points. And I’m glad I’m not the only one with devices sneaking out of my system :wink:

I hadn’t seen your solution. It looks more advanced and touches areas unknown to me, so there is more for me to learn. Thx for the link! It looks like you are using last_changed and last_updated. My very first interpretation of their meaning was that they might not always be updated (e.g. when an entity reports the same value as the previous one), which could lead to false positives. Maybe that is a false conclusion?
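For reference, my understanding of the documented semantics is: last_changed moves only when the state value changes, last_updated when the state or an attribute changes, and last_reported on every write, even an identical one. A toy Python model of that understanding (the ToyState class is made up for illustration and is not Home Assistant code):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ToyState:
    """Toy model of one entity's timestamps; not Home Assistant's actual class."""
    state: str
    attributes: dict
    last_changed: datetime
    last_updated: datetime
    last_reported: datetime

    def write(self, state, attributes, when):
        """Record a new report and move only the timestamps that apply."""
        if state != self.state:
            self.last_changed = when
        if state != self.state or attributes != self.attributes:
            self.last_updated = when
        self.last_reported = when  # assumed to move on every write
        self.state, self.attributes = state, attributes


t0 = datetime(2024, 4, 25, 8, 0, tzinfo=timezone.utc)
temp = ToyState("21.5", {"unit": "°C"}, t0, t0, t0)

# The sensor reports the very same value again one hour later.
temp.write("21.5", {"unit": "°C"}, t0 + timedelta(hours=1))
print(temp.last_changed == t0, temp.last_updated == t0, temp.last_reported == t0)
# → True True False
```

Under that assumption, a sensor that keeps reporting an unchanged value would look stale via last_changed and last_updated but fresh via last_reported.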

Valid point about monitoring all devices. I wanted to categorize my devices, as I don’t expect all of them to be active at the same intervals. E.g. my basement motion sensor might be lonely for a few days in a row, whereas my main door motion sensor shouldn’t be. Another case is that some things are season-dependent, like my heating thermostats. During summer they are off / without batteries, and with the tagging I thought I could quickly remove them from the list of devices-to-be-checked. I also play around a bit and test things out, just to realize that I won’t need them; often these are Internet-based things (recently weather and solar/PV forecast integrations). So I don’t want such things on my ‘offline’ list.

Not sure about the comment re: ‘why looking to/checking devices just to detect that they are fine’. Indeed, if a device is fine, I don’t want to know. That’s why only devices that have been offline for too long (in my current case 24 hours) are listed.

And for the sensors/binary sensors, I guess I don’t know enough to really comment. Do things like physical buttons and lights also fall into that category? Because I also care about them, even though they only respond to a physical interaction. E.g. it is rare that my outdoor lights, triggered at night via motion sensors, don’t go on at least once per night. So if one bulb doesn’t, the likelihood of a problem is high and I want to know.

Have you considered building in a bit of redundancy? Two sensors with different protocols, one Zigbee and one Wi-Fi, or something?

The power monitoring smart plugs that are in my cupboard right now? The Christmas lights?

Good point… :wink:
I also have devices in my smart home which are not in use all the time. But I tend to deactivate those devices. But that is everyone’s own decision.

Yes, I test against ‘last_changed’ and ‘last_updated’.
And you’re right, it can lead to false positives. In practice, I don’t get any.
My script checks every sensor or binary_sensor of a device, and most devices have several sensors. If at least one sensor is updated, I assume the device is fine. With my devices, at least one of a device’s sensors always changes, and that’s good enough for me.

The thing is: entities other than sensor or binary_sensor can be touched/changed by HA itself.
For instance a wall plug: it has a switch entity. This switch entity can be changed either by HA (by an automation, for example) or directly on the physical device by pressing a button. The entity will be updated in either case, but can you be sure that this update really comes from the device?
But if this wall plug has a sensor for power measurement and this sensor entity gets updated, then you know for sure that the update was done by the device. Which means it’s alive.

I believe sensors and binary_sensors (of real physical devices) never get updated by HA itself, only on behalf of the corresponding device. That’s why I only check sensors and binary_sensors.
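The idea can be sketched in a few lines of Python. This is not my actual script, only an illustration of the rule; the entity IDs, the timestamps and the device_is_alive helper are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

now = datetime(2024, 4, 25, 18, 0, tzinfo=timezone.utc)

# Hypothetical wall plug: entity_id -> time of the last update.
wall_plug = {
    "switch.wall_plug": now - timedelta(minutes=5),  # may have been toggled by an automation
    "sensor.wall_plug_power": now - timedelta(hours=30),
    "sensor.wall_plug_energy": now - timedelta(hours=31),
}

# Only these domains are trusted as evidence that the device itself reported.
TRUSTED_DOMAINS = ("sensor", "binary_sensor")


def device_is_alive(entities, now, max_age):
    """Alive if at least one sensor/binary_sensor entity was updated within max_age."""
    cutoff = now - max_age
    return any(
        updated >= cutoff
        for entity_id, updated in entities.items()
        if entity_id.split(".", 1)[0] in TRUSTED_DOMAINS
    )


print(device_is_alive(wall_plug, now, timedelta(hours=24)))  # → False
```

The switch entity was updated five minutes ago, but it is ignored because that update may have come from HA. Both sensors are older than 24 hours, so the plug is treated as offline.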

Ha. I didn’t even know that devices can be disabled. Makes a lot of sense. Thanks for pointing that out. But also: What a boringly simple solution … :sweat_smile: