Integrations don’t always provide any kind of last_seen, which is not the same thing as last_updated.
HA does not manage the firmware on the devices, so it cannot force a device to send an update.
Many devices do not send an update if the value has not changed. An update means the device has to power up its communication hardware, and that is expensive on battery-powered devices, especially if the device uses WiFi.
Other than that, it is up to the integration owner to implement a last_seen attribute where possible.
I’m not talking about devices that go unseen for long stretches because of batteries, but if value updates were at least traced we could set sensible thresholds.
And even devices on batteries usually report once a day.
What I’m asking for, based on what I’ve seen:
- If an entity of a device is updated but the value is unchanged, there may be no trace of the update at all, depending on the integration. This should be corrected.
- When any entity of a device is updated, there is no value that propagates to the device itself.
For example, a temperature can vary while humidity does not. But as long as one entity linked to the device is live, we can assume the device is alive.
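For what it’s worth, something close to a device-level last_seen can be approximated today with a template sensor that takes the most recent last_updated across all entities of a device - with the big caveat that it only moves when HA actually records an update, which is exactly the limitation described above. A rough sketch (the entity name is just a placeholder):

template:
  - sensor:
      - name: "Weather station last seen"
        unique_id: weather_station_last_seen
        device_class: timestamp
        # device_id(), device_entities() and expand() are built-in template functions;
        # sensor.weather_station_temperature is a placeholder entity belonging to the device.
        state: >
          {{ (expand(device_entities(device_id('sensor.weather_station_temperature')))
              | map(attribute='last_updated')
              | max).isoformat() }}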
What device are you having issues with?
HA already has options for adding attributes to sensors, so it is up to the integrations to populate them.
Zigbee does (granted, the timeout can be a bit long for sleepy devices). So does ESPHome.
For YoLink and Shelly sensors/relays, I use these template sensors in configuration.yaml to verify if anything is unavailable or unknown -
template:
  #
  # This binary sensor not only specifies if any entities in the integration are unavailable, but also,
  # if there are any unavailable, it will contain an attribute that is a list of the entities that are unavailable.
  #
  - binary_sensor:
      - name: "Any Yolink Unavailable"
        state: "{{ integration_entities('yolink') | select('is_state', 'unavailable') | list | count > 0 }}"
        attributes:
          entity_id: "{{ integration_entities('yolink') | select('is_state', 'unavailable') | list }}"
        unique_id: any_yolink_unavailable
  #
  # This binary sensor not only specifies if any entities in the integration are unknown, but also,
  # if there are any unknown, it will contain an attribute that is a list of the entities that are unknown.
  #
  - binary_sensor:
      - name: "Any Yolink Unknown"
        state: "{{ integration_entities('yolink') | select('is_state', 'unknown') | list | count > 0 }}"
        attributes:
          entity_id: "{{ integration_entities('yolink') | select('is_state', 'unknown') | list }}"
        unique_id: any_yolink_unknown
  #
  # This binary sensor not only specifies if any entities in the integration are unavailable, but also,
  # if there are any unavailable, it will contain an attribute that is a list of the entities that are unavailable.
  #
  - binary_sensor:
      - name: "Any Shelly Unavailable"
        state: "{{ integration_entities('shelly') | select('is_state', 'unavailable') | list | count > 0 }}"
        attributes:
          entity_id: "{{ integration_entities('shelly') | select('is_state', 'unavailable') | list }}"
        unique_id: any_shelly_unavailable
  #
  # This binary sensor not only specifies if any entities in the integration are unknown, but also,
  # if there are any unknown, it will contain an attribute that is a list of the entities that are unknown.
  #
  - binary_sensor:
      - name: "Any Shelly Unknown"
        state: "{{ integration_entities('shelly') | select('is_state', 'unknown') | list | count > 0 }}"
        attributes:
          entity_id: "{{ integration_entities('shelly') | select('is_state', 'unknown') | list }}"
        unique_id: any_shelly_unknown
For example, I have no issue here as none of them are unavailable -
and although some Shelly devices have some unknown properties:
Only the last reboot time is unknown - so I know all my Shelly and YoLink sensors are up and running and accounted for -
Some automations could be driven off such templates, either to automatically reload entities or to send alert notifications, etc…
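For example, a minimal notification automation driven off the "Any Yolink Unavailable" sensor above could look roughly like this (notify.mobile_app_phone is a placeholder for whatever notify service you use):

automation:
  - alias: "Alert when any YoLink entity is unavailable"
    trigger:
      - platform: state
        entity_id: binary_sensor.any_yolink_unavailable
        to: "on"
        for: "00:10:00"
    action:
      # notify.mobile_app_phone is a placeholder - substitute your own notify service
      - service: notify.mobile_app_phone
        data:
          title: "YoLink entities unavailable"
          message: >-
            Unavailable: {{ state_attr('binary_sensor.any_yolink_unavailable', 'entity_id') | join(', ') }}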
Also, a template sensor could be written that returns all battery levels for anything battery powered (this is on my phone) - so instead of relying upon a possible alert (which I might miss) I can just verify all of them at a glance (I won’t bother showing all of them):
(The above is from the UI Minimalist dashboard on my phone.) Hope that helps give some ideas -
Another thing that very much impressed me about Shelly that I did not realize: I had forgotten to include monitoring for one new Shelly sensor I added, and I got an email from Shelly telling me its battery was critically low (a very nice feature I did not even need to set up).
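For anyone who wants to try the battery-overview idea, a rough sketch of such a template sensor might look like this (the name and structure are just an illustration, not my actual config):

template:
  - sensor:
      - name: "Lowest Battery"
        unique_id: lowest_battery
        unit_of_measurement: "%"
        # State is the lowest battery level currently reported by any battery sensor;
        # the attribute lists all battery entities so they can be checked at a glance.
        state: >
          {{ states.sensor
             | selectattr('attributes.device_class', 'defined')
             | selectattr('attributes.device_class', 'eq', 'battery')
             | map(attribute='state')
             | select('is_number')
             | map('float')
             | min | default(100) }}
        attributes:
          entity_id: >
            {{ states.sensor
               | selectattr('attributes.device_class', 'defined')
               | selectattr('attributes.device_class', 'eq', 'battery')
               | map(attribute='entity_id')
               | list }}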
I have this auto-entities-based card in my Diagnostics dashboard, which lets me identify entities that are dead at a glance:
type: custom:auto-entities
card:
  type: entities
  title: Unavailable entities
filter:
  template: >
    {% set aircon_stuff =
       "(ir_transceiver_1|living_space|air_conditioner_|novamatic)" -%}
    {{
      states
      | selectattr("state", "in", "unavailable unknown")
      | rejectattr("entity_id", "match", "(button|person|conversation).*")
      | rejectattr("entity_id", "match", "switch.cloud.*")
      | rejectattr("entity_id", "match", ".*battery.*replaced")
      | rejectattr("entity_id", "match", ".*(average|max|min).*power.*")
      | rejectattr("entity_id", "match", ".*firmware")
      | rejectattr("entity_id", "match", ".*portable.*")
      | rejectattr("entity_id", "match", ".*power_factor.*")
      | rejectattr("entity_id", "match", ".*(temperature_probe|cooking_temperature).*")
      | rejectattr("entity_id", "match", ".*usb_charge_controller.*")
      | rejectattr("entity_id", "match", ".*(next_alarm|remaining_charge_time)")
      | rejectattr("entity_id", "match", ".*" + aircon_stuff + ".*")
      | rejectattr("entity_id", "match", ".*prusa_mk4_(preview|progress|print_start|print_finish|filename)")
      | rejectattr("entity_id", "match", ".*(beacon|_tablet).*")
      | rejectattr("entity_id", "match", ".*(vacation_destination).*")
      | rejectattr("enabled", "==", False)
      | map(attribute="entity_id")
      | list
    }}
show_empty: false
I also have Alertmanager send me Matrix messages to alert me exactly when an entity goes AWOL. I may share the Prometheus rules I use to get these alerts at some point. The point is, when I get an alert, I can head to the Diags dashboard and troubleshoot.
From my experience, for example, rfxcom (rfxtrx) can’t detect dead devices, because non-changing entities are not logged by Home Assistant.
I think all entities should update a local last_seen and at the same time the last_seen of the linked device.
An entity can be stale, but that doesn’t mean the linked device is dead.
Devices that don’t maintain an active connection through TCP or Bluetooth to home assistant cannot be detected as having disappeared, unless they periodically contact home assistant. For connectionless protocols like Zigbee, home assistant relies on devices periodically talking to it. If such devices don’t contact home assistant after a certain period of time home assistant will consider them unavailable. It is the only reasonably-implementable solution.
For ZHA things are extremely messy IMHO. I understand the problems, but the Jinja templates you need to get close to something useful are a PITA.
If you enable availability in z2m, it has 2 timeouts:
- 10 minutes for router/plugged devices
- 720 minutes for battery powered devices
Never had a problem with those settings.
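For reference, that corresponds to something roughly like this in Zigbee2MQTT’s configuration.yaml (the exact keys depend on your z2m version, so treat this as a sketch; the timeouts match the values mentioned above):

availability:
  active:
    timeout: 10    # minutes - routers / mains-powered devices
  passive:
    timeout: 720   # minutes - battery-powered (sleepy) devices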
But other protocols usually don’t implement similar settings.
And worse, if the values don’t change, the entity isn’t updated in HA.
There are some concepts in this package that might make that template a little more robust. Very similar concept.
Welcome to my NodeRed Outage flow:
Each checks an entity from an integration (not great, as many integrations have multiple entities, some of which could go offline while others don’t)
And each has a different threshold based on how often they check in, or for some integrations that regularly go unavailable for 10-60 minutes at a time, a longer threshold so I only get a notification if it’s offline longer than usual.
But not exactly a great way to do this. And I need to remember to add new integrations to this.
(It basically checks whether the state is “unavailable” for X minutes, and then I have it linked to notify through Telegram.)
And it doesn’t work for everything. Alexa Media Player, for example, will sometimes just stop working (i.e., list the latest playing media as something that played hours ago and stop updating), so that one is hard to catch since it never goes unavailable.
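For comparison, the same “unavailable for X minutes, then notify” check can also be written as a plain HA automation - a rough sketch, assuming a Telegram notifier named notify.telegram and a placeholder entity:

automation:
  - alias: "Integration outage notification"
    trigger:
      - platform: state
        # Placeholder: one representative entity per integration, as in the flow above
        entity_id: sensor.some_integration_entity
        to: "unavailable"
        for: "00:30:00"   # per-integration threshold
    action:
      - service: notify.telegram   # placeholder Telegram notify service
        data:
          message: "{{ trigger.entity_id }} has been unavailable for over 30 minutes"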
I solved this problem a while ago by asking home assistant to export telemetry on entities to Prometheus. And then in Prometheus, when an entity is not available, I send an alert using my already existing alerting messaging service.
What was tricky about this is that whenever a device goes offline, anywhere from one to maybe up to 20 entities will go unavailable. And that usually means a ton of alerts. So, using the Prometheus query language I wrote a formula that would group all entities into the respective devices. With that, I can just look at the sum of entities down per device to give me an alert for that device.
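As a rough illustration (not the exact formula I use), a Prometheus alerting rule along those lines could look like this, assuming the HA exporter’s homeassistant_entity_available gauge and a heuristic that derives a “device” label from the entity_id:

groups:
  - name: home_assistant_availability
    rules:
      - alert: DeviceEntitiesUnavailable
        # Heuristic: derive a rough "device" label from the entity_id
        # (e.g. sensor.living_room_temperature -> living_room), then sum the
        # number of unavailable entities per device. Adjust the regex to your naming.
        expr: |
          sum by (device) (
            label_replace(1 - homeassistant_entity_available,
                          "device", "$1", "entity", "[a-z_]+\\.([a-z0-9_]+?)_[a-z0-9]+")
          ) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.device }} has {{ $value }} unavailable entities"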
To be fair, a solution that relies exclusively on home assistant, or maybe a combination of node red and home assistant, is more practical because you don’t need to run Prometheus plus alertmanager plus a messaging service. But in my experience, my messaging service is 100% reliable, whereas home assistant notifications sometimes don’t work because my phone HA app doesn’t have support for push and isn’t connected to HA at that instant.