Support generic offline detection for Entity Components

As far as I understand this post and its answer, there is no generic way of offline detection. IMHO this is something which should be supported by most of the Entity Components, to allow for easy monitoring and alerting.

I’m still too new to this area, but my proposal would be to add some common way to support this independent of the Entity type, so it can be implemented by either the Entity itself or the underlying platform where applicable. This might be as simple as a convention for component developers.

Seems like more people might be interested in that topic, see e.g. this unanswered post.

  • Most devices don’t report if they’re offline.
  • Some don’t report an acknowledgement upon receipt of a command.
  • For those that don’t acknowledge, some can’t be polled for their status.

Given this broad range of (in)abilities, it’s a challenge to devise a ‘generic way of offline detection’.

The simplest strategy is “If I don’t hear from you for over X minutes (or hours), I will assume the worst and consider you to be offline.”

In MQTT Sensor, the expire_after option implements this concept.

(integer)(Optional) Defines the number of seconds after which the value expires if it’s not updated.
Default value: 0

So if you set expire_after to 300 seconds (5 minutes) and no message is received during this period, the sensor is considered to be offline and, instead of a numeric value, the UI shows a hyphen (-). This is suitable for sensors that periodically report their status (temperature, humidity, etc.).
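To make the "if I don't hear from you for X seconds" idea concrete, here is a minimal Python sketch. It is not Home Assistant's implementation; the class name `ExpiringValue` and the injectable `clock` parameter are assumptions made purely for illustration.

```python
import time


class ExpiringValue:
    """Hypothetical helper: holds the last reading and expires it after a timeout."""

    def __init__(self, expire_after, clock=time.monotonic):
        self.expire_after = expire_after  # seconds; 0 is treated as "never expires"
        self._clock = clock               # injectable so the logic can be tested
        self._value = None
        self._last_update = None

    def update(self, value):
        """Record a new reading and restart the expiry timer."""
        self._value = value
        self._last_update = self._clock()

    @property
    def available(self):
        """True while a reading exists and has not yet expired."""
        if self._last_update is None:
            return False
        if self.expire_after == 0:
            return True
        return self._clock() - self._last_update <= self.expire_after

    @property
    def value(self):
        """The last reading, or None once it has expired."""
        return self._value if self.available else None
```

With `expire_after=300`, a reading stays valid for five minutes after the last `update()` call; after that, `value` returns `None`, which a UI could render as a hyphen.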

This concept would not be applicable to light switches or any other device that does not periodically report its status. You’d have to use another technique such as using the device’s acknowledgements (to issued commands) as an indicator of its health. So if you command the light to turn on and it replies with an acknowledgement, you consider it to be online. If it fails to reply, you consider it to be offline. Many lighting protocols work this way but not all so this technique cannot be considered to be generic.
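As a rough sketch of the acknowledgement approach, assuming a hypothetical `send(command)` callable that returns `True` when the device acknowledges and `False` when it does not (no real lighting protocol is modelled here):

```python
class AckTracker:
    """Hypothetical helper: infers device health from command acknowledgements."""

    def __init__(self, send, max_missed=1):
        self._send = send             # assumed callable: send(command) -> bool
        self.max_missed = max_missed  # missed acks tolerated before "offline"
        self._missed = 0

    def command(self, cmd):
        """Send a command and record whether it was acknowledged."""
        acked = self._send(cmd)
        if acked:
            self._missed = 0
        else:
            self._missed += 1
        return acked

    @property
    def online(self):
        """Assumed online until max_missed consecutive commands go unacknowledged."""
        return self._missed < self.max_missed
```

Note the limitation this exposes: the state is only refreshed when a command is issued, so a device that dies quietly is not noticed until the next command.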

Another technique would be polling: periodically request status from all devices, and those that fail to reply are deemed offline. Yet again, not all devices support polling, so this technique is not a panacea either. In addition, polling comes with its own headaches. For devices on wireless mesh networks (Z-Wave, Zigbee), polling that is not used judiciously can add to network congestion and degrade the network’s performance (i.e. responsiveness).
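A polling pass could look roughly like the following. The `query(device_id)` callable is an assumption: it stands in for whatever status request a given protocol offers, and is expected to raise `TimeoutError` when the device does not answer.

```python
def poll_all(device_ids, query):
    """Ask every device for its status once; return {device_id: is_online}.

    query is a hypothetical callable that returns a status or raises
    TimeoutError when the device does not reply.
    """
    online = {}
    for device_id in device_ids:
        try:
            query(device_id)
            online[device_id] = True
        except TimeoutError:
            online[device_id] = False
    return online
```

How often such a pass runs is the judgment call mentioned above: on a mesh network, a long interval between passes keeps the extra traffic down at the cost of slower offline detection.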

The MQTT protocol has the concept of Last Will and Testament (LWT). In the event the device goes offline, its LWT is shared with other devices (… pretty much like how it works with deceased people).

The question is “How do we know the device is offline?” In MQTT, devices formally connect to and communicate with an MQTT Broker (a ‘middleman’), which manages and monitors the connections. If the Broker detects the device is no longer communicating, it publishes the device’s LWT. Any other client subscribed to that LWT topic (as Home Assistant can be, via the availability_topic) is now informed of the device’s demise. However, this is all specific to MQTT and isn’t applicable to other protocols.
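The LWT mechanism can be illustrated with a toy, in-memory model. This is not a real MQTT client or broker (there is no networking and no keep-alive timer); `ToyBroker`, its method names, and the topic used in the usage note are all made up for illustration.

```python
class ToyBroker:
    """In-memory stand-in for an MQTT broker; models only the LWT behaviour."""

    def __init__(self):
        self._wills = {}        # client_id -> (topic, payload)
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register callback(topic, payload) for messages on a topic."""
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        """Deliver a message to every subscriber of the topic."""
        for callback in self._subscribers.get(topic, []):
            callback(topic, payload)

    def connect(self, client_id, will_topic, will_payload):
        """A client registers its will when it connects."""
        self._wills[client_id] = (will_topic, will_payload)

    def disconnect(self, client_id):
        """Clean disconnect: the will is discarded, not published."""
        self._wills.pop(client_id, None)

    def connection_lost(self, client_id):
        """Unexpected loss of the client: the broker publishes its will."""
        will = self._wills.pop(client_id, None)
        if will is not None:
            self.publish(*will)
```

For example, a device could connect with the will `("home/porch_light/availability", "offline")` and publish `"online"` to the same topic; a subscriber then sees `"offline"` only if the connection drops unexpectedly.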

There are different approaches, all specific to the target technology. Some devices might e.g. be detected as offline if they don’t respond (whatever that is) to a command in time. A Platform might indicate all devices being offline if the bridge is offline.

But that’s not what my proposal is about at all. Seems my original post was misleading. Let me try to rephrase:

  • It seems some users need the ability to work with the online/offline state of devices. Having to “re-invent the wheel” for different entities and platforms is redundant, and redundancy is error-prone. Surely we all agree on that one.
  • Not all entity / platform combinations can and should provide this feature.
  • “Errored and not responding to commands” could be considered offline as well.
  • For most (actual) devices implementing this should be possible. I might be wrong with this one :wink:

So all I ask for is to provide the possibility for developers to provide such a state, so that all supported devices could be handled altogether.

IMHO if there is a common need (offline detection) for plugins (components) targeting a single framework (HA), it should be supported by the framework, to simplify things for developers and prevent an uncontrolled proliferation of custom solutions.

If I understand you correctly, you’re asking for commonality. At the moment, if a platform supports some form of offline detection, the way it is exposed to the user depends upon the platform. For example, in MQTT, it is handled via the availability_topic.

OK, so what do you propose it look like? For example, would it be an attribute called availability with a value of either online or offline? Platforms that, in some manner, support offline detection would use this attribute and all others would not. So if you view an entity in the States page and see it has an availability attribute, you can use that (for example) in an automation. So something along those lines?

Exactly :+1: It should be a common convention: if a platform supports this, it should do so in exactly that way. And existing platforms should be encouraged to support it where possible and useful.

Well, I started with HA this weekend, so I don’t presume to define this without getting a better understanding of the code. But for now, it seems like adding a constant like ATTR_AVAILABILITY to const.py to define an additional attribute, and defining strings or, even better, integer flags (for easier extensibility and expressiveness later on) as values like

  • AVAILABILITY_UNKNOWN: no information about the component’s availability. Should be the default set by unsupported platforms.
  • AVAILABILITY_ONLINE: component is working
  • AVAILABILITY_ERRORED: component is available but can’t be used due to some internal error state, e.g. an Amazon Alexa without an active internet connection.
  • AVAILABILITY_OFFLINE: device is offline
  • AVAILABILITY_OFFLINE_BRIDGE: the whole bridge the device is connected to is offline (e.g. the TRÅDFRI Gateway or Hue Bridge)

would do the trick. Maybe add information to the Coding Guidelines and base Entities as well.
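To give an idea of what this could look like, here is a sketch of the proposed constants as integer flags. None of this exists in Home Assistant's const.py; the numeric values and the `is_offline` helper are assumptions, chosen so that each state occupies its own bit.

```python
# Hypothetical additions to const.py; names follow the proposal above,
# the numeric values are an assumption.
ATTR_AVAILABILITY = "availability"

AVAILABILITY_UNKNOWN = 0         # default for platforms without support
AVAILABILITY_ONLINE = 1          # component is working
AVAILABILITY_ERRORED = 2         # reachable, but in an internal error state
AVAILABILITY_OFFLINE = 4         # the device itself is offline
AVAILABILITY_OFFLINE_BRIDGE = 8  # the bridge the device hangs off is offline


def is_offline(attributes):
    """True if an entity's attributes report either kind of offline state."""
    flags = attributes.get(ATTR_AVAILABILITY, AVAILABILITY_UNKNOWN)
    return bool(flags & (AVAILABILITY_OFFLINE | AVAILABILITY_OFFLINE_BRIDGE))
```

An entity without the attribute falls back to AVAILABILITY_UNKNOWN, so `is_offline({})` is `False`, and an alerting rule could treat "device offline" and "bridge offline" alike with a single check.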

@123 does that sound reasonable to you?

Yes, it sounds like a reasonable, and useful, proposal. You have my vote.

This would be great!!!