Plans to improve handling of offline devices?

Hi,

I’m wondering if there are currently any plans to improve the handling of offline devices in HA.
I’m referring to devices that are generally supported by HA but not reachable via network at the time HA is (re)started.

I currently have a Media Player (Kodi), an AV-receiver (Denon) and a projector (Epson) that have HA integrations.
For power-saving reasons and the simple fact that in my environment it’s not necessary to have those accessible via network when they’re not in use, none of those are configured to listen to network requests in their off/standby state.

Unfortunately, even though HA can handle if those devices are switched off at some point, it requires the device to be “active” during a HA (re)start - otherwise the device will not show up in HA and the UI presents red error cards for the now missing devices.
There doesn’t seem to be a way to detect when a device comes online at a later point.
I think it should be possible through device discovery, but apparently this is hardly used.

Theoretically it shouldn’t be hard to have a device show up in an “offline” or “unavailable” state if it’s not reachable during HA startup and update its state as soon as it becomes available.
I’m sure there’s a lot of ways to do that cleanly, but I think that’s something the core developers need to agree on and it probably needs to be made a requirement for components to support it.

What are your thoughts?

Sebastian


I think the main problem is that this is very individual per component. In your case it’s a printer, and when it’s offline, one entity is missing. Or maybe a few more depending on what data it provides. But for things that send their data through a hub (like with MQTT), a lot of devices would be offline. Or are they not? Maybe a device is mobile and just out of reach (regularly).
What I’m trying to get at: for specific types this certainly is doable, and some actually already do that and just keep doing the setup until it’s successful. But there’s no generic solution that applies to every situation. Essentially it’s in the hands of the developers of those individual components.
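To illustrate what “keep doing the setup until it’s successful” could look like, here is a minimal sketch in plain Python. It is not taken from any existing integration: `setup_with_retry`, `try_setup` and the delay values are hypothetical names and numbers chosen for illustration, and the doubling backoff is just one possible policy.

```python
import time
from typing import Callable, Optional


def setup_with_retry(
    try_setup: Callable[[], bool],
    initial_delay: float = 30.0,
    max_delay: float = 180.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_setup() until it returns True; return the number of attempts.

    try_setup is a hypothetical callable that returns False while the
    device is still unreachable.
    """
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        if try_setup():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"setup failed after {attempts} attempts")
        # Wait before the next attempt, doubling the delay up to max_delay.
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

Leaving `max_attempts` unset means the loop keeps trying for as long as the device stays offline, which is roughly the behavior a device that is switched on hours later would need.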

I’m not seeing the problem.
If the hub is unavailable during startup, entities of devices that connect through the hub (and that are somehow referenced in the config, so it is known that they have existed at some point) would appear unavailable, too.
And instead of throwing errors about an unknown entity that once used to be the hub, HA could simply create a “stub” entity for the hub that signals “yes that entity existed, but it’s currently not available”.
If the hub becomes available at a later point, HA should notice that and populate the (stub) hub’s state info with the real info.
The connected devices change their availability as soon as they “report in”.
Basically the reversed behavior of what happens when an available entity suddenly becomes unavailable.
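As a rough sketch of the “stub” idea, assuming nothing about how HA’s core actually stores entities: the classes `StubEntity` and `EntityRegistry` below are hypothetical and only model the behavior described above, in plain Python.

```python
from dataclasses import dataclass, field

UNAVAILABLE = "unavailable"


@dataclass
class StubEntity:
    """Placeholder for an entity that is known from the config."""

    entity_id: str
    state: str = UNAVAILABLE
    attributes: dict = field(default_factory=dict)

    @property
    def available(self) -> bool:
        return self.state != UNAVAILABLE


class EntityRegistry:
    """Hypothetical registry that never drops a known entity."""

    def __init__(self, known_entity_ids):
        # Every entity referenced in the config starts out as a stub.
        self._entities = {eid: StubEntity(eid) for eid in known_entity_ids}

    def get(self, entity_id: str) -> StubEntity:
        return self._entities[entity_id]

    def report_in(self, entity_id: str, state: str, **attributes) -> StubEntity:
        # The device came online: fill the stub with the real state.
        entity = self._entities.setdefault(entity_id, StubEntity(entity_id))
        entity.state = state
        entity.attributes = dict(attributes)
        return entity

    def mark_unavailable(self, entity_id: str) -> None:
        # The device went away: keep the entity, drop its state info.
        entity = self._entities[entity_id]
        entity.state = UNAVAILABLE
        entity.attributes = {}
```

With this shape, going from unavailable to available is handled by the same object as going from available to unavailable, so the UI would always have something to render.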

For example, in case of my AV receiver, it simply changes its status to “unavailable”, but at least the entity still exists. When I turn on the AVR later, the state is correctly updated to “On” in HA.
If the AVR is not on when HA starts, I only get a “no state available” error card and even when I turn it on later, I have no way of reinitializing the state object, so HA never notices that the component would now be available.

I’m with you in that there’s no generic solution that applies to each and all possible situations.
But at least for devices/classes where it is feasible (like media players), in my eyes it would make sense to have some kind of policy in place that must support some kind of “late instantiation”.

As I said, I think this is totally doable by using the discovery component.
A while ago I wrote a media player platform module with a corresponding discovery module and even if the device was offline during HA startup, discovery found it later and created the corresponding entity.
So the mechanisms would be there - they’re just not used by all components.

Sebastian

Oh yes, that is absolutely necessary. With so many integrations, some devices or services are bound to be unreachable at some point. The error messages in the UI are not helping, and restarting HA repeatedly is a pain, especially with zwave on board.

There should be at least a retry button so that the specific component tries to reinitialize. Automatic recovery would of course be favorable.

I also had the case where a local service (pilight) was integrated via websocket connection and when the service restarted the HA integration was broken until the next reboot because it never tried to reconnect the service.
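For what it’s worth, the reconnect behavior that was missing there can be sketched in a few lines. This is not the pilight integration’s code: `run_with_reconnect`, `connect` and `handle` are hypothetical names, and a real websocket client would be asynchronous rather than a simple blocking loop.

```python
import time
from typing import Callable, Iterable, Optional


def run_with_reconnect(
    connect: Callable[[], Iterable[str]],
    handle: Callable[[str], None],
    retry_delay: float = 5.0,
    max_reconnects: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Consume messages and reconnect whenever the connection drops.

    connect is a hypothetical callable that opens the connection and
    returns an iterable of messages. Returns the number of reconnects.
    """
    reconnects = 0
    while True:
        try:
            for message in connect():
                handle(message)
        except ConnectionError:
            pass  # connection failed or dropped; fall through and retry
        if max_reconnects is not None and reconnects >= max_reconnects:
            return reconnects
        reconnects += 1
        sleep(retry_delay)
```

With `max_reconnects` left at `None` the loop never gives up, so a restart of the local service would only cause a short gap instead of a broken integration.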

Developer guidelines should be clear about the level of fault tolerance and recovery expected from each integration.

I would support this type of addition to HA.

A great example of an integration that has this trouble is the LIFX bulbs. If these are off at the switch then they do not discover and do not exist. It creates havoc with automations and the GUI.
Even if they are on sometimes they can take a while to discover.
I have moved away from LIFX bulbs to hue for that exact reason, as hue will show as offline through the hub. I have more than a hundred devices in my installation, so having some that disappear on restart is not worth my time.

I have written some check automations (I use node red) that make sure all the discoverable entries are available before starting HomeKit, retry periodically and notify me if not successful. Without these I was constantly fixing it.
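The logic of such a check is roughly the following, written here as a plain Python sketch rather than as the actual Node-RED flow. `wait_for_entities` and `get_state` are hypothetical names; `get_state` stands in for whatever returns an entity’s current state, or `None` if the entity does not exist yet.

```python
import time
from typing import Callable, Iterable, Optional, Set


def wait_for_entities(
    required: Iterable[str],
    get_state: Callable[[str], Optional[str]],
    attempts: int = 10,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Poll until all required entities are available.

    Returns the set of entities that are still missing or unavailable
    after the last attempt (empty if everything showed up).
    """
    missing = set(required)
    for attempt in range(attempts):
        # Keep only the entities that are absent or reported unavailable.
        missing = {e for e in missing if get_state(e) in (None, "unavailable")}
        if not missing:
            break
        if attempt < attempts - 1:
            sleep(interval)
    return missing
```

An empty result would be the signal to start HomeKit; a non-empty one lists exactly which entities to mention in the notification.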

Well, it wasn’t supposed to be an “addition” in the sense that this thread would be a feature request.
But yours is another example where the current behavior of HA creates unsatisfactory (for lack of a better word) situations for its users.
With HA approaching 1.0, I think that this is something that needs to be addressed on its road map in the not-too-distant future.

Sebastian

I’d vote this up if it was a feature request.

The Logitech Media Server (LMS) platform throws a ton of errors if one of my players ever goes offline, and if the device is offline on boot I can’t use it. EVERYTHING needs to be powered on when HA boots. The LMS server is on the entire time, however, and the web interface for LMS gracefully handles players going online and offline, so it is possible.

I imagine the discovery platform would need tinkering with here. It would be nice not to have thousands of warnings in my log for offline devices.

So how do we get the core developers’ view on things? Do they only hang out on Discord or are they scanning the forum too?
I’m pretty sure that the issue must have popped up on someone’s radar eventually.
Maybe it’s not seen as an issue?
@balloob?

Sebastian

Might not be the best timing to mention Google Homes, but +1 for that. See: Issues with Googlehome component BT tracking

We shouldn’t solve this at a Home Assistant level, because there will be exceptions to the rule, causing us to restore the wrong things, incorrect things or even non-existing entities.

The way Home Assistant approaches problems is that we offer building blocks, but let the integration have the final say. So in this case, all the building blocks are there, but it requires integrations to put them together.

For example, OwnTracks restores its tracked entities. It uses a combination of the device registry and the “RestoreEntity” mixin: source code.

I agree that leaving the handling of offline devices completely to HA’s core would not be wise, for the reasons you mention.
I think we’d need some kind of “mandatory quality guidelines” for components in the long run, forcing their developers to address the “offline issue” by using the framework provided by HA core.
Otherwise we’d end up with a few “good” components that deal with those situations and a lot that don’t.

Sebastian


+1 for that.

If a webservice is not available at start (MeteoAlarm is quite shaky), the entity never comes alive unless we restart HA.
But I also agree that it’s probably the way the components are built that is at issue here, rather than HA core.

Perhaps a service could be created that would allow integrations to be reloaded.

That would allow automations to have integrations retry discovering devices from time to time.

Can you give some examples of automations that are affected by this?