I am looking for input, feedback and hopefully support on a set of changes I would like to submit a pull request for.
Prelude
Home Assistant has a great feature where it automatically updates config entries based on its device discovery mechanism. That way, even when IP addresses change, Home Assistant can transparently reconfigure config entries to keep those devices connected.
Device discovery relies on a broad set of mechanisms, so a config entry update can have many different causes. Examples include DHCP requests and the ARP protocol, but certain types of device_trackers from network router integrations can also trigger an update of a config entry’s connectivity information.
The way this works is that if any of those mechanisms “see” a device, they relay the hostname, IP address and MAC address of that device to the DHCP component, which in turn matches the MAC address and hostname against existing config entries and updates them with the potentially changed IP address.
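To make the flow above concrete, here is a minimal sketch of it in plain Python. It is not Home Assistant's actual code: the `DiscoveredDevice` and `ConfigEntry` classes and the `handle_discovery` function are hypothetical stand-ins, and for brevity the match is done on the MAC address only (hostname matching is left out).

```python
from dataclasses import dataclass


@dataclass
class DiscoveredDevice:
    """What a discovery mechanism reports (hypothetical model)."""
    hostname: str
    ip_address: str
    mac_address: str


@dataclass
class ConfigEntry:
    """Simplified stand-in for a config entry (hypothetical model)."""
    title: str
    host: str
    mac_address: str


def normalize_mac(mac: str) -> str:
    # Compare MAC addresses independent of case and separators.
    return mac.lower().replace(":", "").replace("-", "")


def handle_discovery(entries: list[ConfigEntry], device: DiscoveredDevice) -> list[ConfigEntry]:
    """Overwrite the host of every entry whose MAC matches the discovered device."""
    updated = []
    for entry in entries:
        if normalize_mac(entry.mac_address) != normalize_mac(device.mac_address):
            continue
        if entry.host != device.ip_address:
            entry.host = device.ip_address  # the previously configured host is lost
            updated.append(entry)
    return updated


# An entry configured with a forwarding host, then "seen" by a router elsewhere.
entry = ConfigEntry(title="esphome-node", host="10.0.0.5", mac_address="AA:BB:CC:DD:EE:FF")
seen = DiscoveredDevice(hostname="esphome-node", ip_address="192.168.50.7", mac_address="aa-bb-cc-dd-ee-ff")
handle_discovery([entry], seen)
print(entry.host)
```

Note that the update is unconditional once the MAC address matches, which is the property the rest of this post is concerned with.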
How this can break
There are two main ways this can break, leading to non-working config entries for users. Neither is necessarily common, but due to the somewhat automagical nature of this feature, neither is obvious when it happens, and neither currently has a good remedy or workaround.
Visibility by router integrations should not imply connectivity for Home Assistant
This is the exact scenario I ran into that led me to dig into this mechanism.
The connected-device “bridge” between router-based device trackers and the DHCP component assumes that if a router integration can “see” a device on one of its networks, Home Assistant will be able to connect to that device on that network. This does not necessarily hold true, though.
To use my specific case as an example, I have an ESPHome device on a separate network from my Home Assistant and use NAT port forwarding on a host that is connected to both networks to allow Home Assistant to connect to the ESPHome device. When my router sees the ESPHome device on the isolated network, it updates the config entry of the ESPHome integration with the IP address of the ESPHome device on the isolated network, resulting in Home Assistant not being able to connect to it anymore.
(Semi-/mobile) devices with globally routed IP addresses / FQDNs
I haven’t actually run into this scenario myself, but it is the one I was trying to simulate with my ESPHome node isolated on another (local) network.
If a device is configured using a globally routed IP address or an FQDN resolving to a globally routed IP address, then updating the config entry via the DHCP component when the device is discovered locally leads to the device only being accessible locally and no longer globally / remotely. This is acceptable for stationary devices, but not for mobile or semi-mobile devices that I expect to roam between locations, or for devices that I set up locally but intend to deploy remotely. Such devices will no longer be accessible once they leave the local network.
Proposed solutions
I recognize that both issues are probably somewhat arcane and may not be very common, but when people run into them they are hard to understand and debug, and ultimately there don’t seem to be any good ways to work around them. At least not without changing the premise (e.g. the user’s network setup).
I see three possible improvements to the device discovery flow, which will help with these particular issues, but also have the potential to help users work around other issues related to device discovery they might be experiencing.
Option 1: User controllable configuration to opt a config entry out of device discovery based updates.
A user controllable setting that would allow the user to opt a specific config entry out of the device discovery based updating of the IP address. Similar options already exist in the “System options” cog menu, which I think might be a good spot to put such an option.
E.g.
(==O) Enable device discovery based network address updates
If Home Assistant should automatically update the network address based on device discovery
Such a configuration option would give users the ability to disable automatic updates for any config entries which they might experience issues with otherwise. This is a powerful feature which gives a lot of control to the user. This is great, because it allows users to help themselves, but it also requires users to understand their problem in the first place and then for them to manually change this setting. It’s probably still a good and simple “catch all” solution.
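As a rough illustration of Option 1, the check could be as small as the sketch below. Everything here is hypothetical: the `ConfigEntry` class is a simplified stand-in, and the `discovery_updates_enabled` flag and `maybe_update_host` function are names I made up for this example, not existing Home Assistant APIs.

```python
from dataclasses import dataclass


@dataclass
class ConfigEntry:
    """Simplified stand-in for a config entry (hypothetical model)."""
    host: str
    # Hypothetical per-entry system option; True keeps today's behavior.
    discovery_updates_enabled: bool = True


def maybe_update_host(entry: ConfigEntry, discovered_ip: str) -> bool:
    """Apply a discovery update unless the user opted this entry out.

    Returns True if the entry was changed.
    """
    if not entry.discovery_updates_enabled:
        return False  # user opted out: leave the configured host alone
    if entry.host == discovered_ip:
        return False  # nothing to do
    entry.host = discovered_ip
    return True


pinned = ConfigEntry(host="10.0.0.5", discovery_updates_enabled=False)
default = ConfigEntry(host="192.168.1.10")
maybe_update_host(pinned, "192.168.50.7")   # host stays "10.0.0.5"
maybe_update_host(default, "192.168.1.42")  # host becomes "192.168.1.42"
```

Defaulting the flag to enabled means existing config entries would keep behaving exactly as they do today unless the user changes the setting.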
Option 2: Exclude globally routed IP addresses and FQDNs from automatic updates
This change would have no effect on the first type of issue outlined above, but would completely solve for the second type of issue without requiring any user awareness of the problem and being incredibly unlikely to have unwanted side-effects (i.e. breaking stuff that worked before).
The change would make it so that non-RFC 1918 IP addresses (i.e. “non-local” IP addresses) as well as non-link-local and non-local FQDNs (fully qualified domain names, e.g. “myremotedevice.dyndnsprovider.net”) are excluded from being automatically updated based on device discovery.
It might be desirable to include link-local and local FQDNs (e.g. “myhost.local” or “myhost.localhost”) in the carve-out, but I can see good arguments for both excluding and including them. To “play it safe”, this proposal would continue to update link-local and local FQDNs to reduce false negatives and keep the behavior closely aligned with the original behavior while still resolving the outlined issues.
By excluding non-RFC 1918 IP addresses as well as non-link-local and non-local FQDNs, devices that were configured with “global” connectivity will not get automatically “downgraded” to local connectivity just because they may have been (temporarily) connected to a local network.
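A minimal sketch of how the Option 2 check could look, using only Python's standard `ipaddress` module. The function name `allows_discovery_update` is hypothetical. Two details are assumptions on my part rather than part of the proposal: link-local IP literals are treated like local addresses, and single-label hostnames (no dot) are treated as local names.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)

# Name suffixes treated as "local" in this sketch.
LOCAL_SUFFIXES = (".local", ".localhost")


def allows_discovery_update(configured_host: str) -> bool:
    """Return True if the configured host may be overwritten by device discovery."""
    host = configured_host.strip().rstrip(".").lower()
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        # Assumption: single-label names (no dot) count as local.
        return "." not in host or host.endswith(LOCAL_SUFFIXES)
    # Assumption: link-local IP literals are handled like local addresses.
    return address.is_link_local or any(address in net for net in RFC1918_NETWORKS)


for host in ("192.168.1.20", "8.8.8.8", "myhost.local", "myremotedevice.dyndnsprovider.net"):
    print(host, allows_discovery_update(host))
```

The explicit network list is used instead of `is_private` because `is_private` covers more ranges than the three RFC 1918 blocks, which would make the check broader than what is proposed here.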
Option 3: Alternative connectivity profiles
This seems to be the most invasive and complicated change from an implementation perspective, but it would also transparently resolve both mentioned issues. However, it might not be possible to implement it exclusively on the HA platform level, and it may require integration authors to opt in (I frankly haven’t explored this option in much detail since it’s my least favorite due to its complexity and potential for breakage).
Instead of plainly replacing the connectivity information in the config entry, the concept of alternative “connectivity profiles” could be introduced. Basically, instead of overriding the “host” field, the config entry would be amended with an “alternative_hosts” list. The originally configured host would never get touched by device discovery, but integration authors (or, where possible, the HA platform itself) would, upon a failed connection, round robin over all hosts, including the alternative_hosts list.
This way, the configuration originally entered by the user never gets lost and can always be recovered, while newly available connectivity profiles added by device discovery are still utilized. As an additional optimization, the mechanism could always start the first connection attempt with the most recently successful profile and could purge automatically generated profiles that have not been connected to successfully for a while.
This approach, while providing a more holistic and user-transparent solution, is also significantly more complicated to implement, may require significant changes across the codebase, possibly including integrations, and ultimately has a much higher potential for bugs.
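To illustrate the idea, here is a small sketch of what such connectivity profiles might look like. The `ConnectivityProfiles` class, its methods and the `try_connect` callback are all hypothetical, and purging of stale profiles is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ConnectivityProfiles:
    """Hypothetical model: a fixed user-configured host plus discovered alternatives."""
    host: str  # configured by the user, never modified by discovery
    alternative_hosts: list[str] = field(default_factory=list)
    last_good: Optional[str] = None  # most recently successful host

    def add_discovered(self, host: str) -> None:
        # Discovery only ever appends; it never replaces the configured host.
        if host != self.host and host not in self.alternative_hosts:
            self.alternative_hosts.append(host)

    def candidates(self) -> list[str]:
        # Try the most recently successful host first, then all the others.
        ordered = [self.host, *self.alternative_hosts]
        if self.last_good in ordered:
            ordered.remove(self.last_good)
            ordered.insert(0, self.last_good)
        return ordered

    def connect(self, try_connect: Callable[[str], bool]) -> Optional[str]:
        """Return the first host that connects, or None if all of them fail."""
        for host in self.candidates():
            if try_connect(host):
                self.last_good = host
                return host
        return None


profiles = ConnectivityProfiles(host="device.example.net")
profiles.add_discovered("192.168.1.20")

reachable = {"192.168.1.20"}  # pretend only the local address answers
print(profiles.connect(lambda h: h in reachable))
```

In this sketch the user's original host stays in the candidate list forever, so a device that later leaves the local network would become reachable again through it.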
For those reasons I think this approach is probably not advisable.
Conclusion
I hope I can get some input, thoughts, advice and support here. I would like to work on and provide a pull request for these solutions if there are no objections and there is general support for changes like this. I think it is probably worth implementing at least the first solution, possibly complemented by the second.
I welcome any feedback and thoughts!
Related bug report: device_tracker announcing devices via CONNECTED_DEVICE_REGISTERED even when disabled · Issue #117888 · home-assistant/core · GitHub
I’ve also posted about this topic on the Home Assistant and ESPHome Discord servers (no links for obvious reasons).