Refining automatic config_entry updating via device discovery

dev0 · May 22, 2024, 7:18pm

I am looking for input, feedback and hopefully support on a set of changes I would like to submit a pull request for.

Prelude

Home Assistant has a great feature where it will automatically update config entries based on its device discovery mechanism. That way, even when IP addresses change, Home Assistant can transparently reconfigure config entries to keep those devices connected.

Device discovery uses a broad set of specific mechanisms to discover devices, so a config entry update can have many different causes. Some examples are DHCP requests and the ARP protocol, but certain types of device_trackers from network router integrations can also lead to an update of a config entry’s connectivity information.

The way this works is that if any of those mechanisms “see” a device, they relay the hostname, IP address and MAC address of that device to the dhcp component, which in turn will match the MAC address and hostname with existing config entries and update them with the potentially changed IP address.
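
A toy model of the flow as described above may make the matching step more concrete. Everything here is hypothetical: `ConfigEntry`, `normalize_mac` and `apply_discovery` are names invented for this sketch and are not Home Assistant's real classes or functions, and the sketch does not model which component actually writes the entry.

```python
from dataclasses import dataclass


@dataclass
class ConfigEntry:
    """Hypothetical stand-in for a config entry; not Home Assistant's real class."""

    title: str
    mac: str
    host: str


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and strip separators so formats compare equal."""
    return mac.lower().replace(":", "").replace("-", "")


def apply_discovery(entries: list[ConfigEntry], mac: str, ip: str) -> list[ConfigEntry]:
    """Update the host of every entry whose MAC matches; return the changed entries."""
    changed = []
    for entry in entries:
        if normalize_mac(entry.mac) == normalize_mac(mac) and entry.host != ip:
            entry.host = ip  # the previously configured host is lost here
            changed.append(entry)
    return changed


entries = [ConfigEntry("Living room sensor", "AA:BB:CC:DD:EE:FF", "192.168.1.20")]
changed = apply_discovery(entries, "aa-bb-cc-dd-ee-ff", "192.168.1.57")
print([(entry.title, entry.host) for entry in changed])
```

Note that the overwrite is unconditional once the MAC matches, which is what makes the failure modes below possible.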

How this can break

There are two main ways this can break, leading to non-working config entries for users. Neither is necessarily super common, but due to the somewhat automagical nature of this feature, neither is obvious when it happens, and neither currently has a good remedy or workaround.

Visibility by router integrations should not imply connectivity for Home Assistant

This is the exact scenario I ran into that led me to dig into this mechanism.

The connected device “bridge” between router based device trackers and the DHCP component assumes that if a router integration can “see” a device on one of its networks, Home Assistant will be able to connect to that device on that network. This does not necessarily hold true though.

To use my specific case as an example, I have an ESPHome device on a separate network from my Home Assistant and use NAT port forwarding on a host that is connected to both networks to allow Home Assistant to connect to the ESPHome device. When my router sees the ESPHome device on the isolated network, it updates the config entry of the ESPHome integration with the IP address of the ESPHome device on the isolated network, resulting in Home Assistant not being able to connect to it anymore.

(Semi-/mobile) devices with globally routed IP addresses / FQDNs

I haven’t actually run into this scenario myself, but it is the scenario I was trying to simulate with my ESPHome node isolated on another (local) network.

If a device configuration is added using a globally routed IP address or an FQDN resolving to a globally routed IP address, then updating the config entry via the DHCP component when the device is discovered locally will lead to the device only being accessible locally and no longer globally / remotely. This is acceptable for devices that are stationary, but not for mobile or semi-mobile devices that I expect to be roaming between locations, or for devices that I set up locally but intend to deploy remotely. Such devices will no longer be accessible when they leave the local network.

Proposed solutions

I recognize that both issues are probably somewhat arcane and may not be super common, but when people run into them they are hard to understand and debug, and ultimately there don’t seem to be any good ways to work around them. At least not without changing the premise (e.g. the user’s network setup).

I see three possible improvements to the device discovery flow, which will help with these particular issues, but also have the potential to help users work around other issues related to device discovery they might be experiencing.

Option 1: User controllable configuration to opt a config entry out of device discovery based updates.

A user controllable setting that would allow the user to opt a specific config entry out of the device discovery based updating of the IP address. Similar options already exist in the “System options” cog menu, which I think might be a good spot to put such an option.

E.g.

(==O) Enable device discovery based network address updates
If Home Assistant should automatically update the network address based on device discovery

Such a configuration option would give users the ability to disable automatic updates for any config entries which they might experience issues with otherwise. This is a powerful feature which gives a lot of control to the user. This is great, because it allows users to help themselves, but it also requires users to understand their problem in the first place and then for them to manually change this setting. It’s probably still a good and simple “catch all” solution.
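
As a rough sketch of how such an opt-out could gate the update, assuming a per-entry flag: the `EntryOptions` class and the `discovery_updates_enabled` name are invented for illustration and do not correspond to an existing Home Assistant option.

```python
from dataclasses import dataclass


@dataclass
class EntryOptions:
    """Hypothetical per-entry system options; the flag name is invented for this sketch."""

    discovery_updates_enabled: bool = True


def maybe_update_host(current_host: str, discovered_ip: str, options: EntryOptions) -> str:
    """Return the host to store, honoring the per-entry opt-out."""
    if not options.discovery_updates_enabled:
        return current_host  # the user pinned this entry; ignore discovery
    return discovered_ip


pinned = maybe_update_host(
    "10.0.0.5", "192.168.50.12", EntryOptions(discovery_updates_enabled=False)
)
default = maybe_update_host("10.0.0.5", "192.168.50.12", EntryOptions())
print(pinned, default)
```

Defaulting the flag to enabled keeps today's behavior for everyone who never touches the setting.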

Option 2: Exclude globally routed IP addresses and FQDNs from automatic updates

This change would have no effect on the first type of issue outlined above, but would completely solve for the second type of issue without requiring any user awareness of the problem and being incredibly unlikely to have unwanted side-effects (i.e. breaking stuff that worked before).

The change would exclude non-RFC 1918 IP addresses (i.e. “non-local” IP addresses) as well as FQDNs that are neither link-local nor local (fully qualified domain names, e.g. “myremotedevice.dyndnsprovider.net”) from being automatically updated based on device discovery.

It might be desirable to include link-local and local FQDNs (e.g. “myhost.local” or “myhost.localhost”) in the carve-out, but I can see good arguments for both excluding and including them. To “play it safe”, this proposal would continue to update link-local and local FQDNs to reduce false negatives and keep the behavior as closely aligned with the original behavior as possible while also resolving the outlined issues.

By excluding non-RFC 1918 IP addresses and FQDNs that are neither link-local nor local, devices that were configured with “global” connectivity will not get automatically “downgraded” to local connectivity just because they may have been (temporarily) connected to a local network.
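
A minimal sketch of the classification this option needs, using only Python's standard `ipaddress` module. The function name and the `LOCAL_SUFFIXES` list are assumptions made for this example. Note also that `is_global` is broader than a strict RFC 1918 check: loopback, link-local and other reserved ranges also count as non-global here.

```python
import ipaddress

# Suffixes treated as "local" names in this sketch; the exact list is an assumption.
LOCAL_SUFFIXES = (".local", ".localhost")


def allows_discovery_update(configured_host: str) -> bool:
    """Return True if a discovered address may replace the configured host."""
    try:
        address = ipaddress.ip_address(configured_host)
    except ValueError:
        # Not an IP literal, so treat the value as a hostname.
        name = configured_host.rstrip(".").lower()
        if "." not in name:
            return True  # bare hostname, not an FQDN
        return name.endswith(LOCAL_SUFFIXES)
    # Globally routable addresses are left alone; everything else may be updated.
    return not address.is_global


for host in ("192.168.1.10", "8.8.8.8", "myhost.local", "myremotedevice.dyndnsprovider.net"):
    print(host, allows_discovery_update(host))
```

The check looks at the host the user configured, not at the newly discovered address, since the goal is to protect entries that were set up with global connectivity.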

Option 3: Alternative connectivity profiles

This seems to be the most invasive and complicated change from an implementation perspective, but would also transparently resolve both mentioned issues. However, it might not be possible to implement exclusively on the HA platform level and may require integration authors to opt in (I frankly haven’t explored this option in too much detail since it’s my least favorite due to its complexity and potential for breakage).

Instead of plainly replacing the connectivity information in the config entry, the concept of alternative “connectivity profiles” could be introduced. Basically, instead of overriding the “host” field, the config entry would be amended with an “alternative_hosts” list. The originally configured host would never get touched by the device discovery, but integration authors (or, where possible, the HA platform itself) would, upon a failed connection, round robin over all hosts, including the alternative_hosts list.

This way, the configuration originally entered by the user will never get lost and can always be recovered, while newly available connectivity profiles added by device discovery can still be used. As an additional optimization, the mechanism could always start the first connection attempt with the most recently successful profile and could purge automatically generated profiles that have not been connected to successfully for a while.
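
To illustrate the idea, here is a small sketch of such a profile list. The `ConnectivityProfiles` class and its methods are hypothetical, `try_connect` stands in for whatever connection attempt an integration would make, and purging of stale profiles is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ConnectivityProfiles:
    """Hypothetical container: the user's host plus discovery-added alternatives."""

    host: str  # configured by the user, never overwritten
    alternative_hosts: list[str] = field(default_factory=list)
    last_good: Optional[str] = None

    def add_discovered(self, ip: str) -> None:
        """Record a discovered address without touching the configured host."""
        if ip != self.host and ip not in self.alternative_hosts:
            self.alternative_hosts.append(ip)

    def candidates(self) -> list[str]:
        """Hosts to try, starting with the most recently successful one."""
        ordered = [self.host, *self.alternative_hosts]
        if self.last_good in ordered:
            ordered.remove(self.last_good)
            ordered.insert(0, self.last_good)
        return ordered

    def connect(self, try_connect: Callable[[str], bool]) -> Optional[str]:
        """Try each candidate in order; remember and return the first that works."""
        for candidate in self.candidates():
            if try_connect(candidate):
                self.last_good = candidate
                return candidate
        return None


profiles = ConnectivityProfiles(host="device.example.net")
profiles.add_discovered("192.168.1.42")
reachable = {"192.168.1.42"}  # pretend only the local address answers
connected_via = profiles.connect(lambda candidate: candidate in reachable)
print(connected_via, profiles.candidates())
```

The configured `host` stays in the candidate list regardless of what discovery adds, so a device that later leaves the local network can still be reached through it.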

This approach, while providing a more holistic and user-transparent solution, is also significantly more complicated to implement, may require significant changes across the codebase, possibly including integrations, and ultimately has a much higher potential for bugs.

For those reasons I think this approach is probably not advisable.

Conclusion

I hope I can get some input, thoughts, advice and support here. I would like to work on and provide a pull request for these solutions if there are no objections and there is general support for changes like this. I think it is probably worth implementing at least the first solution, possibly amended by the second solution.

I welcome any feedback and thoughts!

Related bug report: device_tracker announcing devices via CONNECTED_DEVICE_REGISTERED even when disabled · Issue #117888 · home-assistant/core · GitHub
I’ve also posted related to this topic to the Home Assistant and ESPHome Discord servers (no links for obvious reasons).

bdraco · May 25, 2024, 4:41am

Alternatively, many integrations check whether the device is offline and only update the IP address in the config entry if it is currently unreachable. While this is more complex, it means the user does not have to configure anything.

example: core/homeassistant/components/powerwall/config_flow.py at 81f3387d06da742eba735bb26a88a3ddb2850f2c · home-assistant/core · GitHub

example: core/homeassistant/components/unifiprotect/config_flow.py at 81f3387d06da742eba735bb26a88a3ddb2850f2c · home-assistant/core · GitHub

edit: it looks like you received similar feedback on discord from another member

ESPHome doesn’t currently do that, but it probably should.
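
A minimal sketch of the "only update when unreachable" idea, assuming a plain TCP connect is an acceptable reachability probe. The function names are hypothetical and this is not the code used by the linked integrations, which may probe their devices differently.

```python
import asyncio


async def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the connection was established, which is all we need to know
    return True


async def host_after_discovery(current_host: str, discovered_ip: str, port: int) -> str:
    """Keep the current host while it still answers; otherwise adopt the discovered IP."""
    if current_host == discovered_ip or await is_reachable(current_host, port):
        return current_host
    return discovered_ip
```

With a guard like this, a working NAT-forwarded or globally routed host would survive a discovery event, because the configured host still answers.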

bdraco · May 25, 2024, 5:03am

I think ignoring globally routable addresses in dhcp would be OK as well. FQDNs would be a problem because some routers use a local domain with local dns…

something like

diff --git a/homeassistant/components/dhcp/__init__.py b/homeassistant/components/dhcp/__init__.py
index b4d06b6e276..b18ed65496a 100644
--- a/homeassistant/components/dhcp/__init__.py
+++ b/homeassistant/components/dhcp/__init__.py
@@ -206,8 +206,9 @@ class WatcherBase:
             made_ip_address.is_link_local
             or made_ip_address.is_loopback
             or made_ip_address.is_unspecified
+            or made_ip_address.is_global
         ):
-            # Ignore self assigned addresses, loopback, invalid
+            # Ignore self assigned addresses, loopback, invalid, and global addresses
             return
 
         formatted_mac = format_mac(unformatted_mac_address)

is_global isn’t cached though so you’d need to do a PR to add it here cached-ipaddress/src/cached_ipaddress/ipaddress.py at 7ea4ecbeb239e589c63cf5a29ae583f704b1448b · bdraco/cached-ipaddress · GitHub and cached-ipaddress/src/cached_ipaddress/ipaddress.py at 7ea4ecbeb239e589c63cf5a29ae583f704b1448b · bdraco/cached-ipaddress · GitHub as well
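
For illustration only, caching such a property could look roughly like the sketch below. `CachedIPv4Address` is a hypothetical name and the real cached-ipaddress code may be structured differently; the sketch only shows the general pattern of layering `functools.cached_property` over the standard library property.

```python
import ipaddress
from functools import cached_property


class CachedIPv4Address(ipaddress.IPv4Address):
    """Hypothetical subclass that computes is_global once per instance."""

    @cached_property
    def is_global(self) -> bool:
        # Delegate to the standard implementation, then cache the result.
        return super().is_global


address = CachedIPv4Address("8.8.8.8")
first = address.is_global  # computed on first access
cached = "is_global" in vars(address)  # stored on the instance afterwards
print(first, cached)
```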

dev0 · May 25, 2024, 8:10pm

@bdraco Thanks for the response!

ESPHome doesn’t currently do that, but it probably should.

I didn’t realize that the integration itself is responsible for updating the config entry. I assumed that this code here is updating the config entry directly, which after looking at it more closely is obviously incorrect.

Given that the integration is updating the config entry itself, I agree that the integration should ensure that there is an actual need to do so.

On the point of globally routable IP addresses I think we may be talking past each other. I was arguing that the config entry shouldn’t be updated with a local IP address if a globally routable IP address was already configured, to avoid “locking in” a device with a local IP address. I think your description and code pointer assume the opposite (i.e. DHCP somehow reporting a globally routable IP address as the new device IP address).

However, if the integration correctly checked the current connection before updating the config entry, this wouldn’t be necessary, since the integration would only update the config entry if the globally routable IP address wasn’t providing a working connection anymore.

Given my confusion around the update flow, I think none of my proposals are the right path forward though. Instead, the integration should verify whether there is a need to update, and if there isn’t, simply ignore new IP addresses reported by DHCP.

Thanks a lot for your feedback. This was very helpful!

bdraco · May 25, 2024, 8:52pm

Sounds good. If you want to do a PR to add the checks to ESPHome, I’m happy to prioritize reviewing that.

dev0 · May 25, 2024, 9:04pm

Thank you, I will take a look at that.