HomeKit Accessory Protocol (HAP) over CoAP/UDP (was: Nanoleaf Essentials bulb via Thread/CoAP)

Hi Andy

honestly there’s just not enough information here to answer your question with 100% certainty. The fd7e:37b8:e809:465c addresses could be from your border router (HomePod/ATV/etc), or they could be from your network. If they are from your border router, then you probably already have working accept_ra. But I just can’t tell.

Assuming you are right that it is accept_ra thats the problem, from what you have shared net.ipv6.conf.eth0.accept_ra is the one that needs to be set. Set it, and check in half an hour. If nothing has changed, show me the output of ip -6 route. And run the tcpdump command I posted in HomeKit Accessory Protocol (HAP) over CoAP/UDP (was: Nanoleaf Essentials bulb via Thread/CoAP) - #257 by Jc2k.

Thanks for the reply and apologies for slow response, bad day in the office with my primary DC core switches failing… Very tired so will pick this up tomorrow.

I am running an ATV4K as my border router but I can’t quickly see what IPv6 address is assigned to that, will investigate further.

I did see another thread on the UnRaid forum for IPv6 and docker containers so was planning to read that first in case there is more setup I need to do on the main server first other than just enabling IPv6, there are some bonding settings for example I need to look at. If bothering comes from that I’ll try the Ra command you list above.

Looks like I can join the ranks of people with intermittent issues. Nanoleaf bulb joined via Apple Home. Two Border Routers (2022 ATV 4k + HomePod Mini). The bulb in question shows up in zeroconf and I can ping its advertised IP via both border routers (using temporary static routes specifying each border router in turn as the next hop). The second line of the logs below even show the correct, advertised IPv6 address for the bulb–the fd6a address. So where the heck is the fdc5 address coming from that HASS is trying and failing to reach the bulb at?

2022-11-29 10:26:34.726 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fdc5:366d:c048:0:abeb:84d3:ee31:320d]:5683] (id=17:A1:DE:6E:DA:9F): Accessories cache loaded (c#: 1) (has broadcast_key: False)
2022-11-29 10:26:34.726 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fdc5:366d:c048:0:abeb:84d3:ee31:320d]:5683] (id=17:A1:DE:6E:DA:9F): Description updated: old=None new=HomeKitService(name='Nanoleaf A19 0T1C', id='17:a1:de:6e:da:9f', model='NL45', feature_flags=<FeatureFlags.SUPPORTS_SOFTWARE_AUTHENTICATION: 2>, status_flags=<StatusFlags.0: 0>, config_num=1, state_num=8, category=<Categories.LIGHTBULB: 5>, protocol_version='1.2', type='_hap._udp.local.', address='fd6a:ab22:4825:0:2e8:a408:94b7:7fd', addresses=['fd6a:ab22:4825:0:2e8:a408:94b7:7fd'], port=5683)
2022-11-29 10:26:34.727 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fdc5:366d:c048:0:abeb:84d3:ee31:320d]:5683] (id=17:A1:DE:6E:DA:9F): Disconnected event notification received; Triggering catch-up poll
2022-11-29 10:26:34.728 DEBUG (MainThread) [aiohomekit.zeroconf] 17:A1:DE:6E:DA:9F: Device rediscovered
2022-11-29 10:26:34.743 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdc5:366d:c048:0:abeb:84d3:ee31:320d]:5683/2
2022-11-29 10:26:34.747 DEBUG (MainThread) [aiohomekit.protocol.tlv] sending [
6 (State): (1 bytes/<class 'bytearray'>) 0x01
3 (PublicKey): (32 bytes/<class 'bytes'>) b'j\xe0\x88&\x8bijd\xe3\x87\xc4,\x9dx\x10\xa7\xe8y\xa30\xb4\x9aD\x8e\x83\x90T;\xc6%\x80\x05'
]

Edit: In my .storage/core.config_entries file I noticed that the two Nanoleaf bulbs that are working have a “source” of zeroconf and the one that’s not working has a source of “bluetooth”. The two working bulbs have correct data[“AccessoryIP”] and the non-working bulb has that old(?)/invalid fdc5 address. This all despite the fact that the “data” dictionary for all three has a Connection of CoAP. I changed the one bulb that’s not working to a source of zeroconf (+ restarted HASS) and that didn’t change anything. So I decided to just take the direct route and update its AccessoryIP to the correct IP. It immediately connected upon a HASS reboot.

So, why are updated IPv6 addresses sometimes not making it into core.config_entries? Even when they’re advertising their new IPs via zeroconf and aiohomekit sees that they were rediscovered which causes the device/pairing description to be changed, which includes the device’s IP/port? (aiohomekit/zeroconf.py at main · Jc2k/aiohomekit · GitHub)

I’m honestly not even sure at this point if this is related to other issues described here. I’m at a loss so I’ma just at @Jc2k here to see if I should open a github issue or if there’s more info I can contribute or what.

I noticed a PR titled “get rid of reconnect_soon” yet it’s still getting called by CoAPPairing’s _async_endpoint_changed? Dunno if that’s related?

If it helps I can consistently replicate this by reverting core.config_entires to have the wrong AcessoryIP for that bulb.

Time to sleep.

Can you get a tcpdump of both border routers advertising routes? (See HomeKit Accessory Protocol (HAP) over CoAP/UDP (was: Nanoleaf Essentials bulb via Thread/CoAP) - #257 by Jc2k).

And can I see the route table from your HA instance. ip -6 route and ip -6 a would be great.

And can you use an mDNS service browser tool to see what the bulbs are advertising? It might be they are advertising multiple addresses. We filter out a bunch that are obviously wrong, but there could be more…

This is all on HASS OS so there’s a lot of Docker noise. This is from right on the HASS server itself, using the SSH & Web Terminal addon.

Both border routers sending advertisements:

➜  config sudo tcpdump -vvvv -ttt -i enp0s18 icmp6 and 'ip6[40] = 134'
tcpdump: listening on enp0s18, link-type EN10MB (Ethernet), snapshot length 262144 bytes
 00:00:00.000000 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 112) fe80::20e:c4ff:fed2:8995 > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 112
        hop limit 64, Flags [managed, other stateful], pref high, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
          prefix info option (3), length 32 (4): 2600:1700:8463:d1a0::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            0x0000:  40c0 0001 5180 0000 3840 0000 0000 2600
            0x0010:  1700 8463 d1a0 0000 0000 0000 0000
          rdnss option (25), length 24 (3):  lifetime 600s, addr: fw.tolstyak.net
            0x0000:  0000 0000 0258 2600 1700 8463 d1a0 0000
            0x0010:  0000 0000 0001
          dnssl option (31), length 24 (3):  lifetime 600s, domain(s): tolstyak.net.
            0x0000:  0000 0000 0258 0874 6f6c 7374 7961 6b03
            0x0010:  6e65 7400 0000
          mtu option (5), length 8 (1):  1500
            0x0000:  0000 0000 05dc
          source link-address option (1), length 8 (1): 00:0e:c4:d2:89:95
            0x0000:  000e c4d2 8995
 00:00:07.210061 IP6 (flowlabel 0x40000, hlim 255, next-header ICMPv6 (58) payload length: 72) fe80::1ca9:85d:5992:1da3 > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 72
        hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0ms, retrans timer 0ms
          source link-address option (1), length 8 (1): 94:ea:32:8c:6b:b6
            0x0000:  94ea 328c 6bb6
          prefix info option (3), length 32 (4): fd9a:793f:4de6:4fac::/64, Flags [onlink, auto], valid time 1800s, pref. time 1800s
            0x0000:  40c0 0000 0708 0000 0708 0000 0000 fd9a
            0x0010:  793f 4de6 4fac 0000 0000 0000 0000
          route info option (24), length 16 (2):  fd6a:ab22:4825::/64, pref=medium, lifetime=1800s
            0x0000:  4000 0000 0708 fd6a ab22 4825 0000
 00:00:11.944192 IP6 (flowlabel 0x20d00, hlim 255, next-header ICMPv6 (58) payload length: 40) fe80::45b:9bf6:27e9:bd84 > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 40
        hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0ms, retrans timer 0ms
          source link-address option (1), length 8 (1): a8:51:ab:91:7d:e7
            0x0000:  a851 ab91 7de7
          route info option (24), length 16 (2):  fd6a:ab22:4825::/64, pref=medium, lifetime=1800s
            0x0000:  4000 0000 0708 fd6a ab22 4825 0000
 00:00:02.548996 IP6 (flowlabel 0x40000, hlim 255, next-header ICMPv6 (58) payload length: 72) fe80::1ca9:85d:5992:1da3 > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 72
        hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0ms, retrans timer 0ms
          source link-address option (1), length 8 (1): 94:ea:32:8c:6b:b6
            0x0000:  94ea 328c 6bb6
          prefix info option (3), length 32 (4): fd9a:793f:4de6:4fac::/64, Flags [onlink, auto], valid time 1800s, pref. time 1800s
            0x0000:  40c0 0000 0708 0000 0708 0000 0000 fd9a
            0x0010:  793f 4de6 4fac 0000 0000 0000 0000
          route info option (24), length 16 (2):  fd6a:ab22:4825::/64, pref=medium, lifetime=1800s
            0x0000:  4000 0000 0708 fd6a ab22 4825 0000
 00:00:09.317836 IP6 (flowlabel 0x40000, hlim 255, next-header ICMPv6 (58) payload length: 72) fe80::1ca9:85d:5992:1da3 > ip6-allnodes: [icmp6 sum ok] ICMP6, router advertisement, length 72
        hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0ms, retrans timer 0ms
          source link-address option (1), length 8 (1): 94:ea:32:8c:6b:b6
            0x0000:  94ea 328c 6bb6
          prefix info option (3), length 32 (4): fd9a:793f:4de6:4fac::/64, Flags [onlink, auto], valid time 1800s, pref. time 1800s
            0x0000:  40c0 0000 0708 0000 0708 0000 0000 fd9a
            0x0010:  793f 4de6 4fac 0000 0000 0000 0000
          route info option (24), length 16 (2):  fd6a:ab22:4825::/64, pref=medium, lifetime=1800s
            0x0000:  4000 0000 0708 fd6a ab22 4825 0000

Both do make it into the routing table:

➜  config route -A inet6 | grep fd6a
fd6a:ab22:4825::/64                         fe80::1ca9:85d:5992:1da3                UG    100    7        0 enp0s18
fd6a:ab22:4825::/64                         fe80::45b:9bf6:27e9:bd84                UG    100    5        0 enp0s18

ip -6 route doesn’t show both routes for some reason, or even the next hop

➜  config ip -6 route
::1 dev lo  metric 256
2600:1700:8463:d1a0::1:14b4 dev enp0s18  metric 100
2600:1700:8463:d1a0::/64 dev enp0s18  metric 100
fd6a:ab22:4825::/64  metric 100
fd9a:793f:4de6:4fac::/64 dev enp0s18  metric 100
anycast fe80:: dev veth5576156  metric 0
anycast fe80:: dev hassio  metric 0
anycast fe80:: dev veth0df6492  metric 0
anycast fe80:: dev veth40518c8  metric 0
anycast fe80:: dev docker0  metric 0
anycast fe80:: dev veth197b5e7  metric 0
anycast fe80:: dev vethfd70de0  metric 0
anycast fe80:: dev vethf470e81  metric 0
anycast fe80:: dev veth7cbaa02  metric 0
anycast fe80:: dev veth0d06e77  metric 0
anycast fe80:: dev veth48c645b  metric 0
anycast fe80:: dev veth14281bd  metric 0
anycast fe80:: dev veth027d525  metric 0
anycast fe80:: dev veth754f881  metric 0
fe80::/64 dev enp0s18  metric 100
fe80::/64 dev veth5576156  metric 256
fe80::/64 dev hassio  metric 256
fe80::/64 dev veth0df6492  metric 256
fe80::/64 dev docker0  metric 256
fe80::/64 dev veth40518c8  metric 256
fe80::/64 dev veth197b5e7  metric 256
fe80::/64 dev vethfd70de0  metric 256
fe80::/64 dev vethf470e81  metric 256
fe80::/64 dev veth7cbaa02  metric 256
fe80::/64 dev veth0d06e77  metric 256
fe80::/64 dev veth48c645b  metric 256
fe80::/64 dev veth14281bd  metric 256
fe80::/64 dev veth027d525  metric 256
fe80::/64 dev veth754f881  metric 256
multicast ff00::/8 dev veth5576156  metric 256
multicast ff00::/8 dev hassio  metric 256
multicast ff00::/8 dev veth0df6492  metric 256
multicast ff00::/8 dev docker0  metric 256
multicast ff00::/8 dev veth40518c8  metric 256
multicast ff00::/8 dev enp0s18  metric 256
multicast ff00::/8 dev veth197b5e7  metric 256
multicast ff00::/8 dev vethfd70de0  metric 256
multicast ff00::/8 dev vethf470e81  metric 256
multicast ff00::/8 dev veth7cbaa02  metric 256
multicast ff00::/8 dev veth0d06e77  metric 256
multicast ff00::/8 dev veth48c645b  metric 256
multicast ff00::/8 dev veth14281bd  metric 256
multicast ff00::/8 dev veth027d525  metric 256
multicast ff00::/8 dev veth754f881  metric 256
default via fe80::20e:c4ff:fed2:8995 dev enp0s18  metric 100

IPv6 addresses:

➜  config ip -6 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2600:1700:8463:d1a0::1:14b4/128 scope global dynamic noprefixroute
       valid_lft 5663sec preferred_lft 2963sec
    inet6 fd9a:793f:4de6:4fac:37e9:7b9d:f4f7:a275/64 scope global deprecated dynamic noprefixroute
       valid_lft 1167sec preferred_lft 0sec
    inet6 2600:1700:8463:d1a0:27c:5eb0:7abd:628a/64 scope global dynamic noprefixroute
       valid_lft 86202sec preferred_lft 14202sec
    inet6 fe80::1ad:26ee:2537:df0f/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: hassio: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    inet6 fe80::42:beff:fe4f:34a5/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    inet6 fe80::42:41ff:fe91:eb2d/64 scope link
       valid_lft forever preferred_lft forever
6: veth5576156@if5: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::c029:fdff:fee5:bda2/64 scope link
       valid_lft forever preferred_lft forever
8: veth0df6492@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::3c00:84ff:feb3:74e0/64 scope link
       valid_lft forever preferred_lft forever
10: veth40518c8@if9: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::e48c:35ff:fe1a:8069/64 scope link
       valid_lft forever preferred_lft forever
12: veth197b5e7@if11: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::30ad:8cff:fe41:1e49/64 scope link
       valid_lft forever preferred_lft forever
14: vethfd70de0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::e4f2:13ff:fe89:4020/64 scope link
       valid_lft forever preferred_lft forever
16: vethf470e81@if15: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::4844:35ff:fe5f:18e8/64 scope link
       valid_lft forever preferred_lft forever
18: veth7cbaa02@if17: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::8d6:e0ff:fec4:b84/64 scope link
       valid_lft forever preferred_lft forever
20: veth0d06e77@if19: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::58e0:91ff:fe60:d581/64 scope link
       valid_lft forever preferred_lft forever
22: veth48c645b@if21: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::74ce:41ff:febd:92bf/64 scope link
       valid_lft forever preferred_lft forever
24: veth14281bd@if23: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::1092:54ff:fe9d:93db/64 scope link
       valid_lft forever preferred_lft forever
26: veth027d525@if25: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::9cb7:38ff:feca:653f/64 scope link
       valid_lft forever preferred_lft forever
28: veth754f881@if27: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 state UP
    inet6 fe80::489f:41ff:fe25:e120/64 scope link
       valid_lft forever preferred_lft forever

When I say I can reach the bulb that was non-working via both border routers, I mean that I have this script that inserts temporary low-metric routes to force pings through each border router. (It removes them after.) And I was getting ping replies no matter what.

#!/bin/bash

bulbs="0t1c"
#bulbs="0t1c 2rjd 3rxf"
TBRs=$(route -A inet6 | grep fd6a | sed 's/ \+/ /g' | cut -d ' ' -f 2)

echo "Bulbs: $bulbs"
echo "TBRs:
$TBRs"
echo

for x in $bulbs; do
        echo "$x"
        ip=$(ping -c1 -W1 "nanoleaf-a19-$x.local" | grep PING | cut -d '(' -f 2 | cut -d ')' -f 1)
        [ $? -ne 0 ] && echo "No ping" || echo "Pings"
        ip -6 route get "$ip"

        echo

        while IFS= read -r line; do
                echo "Via $line..."
                ip -6 route add "$ip" via "$line" metric 1 dev enp0s18
                ip -6 route get "$ip"
                ping -c4 "$ip"
                ip -6 route del "$ip" via "$line" metric 1 dev enp0s18
        done <<< "$TBRs"
done

Its output. I did also confirm via tcpdump that the ping packets go through the expected border router when the temporary routes are in place.

➜  share ./thread.sh
Bulbs: 0t1c
TBRs:
fe80::1ca9:85d:5992:1da3
fe80::45b:9bf6:27e9:bd84

0t1c
Pings
fd6a:ab22:4825:0:2e8:a408:94b7:7fd via fe80::1ca9:85d:5992:1da3 dev enp0s18  src 2600:1700:8463:d1a0::1:14b4  metric 100

Via fe80::1ca9:85d:5992:1da3...
fd6a:ab22:4825:0:2e8:a408:94b7:7fd via fe80::1ca9:85d:5992:1da3 dev enp0s18  src 2600:1700:8463:d1a0::1:14b4  metric 1
PING fd6a:ab22:4825:0:2e8:a408:94b7:7fd (fd6a:ab22:4825:0:2e8:a408:94b7:7fd): 56 data bytes
64 bytes from fd6a:ab22:4825:0:2e8:a408:94b7:7fd: seq=0 ttl=63 time=32.223 ms
64 bytes from fd6a:ab22:4825:0:2e8:a408:94b7:7fd: seq=1 ttl=63 time=26.495 ms
64 bytes from fd6a:ab22:4825:0:2e8:a408:94b7:7fd: seq=2 ttl=63 time=28.608 ms
64 bytes from fd6a:ab22:4825:0:2e8:a408:94b7:7fd: seq=3 ttl=63 time=28.686 ms

--- fd6a:ab22:4825:0:2e8:a408:94b7:7fd ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 26.495/29.003/32.223 ms
Via fe80::45b:9bf6:27e9:bd84...
fd6a:ab22:4825:0:2e8:a408:94b7:7fd via fe80::45b:9bf6:27e9:bd84 dev enp0s18  src 2600:1700:8463:d1a0::1:14b4  metric 1
PING fd6a:ab22:4825:0:2e8:a408:94b7:7fd (fd6a:ab22:4825:0:2e8:a408:94b7:7fd): 56 data bytes
64 bytes from fd6a:ab22:4825:0:2e8:a408:94b7:7fd: seq=0 ttl=63 time=39.257 ms
64 bytes from fd6a:ab22:4825:0:2e8:a408:94b7:7fd: seq=1 ttl=63 time=34.213 ms
64 bytes from fd6a:ab22:4825:0:2e8:a408:94b7:7fd: seq=2 ttl=63 time=24.812 ms
64 bytes from fd6a:ab22:4825:0:2e8:a408:94b7:7fd: seq=3 ttl=63 time=24.840 ms

--- fd6a:ab22:4825:0:2e8:a408:94b7:7fd ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 24.812/30.780/39.257 ms

2600:1700:8463:d1a0::/64 is my LAN
2600:1700:8463:d1a0::1:xxxx is DHCPv6
fd6a:ab22:4825::/64 is apparently my thread network
Not sure where the fd9a ULAs are coming from.

The underlying IPv6 connectivity seems to be perfectly fine, and the bulb advertises only its current IPv6 address. This is also evidenced in the logs of my previous post, where the description object shows the proper address in both the ‘address’ field and shows only that one, correct address in the ‘addresses’ list.

(_hap._udp.)

From above:

2022-11-29 10:26:34.726 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fdc5:366d:c048:0:abeb:84d3:ee31:320d]:5683] (id=17:A1:DE:6E:DA:9F): Description updated: old=None new=HomeKitService(name='Nanoleaf A19 0T1C', id='17:a1:de:6e:da:9f', model='NL45', feature_flags=<FeatureFlags.SUPPORTS_SOFTWARE_AUTHENTICATION: 2>, status_flags=<StatusFlags.0: 0>, config_num=1, state_num=8, category=<Categories.LIGHTBULB: 5>, protocol_version='1.2', type='_hap._udp.local.', address='fd6a:ab22:4825:0:2e8:a408:94b7:7fd', addresses=['fd6a:ab22:4825:0:2e8:a408:94b7:7fd'], port=5683)

To demonstrate that those router advertisements are indeed my Apple border routers (_meshcop._udp.):
ATV:

HomePod Mini:

Everything is on the same VLAN. IPv6 connectivity is working fine and aiohomekit sees the zeroconf advertisements. Seems like it’s just not making use of the new IPv6 address that it receives. Once I manually updated the core.config_entries file in .storage that aiohomekit pulls pair info from it started working, and if I change the AccessoryIP of the bulb back to what it was in that file, it goes back to not working and not making use of the proper IP from zeroconf.

1 Like

Thanks for the detailed replies.

Just to be clear, what happens if you don’t have that script running?

It doesn’t run constantly. I just run it whenever I’m troubleshooting to test whether I can reach the bulb/s over IPv6, and it cleans up after itself.

Given what you have provided, I can only assume the fdc5 address is indeed an old one.

I think the reconnect_soon PR is unrelated. Some of the internal methods are indeed still called reconnect_soon, that PR was about removing it from the public API. You see, it used to be that we piggy-backed on HAs zeroconf` integration and the config flow subsystem to get address change notifications. Whilst that worked, it actually meant lots of config flows were started and cancelled. And when we added BLE, it was an abstraction break that needed to be hidden inside aiohomekit.

It looks like there is a bug in reconnect_soon / _async_endpoint_changed where the address the CoAP code is using isn’t updated unless there is a valid encryption context. Obviously if the ip address is wrong at startup that will never work. Can you change aiohomekit/controller/coap/connection.py? There should be a method like this:

    async def reconnect_soon(self):
        if not self.enc_ctx:
            return
        description = self.owner.description
        self.address = f"[{description.address}]:{description.port}"
        await self.enc_ctx.coap_ctx.shutdown()
        self.enc_ctx = None
        # XXX can't .connect here w/o pairing_data

This isn’t the final fix, but to test my hunch can you make it look like this:

    async def reconnect_soon(self):
        description = self.owner.description
        self.address = f"[{description.address}]:{description.port}"
        if not self.enc_ctx:
            return
        await self.enc_ctx.coap_ctx.shutdown()
        self.enc_ctx = None
        # XXX can't .connect here w/o pairing_data

(And start with the wrong ip address in your core.config_entries)

Yep, that fixes it. The self.name of the pair does start out as the old fdc5 address, presumably because that bulb had that IP when it was originally paired to HASS and so that’s what is in core.config_entries, but once that zeroconf advertisement is received it initiates the CoAP connection to the proper, new IP.

2022-11-30 17:09:58.447 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fdc5:366d:c048:0:abeb:84d3:ee31:320d]:5683] (id=17:A1:DE:6E:DA:9F): Description updated: old=None new=HomeKitService(name='Nanoleaf A19 0T1C', id='17:a1:de:6e:da:9f', model='NL45', feature_flags=<FeatureFlags.SUPPORTS_SOFTWARE_AUTHENTICATION: 2>, status_flags=<StatusFlags.0: 0>, config_num=1, state_num=13, category=<Categories.LIGHTBULB: 5>, protocol_version='1.2', type='_hap._udp.local.', address='fd6a:ab22:4825:0:2e8:a408:94b7:7fd', addresses=['fd6a:ab22:4825:0:2e8:a408:94b7:7fd'], port=5683)
2022-11-30 17:09:58.447 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fdc5:366d:c048:0:abeb:84d3:ee31:320d]:5683] (id=17:A1:DE:6E:DA:9F): Disconnected event notification received; Triggering catch-up poll
...
2022-11-30 17:09:59.220 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fd6a:ab22:4825:0:2e8:a408:94b7:7fd]:5683/2

So you’ve got it nailed on the aiohomekit side. On the HASS side, should that core.config_entries file be getting updated as things change in the field? Or is it intentionally a “this is how it was when it was set up” kind of thing? 'cause the way things are that bulb is always going to start off with the wrong IP.

There is code to update that in HA, it’s another case where our abstractions were broken by BLE and CoAP, so it’s still part of the config flow for now. This is where it should get saved:

If you could stick some prints or logs in that function to see if it gets called and/or if it bails early that would be a big help.

Looks like the HKID is the MAC of the bulb (edit: maybe not?), which is handy. async_step_zeroconf does get called for all my Nanoleaf bulbs, but prior to that in the logs is this (bulb is 17:A1:DE:6E:DA:9F):

2022-12-01 11:15:44.321 DEBUG (MainThread) [homeassistant.components.homekit_controller.storage] Creating or updating entity map for 17:A1:DE:6E:DA:9F
2022-12-01 11:15:44.325 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Migrating device registry entries for pairing 17:A1:DE:6E:DA:9F
2022-12-01 11:15:44.325 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Removing legacy serial numbers from device registry entries for pairing 17:A1:DE:6E:DA:9F
2022-12-01 11:15:44.352 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Checking if unique ID homekit-N21280B2R81-aid:1-sid:112-cid:115 on sensor needs to be migrated
2022-12-01 11:15:44.352 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Unique ID homekit-N21280B2R81-aid:1-sid:112-cid:115 does not need to be migrated
2022-12-01 11:15:44.353 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Checking if unique ID homekit-N21280B2R81-aid:1-sid:112-cid:117 on sensor needs to be migrated
2022-12-01 11:15:44.353 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Unique ID homekit-N21280B2R81-aid:1-sid:112-cid:117 does not need to be migrated
2022-12-01 11:15:44.353 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Checking if unique ID homekit-N21280B2R81-aid:1-sid:1-cid:2 on button needs to be migrated
2022-12-01 11:15:44.353 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Unique ID homekit-N21280B2R81-aid:1-sid:1-cid:2 does not need to be migrated
2022-12-01 11:15:44.354 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Checking if unique ID homekit-N21280B2R81-48 on light needs to be migrated
2022-12-01 11:15:44.354 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Unique ID homekit-N21280B2R81-48 does not need to be migrated
...
2022-12-01 11:15:45.094 DEBUG (MainThread) [homeassistant.components.homekit_controller.connection] Called async_set_available_state with True for 17:A1:DE:6E:DA:9F

Changed that code block to:

        # If the device is already paired and known to us we should monitor c#
        # (config_num) for changes. If it changes, we check for new entities
        _LOGGER.debug("!! paired: " + str(paired))
        _LOGGER.debug("!! hkid: " + str(hkid))
        if paired and hkid in self.hass.data.get(KNOWN_DEVICES, {}):
            _LOGGER.debug("!! exist: " + str(existing_entry))
            if existing_entry:
                from pprint import pprint
                pprint(existing_entry)
                pprint(updated_ip_port)
                self.hass.config_entries.async_update_entry(
                    existing_entry, data={**existing_entry.data, **updated_ip_port}
                )
            return self.async_abort(reason="already_configured")

        _LOGGER.debug("Discovered device %s (%s - %s)", name, model, hkid)

And got this:

2022-12-01 11:16:06.886 DEBUG (MainThread) [homeassistant.components.homekit_controller.config_flow] !! paired: True
2022-12-01 11:16:06.886 DEBUG (MainThread) [homeassistant.components.homekit_controller.config_flow] !! hkid: 17:A1:DE:6E:DA:9F
2022-12-01 11:16:06.887 DEBUG (MainThread) [homeassistant.components.homekit_controller.config_flow] !! exist: None

So for some reason this is returning None and preventing the later async_update_entry

        # Set unique-id and error out if it's already configured
        existing_entry = await self.async_set_unique_id(
            normalized_hkid, raise_on_progress=False
        )

Edit: added a debug line to check updated_ip_port and it does get the proper IP.

Edit 2: async_set_unique_id in homeassistant/home_entries.py loops through self._async_current_entries which never seems to include the bulb that starts off with the wrong IP from core.config_entries. This is even when the bulb is discovered and works properly with that quick hack of the CoAP reconnect_soon function.

Edit 3: I’m guessing it’s because the unique_id for that bulb in my core.config_entries file is “F7:E2:F2:08:F3:D3”. If I were a betting man I’d say that it paired via Bluetooth first and got that ID from there. That F7:E2:F2 ID does show up in self._async_current_entries in home_entries.py.

Here’s the full entry for that bulb, with the wrong IP and wrong unique_id left as they were:

      {
        "entry_id": "0a9b1ff20b36c33e49af7d9a7081d779",
        "version": 1,
        "domain": "homekit_controller",
        "title": "Office Bulb",
        "data": {
          "AccessoryPairingID": "17:A1:DE:6E:DA:9F",
          "AccessoryLTPK": "3b0f340dd69589dcd1b3ce72e2bd013877cd92f586cb6f8418ac9d965fcc885e",
          "iOSPairingId": "5730d09e-93d6-47a7-8db2-49d9c34a23fe",
          "iOSDeviceLTSK": "30f196b42557ce125591b5001aa77bc925e87665754cc61985f956e26886d09d",
          "iOSDeviceLTPK": "3168376d03336750abe7bc76e206ea57b53026b1d1c405c7e4b78a06a9c04910",
          "AccessoryIP": "fdc5:366d:c048:0:abeb:84d3:ee31:320d",
          "AccessoryPort": 5683,
          "Connection": "CoAP"
        },
        "options": {},
        "pref_disable_new_entities": false,
        "pref_disable_polling": false,
        "source": "bluetooth",
        "unique_id": "F7:E2:F2:08:F3:D3",
        "disabled_by": null
      },

Edit number ?: I changed the unique_id of that bulb in core.config_entries to match the one being advertised via zeroconf, and I left the earlier bug workaround in aiohomekit in place. After a HASS restart, that bulb is discovered, I’m able to control it, and it now has the correct AccessoryIP in core.config_entries.

Whew!

E: For the sake of completeness, undoing the aiohomekit workaround in reconnect_soon and changing the unique_id in core.config_entries for that bulb to match what’s being advertised over Thread/zeroconf causes the IP in core.config_entries to be updated, which eventually makes its way back to aiohomekit and causes that bulb to come online.

So, it would seem that a HAP device that paired over Bluetooth first and then later is discovered via Thread is not having its IP/config changes tracked by HASS because the unique_id/HKID differs between Bluetooth and Thread. This is in addition to aiohomekit not tracking Thread device IP changes due to a now-known bug, although that doesn’t rear its head if HASS eventually feeds it the right IP.

I believe this is the issue I ran into with my pairing process weeks ago. I had paired them up with aiohomekit after disconnecting from apple homekit on thread. However, I had the Bluetooth integration enabled and didn’t realize that some of them had not migrated over to thread or somehow they got kicked off thread and paired with Bluetooth. Once I restarted my home assistant and turned off the Bluetooth integration, a certain # of bulbs were disconnected and did not show up on hap.udp.

I hope your discover helps figure out a certain issue going on. I feel like we are very close to having a stable integration.

One more question I have for the code owners is: When I have the physicial power shut off my bulb and then turn the power back on, why does home assistant take so long to recognize when the bulb comes back online? When the Nanoleaf bulb is connected via apple HomeKit the pairing and recognition of the bulb being connected is way faster. Can anyone look into this? Thank you all for what you do. I am just a person with a hobby for home automation with no coding knowledge.

Just installed new beta and it broke all my connections :frowning: had to revert back to 2022.12.b0

Same issue, latest beta made all my Thread lights stop working…

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 372, in async_setup
    result = await component.async_setup_entry(hass, self)
  File "/usr/src/homeassistant/homeassistant/components/homekit_controller/__init__.py", line 32, in async_setup_entry
    conn = HKDevice(hass, entry, entry.data)
  File "/usr/src/homeassistant/homeassistant/components/homekit_controller/connection.py", line 87, in __init__
    self.pairing = connection.load_pairing(self.unique_id, self.pairing_data)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/controller.py", line 159, in load_pairing
    if pairing := transport.load_pairing(alias, pairing_data):
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/controller.py", line 32, in load_pairing
    pairing._async_description_update(discovery.description)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/zeroconf.py", line 197, in _async_description_update
    self._async_endpoint_changed()
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/pairing.py", line 53, in _async_endpoint_changed
    description = self.owner.description
AttributeError: 'CoAPPairing' object has no attribute 'owner'

So, I installed an update to my homepod minis last night and woke up to a good majority of my devices being kicked off the thread network? Why does this happen, even when my appletv 4k was set as the homehub?




2022-12-02 11:38:17.200 WARNING (MainThread) [aiohomekit.controller.coap.connection] Pair verify timed out
2022-12-02 11:38:35.465 DEBUG (MainThread) [homeassistant.components.homekit_controller.config_flow] Discovered device Nanoleaf A19 9ZWW (NL45 - A1:73:E3:B8:9F:8E)
2022-12-02 11:38:35.466 DEBUG (MainThread) [homeassistant.components.homekit_controller.config_flow] Discovered device Nanoleaf A19 9ZWW (NL45 - A1:73:E3:B8:9F:8E)
2022-12-02 11:38:35.864 DEBUG (MainThread) [aiohomekit.zeroconf] a1:73:e3:b8:9f:8e: Notifying pairing of description update: HomeKitService(name='Nanoleaf A19 9ZWW', id='a1:73:e3:b8:9f:8e', model='NL45', feature_flags=<FeatureFlags.SUPPORTS_SOFTWARE_AUTHENTICATION: 2>, status_flags=<StatusFlags.0: 0>, config_num=1, state_num=18, category=<Categories.LIGHTBULB: 5>, protocol_version='1.2', type='_hap._udp.local.', address='fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2', addresses=['fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2'], port=5683)
2022-12-02 11:38:35.865 DEBUG (MainThread) [aiohomekit.controller.abstract] Nanoleaf A19 9ZWW [[fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2]:5683] (id=A1:73:E3:B8:9F:8E): Description updated: old=HomeKitService(name='Nanoleaf A19 9ZWW', id='a1:73:e3:b8:9f:8e', model='NL45', feature_flags=<FeatureFlags.SUPPORTS_SOFTWARE_AUTHENTICATION: 2>, status_flags=<StatusFlags.0: 0>, config_num=1, state_num=16, category=<Categories.LIGHTBULB: 5>, protocol_version='1.2', type='_hap._udp.local.', address='fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2', addresses=['fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2'], port=5683) new=HomeKitService(name='Nanoleaf A19 9ZWW', id='a1:73:e3:b8:9f:8e', model='NL45', feature_flags=<FeatureFlags.SUPPORTS_SOFTWARE_AUTHENTICATION: 2>, status_flags=<StatusFlags.0: 0>, config_num=1, state_num=18, category=<Categories.LIGHTBULB: 5>, protocol_version='1.2', type='_hap._udp.local.', address='fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2', addresses=['fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2'], port=5683)
2022-12-02 11:38:35.865 DEBUG (MainThread) [aiohomekit.controller.abstract] Nanoleaf A19 9ZWW [[fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2]:5683] (id=A1:73:E3:B8:9F:8E): Disconnected event notification received; Triggering catch-up poll
2022-12-02 11:38:43.374 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2]:5683] (id=A1:73:E3:B8:9F:8E): Accessories cache loaded (c#: 1) (has broadcast_key: False)
2022-12-02 11:38:43.375 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2]:5683] (id=A1:73:E3:B8:9F:8E): Description updated: old=None new=HomeKitService(name='Nanoleaf A19 9ZWW', id='a1:73:e3:b8:9f:8e', model='NL45', feature_flags=<FeatureFlags.SUPPORTS_SOFTWARE_AUTHENTICATION: 2>, status_flags=<StatusFlags.0: 0>, config_num=1, state_num=18, category=<Categories.LIGHTBULB: 5>, protocol_version='1.2', type='_hap._udp.local.', address='fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2', addresses=['fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2'], port=5683)
2022-12-02 11:38:43.375 DEBUG (MainThread) [aiohomekit.controller.abstract] [[fd4a:6ddf:9d7c:0:2826:ac7f:f55:31a2]:5683] (id=A1:73:E3:B8:9F:8E): Disconnected event notification received; Triggering catch-up poll
2022-12-02 11:38:43.378 DEBUG (MainThread) [aiohomekit.zeroconf] A1:73:E3:B8:9F:8E: Device rediscovered
2022-12-02 11:38:43.378 ERROR (MainThread) [homeassistant.config_entries] Error setting up entry Nanoleaf A19 9ZWW for homekit_controller
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 372, in async_setup
    result = await component.async_setup_entry(hass, self)
  File "/usr/src/homeassistant/homeassistant/components/homekit_controller/__init__.py", line 32, in async_setup_entry
    conn = HKDevice(hass, entry, entry.data)
  File "/usr/src/homeassistant/homeassistant/components/homekit_controller/connection.py", line 87, in __init__
    self.pairing = connection.load_pairing(self.unique_id, self.pairing_data)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/controller.py", line 159, in load_pairing
    if pairing := transport.load_pairing(alias, pairing_data):
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/controller.py", line 32, in load_pairing
    pairing._async_description_update(discovery.description)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/zeroconf.py", line 197, in _async_description_update
    self._async_endpoint_changed()
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/pairing.py", line 53, in _async_endpoint_changed
    description = self.owner.description
AttributeError: 'CoAPPairing' object has no attribute 'owner'

Multi hub homekit/thread is notoriously bad atm, even with the official iOS client. If you look on homekit reddit there are posts where a thread network reshuffle (like when a border router or even a router go offline) can take the whole thread network offline for hours.

Some people think that nanoleaf products make this situation worse, but it’s also superstition at the moment.

1 Like

If you get tracebacks in the beta make sure to raise GitHub tickets, or I might not see them.

2 Likes

It’s not currently supported to pair with HA over BLE and then switch to Thread, only to pair over Thread. Without manually fixing the JSON the pairing can only be for the transport it was discovered as.

There is a ticket tracking work towards this in aiohomekit, though it is aimed at provisioning thread within HA itself it would fox this as well, though we lack developer time. There is a plan to migrate BLE to use the hkid instead of MAC for consistency.