HomeKit Accessory Protocol (HAP) over CoAP/UDP (was: Nanoleaf Essentials bulb via Thread/CoAP)

This doesn’t seem to have worked. net.ipv6.conf.all.accept_ra_rt_info_max_plen also isn’t available for me.

Do I need to add additional routes for the fd ULA here?

After waiting a bit longer, I now get this on the Unraid box:

br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.3  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fd3f:4d46:8539:4967:e63d:1aff:fe3d:e6f0  prefixlen 64  scopeid 0x0<global>
        inet6 2601:647:cc00:f4:e63d:1aff:fe3d:e6f0  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::e63d:1aff:fe3d:e6f0  prefixlen 64  scopeid 0x20<link>
        ether e4:3d:1a:3d:e6:f0  txqueuelen 1000  (Ethernet)
        RX packets 366110151  bytes 66446017608 (61.8 GiB)
        RX errors 0  dropped 98  overruns 0  frame 0
        TX packets 54980973  bytes 539622473251 (502.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I would avoid statically configuring the fd IP ranges, as (a) we can easily mess up things like the scope, (b) I can’t say how often they’ll change, and (c) it might not work if you have multiple border routers.

I’ll have a think about what to try next when I’m at my desk.

To clarify, I didn’t add any static configs. In the screenshot, that fd route showed up automatically a while after I set net.ipv6.conf.br0.accept_ra=2. Similarly, the fd IPv6 address showed up on its own after a while.

Still no successful pings to the Thread devices from the Unraid box, though.

It’d be good to see the output of ip -6 route, I guess. Also, how are you with tcpdump?

root@server:~# ip -6 route
::1 dev lo proto kernel metric 256 pref medium
2601:647:cc00:f4::/64 dev br0 proto ra metric 213 pref medium
2601:647:cc00:f4::/64 dev br0 proto kernel metric 256 expires 86277sec pref medium
fd3f:4d46:8539:4967::/64 dev br0 proto ra metric 213 pref medium
fd3f:4d46:8539:4967::/64 dev br0 proto kernel metric 256 expires 1754sec pref medium
fe80::/64 dev br0 proto kernel metric 256 pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
fe80::/64 dev vethe6ae3a9 proto kernel metric 256 pref medium
fe80::/64 dev docker0 proto kernel metric 256 pref medium
fe80::/64 dev vethf27ab3b proto kernel metric 256 pref medium
fe80::/64 dev vethda52dc1 proto kernel metric 256 pref medium
fe80::/64 dev vetha517c94 proto kernel metric 256 pref medium
fe80::/64 dev veth3427e67 proto kernel metric 256 pref medium
fe80::/64 dev vetha6bcd33 proto kernel metric 256 pref medium
fe80::/64 dev veth2523961 proto kernel metric 256 pref medium
fe80::/64 dev vetheeaea72 proto kernel metric 256 pref medium
fe80::/64 dev veth538d3c4 proto kernel metric 256 pref medium
fe80::/64 dev vethd44e3c3 proto kernel metric 256 pref medium
fe80::/64 dev veth1dea03c proto kernel metric 256 pref medium
multicast ff00::/8 dev br0 proto kernel metric 256 pref medium
multicast ff00::/8 dev bond0 proto kernel metric 256 pref medium
multicast ff00::/8 dev wg0 proto kernel metric 256 pref medium
multicast ff00::/8 dev vethe6ae3a9 proto kernel metric 256 pref medium
multicast ff00::/8 dev docker0 proto kernel metric 256 pref medium
multicast ff00::/8 dev vethf27ab3b proto kernel metric 256 pref medium
multicast ff00::/8 dev vethda52dc1 proto kernel metric 256 pref medium
multicast ff00::/8 dev vetha517c94 proto kernel metric 256 pref medium
multicast ff00::/8 dev veth3427e67 proto kernel metric 256 pref medium
multicast ff00::/8 dev vetha6bcd33 proto kernel metric 256 pref medium
multicast ff00::/8 dev veth2523961 proto kernel metric 256 pref medium
multicast ff00::/8 dev vetheeaea72 proto kernel metric 256 pref medium
multicast ff00::/8 dev veth538d3c4 proto kernel metric 256 pref medium
multicast ff00::/8 dev vethd44e3c3 proto kernel metric 256 pref medium
multicast ff00::/8 dev veth1dea03c proto kernel metric 256 pref medium
default via fe80::929a:4aff:fe31:cfa3 dev br0 proto ra metric 213 pref medium
default via fe80::929a:4aff:fe31:cfa3 dev br0 proto ra metric 1024 expires 1677sec hoplimit 64 pref medium

No experience with tcpdump

If we just look at br0 and not the container interfaces, that’s:

2601:647:cc00:f4::/64 dev br0 proto ra metric 213 pref medium
2601:647:cc00:f4::/64 dev br0 proto kernel metric 256 expires 86277sec pref medium
fd3f:4d46:8539:4967::/64 dev br0 proto ra metric 213 pref medium
fd3f:4d46:8539:4967::/64 dev br0 proto kernel metric 256 expires 1754sec pref medium
fe80::/64 dev br0 proto kernel metric 256 pref medium
multicast ff00::/8 dev br0 proto kernel metric 256 pref medium
default via fe80::929a:4aff:fe31:cfa3 dev br0 proto ra metric 213 pref medium
default via fe80::929a:4aff:fe31:cfa3 dev br0 proto ra metric 1024 expires 1677sec hoplimit 64 pref medium

No idea why yours has duplicates like that.

Mine is:

# ip -6 route
fd59:86c6:e5a5::/64 via fe80::1423:d0cd:3961:8e4c dev vlan101  metric 1024  expires 0sec
fd69:3e50:e165:4bf8::/64 dev vlan101  metric 256  expires 0sec
fe80::/64 dev vlan101  metric 256 
multicast ff00::/8 dev vlan101  metric 256 

So there are two IPv6 networks in play. The accept_ra change we made seems to have allowed your Unraid box to see one of them (fd3f). But you are missing an equivalent of this route:

fd59:86c6:e5a5::/64 via fe80::1423:d0cd:3961:8e4c dev vlan101  metric 1024  expires 0sec

When I didn’t get this route, it was because I didn’t have the sysctl net.ipv6.conf.all.accept_ra_rt_info_max_plen configured.
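For reference, the sysctls discussed in this thread can be set along these lines (a sketch: the interface names and the /64 plen are assumptions taken from the outputs above, and accept_ra_rt_info_max_plen only exists on kernels built with route-info support, which may be why it’s missing on some boxes):

```shell
# Accept router advertisements even when forwarding is enabled (2 = always)
sysctl -w net.ipv6.conf.br0.accept_ra=2
# If br0 sits on top of a bond, the bond interface may need it too
sysctl -w net.ipv6.conf.bond0.accept_ra=2
# Accept RA route info options for prefixes up to /64
sysctl -w net.ipv6.conf.all.accept_ra_rt_info_max_plen=64
sysctl -w net.ipv6.conf.br0.accept_ra_rt_info_max_plen=64
```

To survive reboots, the same settings would go in /etc/sysctl.conf (or your distro’s equivalent).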

If you run something like this:

tcpdump -n -vvv -e -i br0 icmp6

You’ll eventually see something like this (be patient, it can be a while between broadcasts):

22:51:28.322971 04:99:b9:64:1b:eb > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 126: (flowlabel 0xb0400, hlim 255, next-header ICMPv6 (58) payload length: 72) fe80::1423:d0cd:3961:8e4c > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 72
        hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0ms, retrans timer 0ms
          source link-address option (1), length 8 (1): 04:99:b9:64:1b:eb
            0x0000:  0499 b964 1beb
          prefix info option (3), length 32 (4): fd69:3e50:e165:4bf8::/64, Flags [onlink, auto], valid time 1800s, pref. time 1800s
            0x0000:  40c0 0000 0708 0000 0708 0000 0000 fd69
            0x0010:  3e50 e165 4bf8 0000 0000 0000 0000
          route info option (24), length 16 (2):  fd59:86c6:e5a5::/64, pref=medium, lifetime=1800s
            0x0000:  4000 0000 0708 fd59 86c6 e5a5 0000

If you do see a record like this (the crucial bit is the “route info option”), then your kernel is just dropping the hint from the border router.
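To show what that option actually carries, here is an illustrative decoder (not part of tcpdump or any tool above) for the Route Information option body per RFC 4191; feeding it the hex bytes from the capture recovers the route the border router is advertising:

```python
import ipaddress
import struct

def parse_route_info_option(body: bytes):
    """Decode an NDP Route Information option body (RFC 4191, type 24).

    `body` is the payload after the type and length octets, i.e. the
    bytes tcpdump prints under "route info option".
    """
    prefix_len = body[0]
    flags = body[1]
    # Route lifetime in seconds, big-endian 32-bit
    lifetime = struct.unpack("!I", body[2:6])[0]
    # The prefix is truncated on the wire; pad to a full 16 bytes
    prefix = ipaddress.IPv6Address(body[6:].ljust(16, b"\x00"))
    # Route preference lives in bits 4-3 of the flags byte
    pref = {0: "medium", 1: "high", 3: "low"}.get((flags >> 3) & 0x3, "reserved")
    return f"{prefix}/{prefix_len}", pref, lifetime

# The bytes from the capture above:
body = bytes.fromhex("4000 0000 0708 fd59 86c6 e5a5 0000".replace(" ", ""))
print(parse_route_info_option(body))  # ('fd59:86c6:e5a5::/64', 'medium', 1800)
```

The output matches tcpdump’s own summary line: fd59:86c6:e5a5::/64, pref=medium, lifetime=1800s.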

The key thing is the router advertisements, right? I captured a few of those:

17:09:14.459085 f0:b3:ec:1e:52:1b > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 94: (flowlabel 0xc0700, hlim 255, next-header ICMPv6 (58) payload length: 40) fe80::1895:b54d:d176:6a82 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 40
	hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0s, retrans time 0s
	  source link-address option (1), length 8 (1): f0:b3:ec:1e:52:1b
	    0x0000:  f0b3 ec1e 521b
	  route info option (24), length 16 (2):  fd5e:3f71:7692::/64, pref=medium, lifetime=1800s
	    0x0000:  4000 0000 0708 fd5e 3f71 7692 0000


17:09:13.548640 04:99:b9:73:9a:e4 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 94: (flowlabel 0x90f00, hlim 255, next-header ICMPv6 (58) payload length: 40) fe80::14d3:b385:2f65:9d64 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 40
	hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0s, retrans time 0s
	  source link-address option (1), length 8 (1): 04:99:b9:73:9a:e4
	    0x0000:  0499 b973 9ae4
	  route info option (24), length 16 (2):  fd5e:3f71:7692::/64, pref=medium, lifetime=1800s
	    0x0000:  4000 0000 0708 fd5e 3f71 7692 0000

17:11:45.073837 f4:34:f0:0d:a9:da > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 126: (flowlabel 0x20a00, hlim 255, next-header ICMPv6 (58) payload length: 72) fe80::75:3e:28a4:d2fb > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 72
	hop limit 0, Flags [none], pref medium, router lifetime 0s, reachable time 0s, retrans time 0s
	  source link-address option (1), length 8 (1): f4:34:f0:0d:a9:da
	    0x0000:  f434 f00d a9da
	  prefix info option (3), length 32 (4): fd3f:4d46:8539:4967::/64, Flags [onlink, auto], valid time 1800s, pref. time 1800s
	    0x0000:  40c0 0000 0708 0000 0708 0000 0000 fd3f
	    0x0010:  4d46 8539 4967 0000 0000 0000 0000
	  route info option (24), length 16 (2):  fd5e:3f71:7692::/64, pref=medium, lifetime=1800s

This may be a little off-topic, but for those using Essentials bulbs with Home Assistant, I’ve created a workaround for transitioning brightness using a script that can be called from your automations or even other scripts. I hope this helps someone out there.

Hello all, thank you for all the hard work getting this up and going. I had a hell of a time getting the bulb to pair with HA after getting it on Thread with a HomePod mini. When pinging from HA, packet loss was 98%; when pinging from another computer on the network, packet loss was around 10%. I hit timeout after timeout when trying to pair in HA. What finally got it paired was pinging from the HA terminal and then immediately pairing. I’m not sure if the bulb is going into some kind of sleep mode, but I have factory reset and retried, and this technique is reproducible. For everyone having issues actually pairing, this might help.

It is probably something with my network, since I can’t ping6 anything reliably from the rPi host, but that’s for another day.

Cheers

My Nanoleaf strip dropped off the network in the middle of the night. This morning I tried:

  1. Reload the integration - no success
  2. Remove/reapply power to the strip - no success
  3. Reboot HA - no success
  4. Various combinations of the above - no success
  5. All of the above again and again, hoping for a different result - no success

Then I realized the only thing I hadn’t tried: rebooting the HomePod mini. SUCCESS. The strip came back up immediately. Because the HomePod mini just sits there in the corner playing music, I completely forgot about it.

Just wanted to post that in case it helps somebody else.

Would a second Thread border router have helped here (i.e., would the strip have switched to the other BR when the HomePod BR failed)?

Yes, a Thread device will use any/all available border routers to keep connectivity. A Thread network may in fact have multiple border routers, and the specification allows devices to pick the best route to reach a destination. In Thread 1.3, service discovery and registration is a standard feature of border routers, allowing Thread devices to register and discover mDNS services. A HomeKit controller/hub keeping track of an accessory must stay subscribed to mDNS service discovery updates for _hap._udp services and update accessory addresses as they change. Matter will use another service type, but otherwise works in a similar manner.
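To make the _hap._udp browsing concrete, here is a sketch of the wire-format mDNS PTR question a controller multicasts to 224.0.0.251:5353 (or [ff02::fb]:5353) to discover HAP accessories. The function name and framing are illustrative, not aiohomekit’s API; a real controller would use a full mDNS library:

```python
import struct

def build_mdns_ptr_query(service: str = "_hap._udp.local.") -> bytes:
    """Build a minimal mDNS PTR question for the given service name."""
    # DNS header: ID 0, flags 0 (standard query), 1 question, 0 answers/ns/ar
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in service.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 12 = PTR, QCLASS 1 = IN
    question = qname + struct.pack("!HH", 12, 1)
    return header + question
```

Responders (here, the border routers) answer with PTR records naming each registered accessory instance, whose SRV/AAAA records carry the hostname and current IPv6 address the controller must track.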

For anyone else following along who runs into a similar thing: I finally got this device working today. I am 95% sure it’s due to having an 802.3ad bonded interface. When I set net.ipv6.conf.br0.accept_ra=2, I also needed to set net.ipv6.conf.bond0.accept_ra=2.


I’m experiencing similar behavior. HomeKit Controller Thread accessories are pairable and work very well, but sporadically become unavailable and require some combination of reloading the integration, restarting Home Assistant, power cycling all HomePod minis (Thread border routers), and, as a last resort, a factory reset and re-pairing of the device itself.

I am seriously considering ripping every Thread accessory out of Home Assistant and pairing them directly with HomeKit again until these random disconnects can be sorted out, because they make these devices useless half the time.


@setomerza I’m having trouble reproducing this locally because I have a very small number of devices. Do you feel comfortable modifying a few Python files to help debug and to try a few solutions?

Can you please try making the following change:

  1. get to the HA console (depends on how you have it deployed: ssh, serial console, …)
  2. get to the core container: docker exec -it homeassistant /bin/sh
  3. go to the aiohomekit CoAP folder: cd /usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap
  4. open the connection code: vi connection.py
  5. search for the function: /def is_connected
  6. arrow down one line (to the return statement) and press A to append
  7. add a space and paste and self.enc_ctx.send_ctr - self.enc_ctx.recv_ctr < 10
  8. save changes by pressing Esc and then ZZ
  9. restart HA through the web interface
  10. do NOT update HA for a few days as that will wipe out the changes
  11. please report back if it helps or not 🙂

Some background: that change shouldn’t let an accessory go more than 10 minutes with no communication, as HA tries to poll device state once a minute. We can likely make 10 smaller; I’m just starting with a bit of margin. If it improves things, feel free to try values as small as 2 or 3.
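For anyone curious what the patched check amounts to, here is a minimal sketch. The surrounding classes are hypothetical simplifications; the real is_connected in aiohomekit’s controller/coap/connection.py carries more state, and only the counter comparison comes from the patch above:

```python
class EncryptionContext:
    """Tracks the encrypted CoAP session's message counters."""
    def __init__(self):
        self.send_ctr = 0  # messages encrypted and sent to the accessory
        self.recv_ctr = 0  # messages received and decrypted from it

class Connection:
    def __init__(self):
        self.enc_ctx = EncryptionContext()

    @property
    def is_connected(self):
        # Original check: a secure session exists. The patch additionally
        # requires that the accessory has answered recently: with HA polling
        # roughly once a minute, 10 unanswered sends means ~10 minutes of
        # silence, so the session is treated as dead and re-established.
        return (self.enc_ctx is not None
                and self.enc_ctx.send_ctr - self.enc_ctx.recv_ctr < 10)
```

Lowering the threshold (to 2 or 3, as suggested above) just shortens how long a silently dead session lingers before reconnection.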

I know that in Thread 1.3 border routers are supposed to play nice with each other, but how does this work in practice (or does it at this point, since Thread 1.3 is relatively new)? E.g., let’s say I have a HomePod and then add an Echo that supports Thread. Obviously the Echo can use mDNS to discover the HomePod’s Thread network, but how do they “link”? Presumably there are authentication keys that need to be exchanged, or some sort of user action needed to say “join this existing network”?

I currently have 16 Thread devices connected to HomeKit Controller via three HomePod minis (a mix of Nanoleaf Essentials A19 bulbs, Eve Energy, Eve Door Sensors, and an Eve Weather). Initially, getting everything set up and paired into Home Assistant was a breeze, and everything stayed online without issue for a few days, until devices started suddenly falling off the network.

It already seems like some type of device polling/reconnecting is going on, because every minute or so the red icons turn grey as the device tries to re-initialize and fails.

Please see the attached info below and let me know if you would still like me to try setting up the 10-minute polling change.

Here are some relevant Logs that might be helpful.

Logger: aiohomekit.controller.coap.connection
Source: /usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py:391 
First occurred: November 5, 2022 at 1:35:03 PM (29063 occurrences) 
Last logged: 1:28:05 PM

Pair verify failed
OSError: [Errno 113] received through errqueue

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 386, in connect
    await self.do_pair_verify(pairing_data)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 345, in do_pair_verify
    response = await asyncio.wait_for(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 597, in _run_outer
    await cls._run(app_request, response, weak_observation, protocol, log)
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 656, in _run
    blockresponse = await blockrequest.response
aiocoap.error.NetworkError: [Errno 113] received through errqueue
Logger: coap-server
Source: runner.py:119 
First occurred: November 5, 2022 at 1:35:01 PM (30522 occurrences) 
Last logged: 1:28:05 PM

Error received and ignored in this codepath: [Errno 113] Host is unreachable
Received Type.ACK from <UDP6EndpointAddress [fd81:d68c:e2c2:0:d9b5:6ee7:9f9b:bca7] (locally fd45:7a0a:5395:42f1:11b5:f444:ec4e:8451%eth0)>, but could not match it to a running exchange.
Received Type.ACK from <UDP6EndpointAddress [fd81:d68c:e2c2:0:247f:a7fd:8eab:4158] (locally fd45:7a0a:5395:42f1:11b5:f444:ec4e:8451%eth0)>, but could not match it to a running exchange.
Received Type.ACK from <UDP6EndpointAddress [fd0c:29f0:2531:0:e499:104d:1488:274a] (locally fd45:7a0a:5395:42f1:11b5:f444:ec4e:8451%eth0)>, but could not match it to a running exchange.
Logger: homeassistant.config_entries
Source: config_entries.py:1088 
First occurred: November 5, 2022 at 1:35:41 PM (2333 occurrences) 
Last logged: 1:27:57 PM

Config entry 'Entrance Closet Bulb' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Config entry 'Crawl Space Front Light' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Config entry 'Office Closet Door' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Config entry 'Alarm Siren Outlet' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Config entry 'Garage Closet Bulb' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Logger: aiohttp.server
Source: /usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py:405 
First occurred: November 5, 2022 at 2:00:08 PM (3 occurrences) 
Last logged: 1:21:45 PM

Unhandled exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1191, in _sendfile_fallback
    read = await self.run_in_executor(None, file.readinto, view)
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 514, in start
    resp, reset = await task
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 460, in _handle_request
    reset = await self.finish_response(request, resp, start_time)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 613, in finish_response
    await prepare_meth(request)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_fileresponse.py", line 286, in prepare
    return await self._sendfile(request, fobj, offset, count)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_fileresponse.py", line 99, in _sendfile
    await loop.sendfile(transport, fobj, offset, count)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1170, in sendfile
    return await self._sendfile_fallback(transport, file,
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1200, in _sendfile_fallback
    await proto.restore()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 268, in restore
    self._transport.resume_reading()
  File "/usr/local/lib/python3.10/asyncio/sslproto.py", line 343, in resume_reading
    self._ssl_protocol._transport.resume_reading()
AttributeError: 'NoneType' object has no attribute 'resume_reading'

@setomerza

Oof. The first error (Pair verify failed) indicates an initial or reconnect communication failure. The second (Received Type.ACK) seems to be harmless noise, based on many similar reports. The third, I’m guessing, is from restarting HA or reloading the integration, and likely caused by the first error. The fourth isn’t related to Thread/CoAP, AFAICT.

The issue I’m hoping to fix with the one-line patch above is where communication drops and neither the border router nor the device returns any error condition; all we have are timeouts. Thinking about this more, you could start with a smaller number than 10, maybe even 1. And yes, please do try the patch.

If there are continued pair verify errors, we’ll dig into that next.

@lambdafunction Okay, I applied the patch and turned it down to 1 minute, restarted HA, and let everything settle for a while. I’m still seeing the same behavior, with the same half of the devices unable to connect and the “Pair verify failed” error repeating in the logs:

Logger: aiohomekit.controller.coap.connection
Source: /usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py:391
First occurred: 2:18:41 PM (83 occurrences)
Last logged: 2:23:23 PM

Pair verify failed
OSError: [Errno 113] received through errqueue

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 386, in connect
    await self.do_pair_verify(pairing_data)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 345, in do_pair_verify
    response = await asyncio.wait_for(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 597, in _run_outer
    await cls._run(app_request, response, weak_observation, protocol, log)
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 656, in _run
    blockresponse = await blockrequest.response
aiocoap.error.NetworkError: [Errno 113] received through errqueue

Interesting… if it is the exact same set of devices each time, that makes me think those devices have packet loss or some other issue. Can you please pick one of the misbehaving devices and ping6 it while HA is running? Leave the ping running for over a minute to see if there is any packet loss, either regularly or during the HA state poll. You can use mDNS tools to get the hostname, or pick a misbehaving NL bulb, which should have a name following a pattern like Nanoleaf-A19-XXXX.local. You could also try disabling a few HAP+Thread devices and see if the others improve.

This is slightly off-topic, but I’ll answer here.

The process of adding a Thread node, including any router or border router, to a Thread network is called commissioning. In order to commission a node, you need to use a commissioning agent. This is typically going to be a smartphone platform function (e.g. something provided by Android or iOS itself), or potentially a smartphone app (in the case of vendors that do not own a smartphone platform). In essence, a commissioning agent is something that knows the set of network (e.g. PAN ID) and cryptographic parameters that allow the node to join a particular Thread network. The node is usually configured through some side-band channel (e.g. Bluetooth, a Wi-Fi network broadcast by the device, a serial port, etc.).

The Connectivity Standards Alliance (CSA) members have agreed to build methods to exchange these parameters in order to “merge” their up-to-now disparate Thread networks. For example, Apple added the THCredentials API. Note that this effectively has nothing to do with Thread or Matter per se.
