HomeKit Accessory Protocol (HAP) over CoAP/UDP (was: Nanoleaf Essentials bulb via Thread/CoAP)

Hello all thank you for all the hard work getting this up and going. Had a hell of a time getting the bulb to pair with HA after getting it on thread with HomePod mini. When pinging from HA, packet loss was 98%, when pinging from another computer on the network, packet loss was around 10%. Had timeout after timeout when trying to pair in HA. What finally got it paired was actually pinging from HA terminal then immediately pairing. Not sure if the bulb is going into some kind of sleep mode but I have factory reset and retried and this technique is reproducible. For everyone having issues actually pairing this might help out.

It is probably something with my network since I can’t ping6 anything reliably from the rPi host but that’s for another day.

Cheers

My nanoleaf strip drop off the network in the middle of the night. This morning I tried:

  1. Reload the integration - no success
  2. Remove/reapply power to the strip - no success.
  3. Reboot HA - no success
  4. Various combinations of the above - no success
  5. All of the above again and again hoping for a different result - no success

Then I realized the only thing I hadn’t tried … rebooting the homepod mini. SUCCESS. The strip came back up immediately. Because the homepod mini just sits there in the corner, playing music, I completely forgot about it.

Just wanted to post that in case it helps somebody else.

Would a second thread border router have helped here (ie: would the strip have switched to the other BR when the homepod BR functional failed?)

Yes, a thread device will use any/all available border routers to keep connectivity. A thread network may in fact have multiple border routers and the specification allows for devices to pick the best route to reach a destination. In thread 1.3, service discovery and registration is a standard feature of border routers, allowing thread devices to register and discover mdns services. A HomeKit controller/hub keeping track of an accessory must stay subscribed to mdns service discovery updates for _hap._udp services and update accessory addresses as they change. Matter will use another service type, but otherwise work in a similar manner.

Anyone else following along running into a similar thing, I finally got this device working today. I am 95% sure it’s due to having an 802.11ad bonded interface. When I set net.ipv6.conf.br0.accept_ra=2, I also needed to set net.ipv6.conf.bond0.accept_ra=2

1 Like

I’m experiencing similar behavior. Homekit Controller Thread accessories are pairable and work very well, but will sporadically become unavailable and require some combination of Reloading the Integration, restarting Home Assistant, power cycling all homepod mini’s (thread border routers) and as a last resort a factory reset + repairing of the device itself.

I am seriously considering ripping every thread accessory out of Home Assistant and directly pairing with Homekit again until these random disconnects can be sorted out because it’s making these devices useless half the time

1 Like

@setomerza I’m having trouble reproducing this locally because I have a very small number of devices. Do you feel comfortable modifying a few Python files to help debug and to try a few solutions?

Can you please try making the following change:

  1. get to the HA console (depends on how you have it deployed: ssh, serial console, …)
  2. get to the core container: docker exec -it homeassistant /bin/sh
  3. go to the aiohomekit CoAP folder: cd /usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap
  4. open the connection code: vi connection.py
  5. search for the function: /def is_connected
  6. arrow down one line (to the return statement) and press A to append
  7. add a space and paste and self.enc_ctx.send_ctr - self.enc_ctx.recv_ctr < 10
  8. save changes by pressing Esc and then ZZ
  9. restart HA through the web interface
  10. do NOT update HA for a few days as that will wipe out the changes
  11. please report back if it helps or not :slight_smile:

Some background: that shouldn’t let an accessory go more than 10 minutes of no communication as HA will try to poll device state once a minute. We will likely be able to make 10 smaller, I’m just starting with a bit of margin. If it improves things, please feel free to try down to as small as 2 or 3.

I know in Thread v1.3 border routers are supposed to play nice with each other, but how does this work in practice (or does it at this point since Thread v1.3 is relatively new). E.g. Let’s say I have a HomePod, and then add an Echo that supports Thread. Obviously the Echo can used mDNS to discover the HomePod’s Thread network, but how do they “link?” Presumably there authentication keys that need to be exchanged, or some sort of user action that is needed to say “join this existing network?”

I currently have 16 thread devices connected via 3 Homepod Minis to Homekit controller (mix of Nanoleaf Essential A19 bulbs, Eve Energy, Eve Door Sensors, and an Eve Weather). Initially getting everything set up and paired into Home Assistant was a breeze and everything stayed online without issue for a few days, until devices started suddenly falling off the network.

It already seems like some type of device polling/reconnecting is going on now because every minute or so the red icons will turn grey as the device tries to re-initialize and fails.

Please see the attached info below and let me know if you still would like me to try setting up the 10 minute polling

Here are some relevant Logs that might be helpful.

Logger: aiohomekit.controller.coap.connection
Source: /usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py:391 
First occurred: November 5, 2022 at 1:35:03 PM (29063 occurrences) 
Last logged: 1:28:05 PM

Pair verify failed
OSError: [Errno 113] received through errqueue

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 386, in connect
    await self.do_pair_verify(pairing_data)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 345, in do_pair_verify
    response = await asyncio.wait_for(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 597, in _run_outer
    await cls._run(app_request, response, weak_observation, protocol, log)
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 656, in _run
    blockresponse = await blockrequest.response
aiocoap.error.NetworkError: [Errno 113] received through errqueue
Logger: coap-server
Source: runner.py:119 
First occurred: November 5, 2022 at 1:35:01 PM (30522 occurrences) 
Last logged: 1:28:05 PM

Error received and ignored in this codepath: [Errno 113] Host is unreachable
Received Type.ACK from <UDP6EndpointAddress [fd81:d68c:e2c2:0:d9b5:6ee7:9f9b:bca7] (locally fd45:7a0a:5395:42f1:11b5:f444:ec4e:8451%eth0)>, but could not match it to a running exchange.
Received Type.ACK from <UDP6EndpointAddress [fd81:d68c:e2c2:0:247f:a7fd:8eab:4158] (locally fd45:7a0a:5395:42f1:11b5:f444:ec4e:8451%eth0)>, but could not match it to a running exchange.
Received Type.ACK from <UDP6EndpointAddress [fd0c:29f0:2531:0:e499:104d:1488:274a] (locally fd45:7a0a:5395:42f1:11b5:f444:ec4e:8451%eth0)>, but could not match it to a running exchange.
Logger: homeassistant.config_entries
Source: config_entries.py:1088 
First occurred: November 5, 2022 at 1:35:41 PM (2333 occurrences) 
Last logged: 1:27:57 PM

Config entry 'Entrance Closet Bulb' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Config entry 'Crawl Space Front Light' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Config entry 'Office Closet Door' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Config entry 'Alarm Siren Outlet' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Config entry 'Garage Closet Bulb' for homekit_controller integration not ready yet: failed to connect; Retrying in background
Logger: aiohttp.server
Source: /usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py:405 
First occurred: November 5, 2022 at 2:00:08 PM (3 occurrences) 
Last logged: 1:21:45 PM

Unhandled exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1191, in _sendfile_fallback
    read = await self.run_in_executor(None, file.readinto, view)
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 514, in start
    resp, reset = await task
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 460, in _handle_request
    reset = await self.finish_response(request, resp, start_time)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 613, in finish_response
    await prepare_meth(request)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_fileresponse.py", line 286, in prepare
    return await self._sendfile(request, fobj, offset, count)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/web_fileresponse.py", line 99, in _sendfile
    await loop.sendfile(transport, fobj, offset, count)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1170, in sendfile
    return await self._sendfile_fallback(transport, file,
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1200, in _sendfile_fallback
    await proto.restore()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 268, in restore
    self._transport.resume_reading()
  File "/usr/local/lib/python3.10/asyncio/sslproto.py", line 343, in resume_reading
    self._ssl_protocol._transport.resume_reading()
AttributeError: 'NoneType' object has no attribute 'resume_reading'

@setomerza

Oof. The first error (Pair verify failed) indicates initial or reconnect communication failure. The second (Received Type.ACK) seems to be harmless noise based on many similar reports. The third I’m guessing is from restarting HA or reloading the integration? Likely caused by the first error. The fourth isn’t related to Thread/CoAP AFAICT.

The issue I’m hoping to fix with the one-line patch above is where communication drops & neither the Border Router nor the device return any error condition. All we have are timeouts. Thinking about this more, you could start with a smaller number than 10, maybe even 1. And yes, please do try the patch.

If there are continued pair verify errors, we’ll dig into that next.

@lambdafunction Okay I appended the patch and turned it down to 1 minute, restarted HA and let everything settle for a while and I’m still seeing the same behavior with the same half of devices unable to connect and the “Pair verify failed” error repeating in logs

Logger: aiohomekit.controller.coap.connection
Source: /usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py:391
First occurred: 2:18:41 PM (83 occurrences)
Last logged: 2:23:23 PM

Pair verify failed
OSError: [Errno 113] received through errqueue

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 386, in connect
    await self.do_pair_verify(pairing_data)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 345, in do_pair_verify
    response = await asyncio.wait_for(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 597, in _run_outer
    await cls._run(app_request, response, weak_observation, protocol, log)
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 656, in _run
    blockresponse = await blockrequest.response
aiocoap.error.NetworkError: [Errno 113] received through errqueue

Interesting… if it is the exact same set of devices each time, that makes me think those devices have some sort of packet loss or other issue. Can you please pick one of the devices that is having trouble and ping6 it while HA is running? Leave the ping running for over a minute to see if there is any packet loss, either regularly or during the HA state poll. You can use mDNS tools to get the hostname or pick a misbehaving NL bulb which should have a pattern like Nanoleaf-A19-XXXX.local. You could also try disabling a few HAP+Thread devices and see if the other ones improve.

This is slightly off-topic, but I’ll answer here.

The process of adding a Thread node, including any router or border router, to a Thread network is called commissioning. In order to commission a node, you need to use a commissioning agent. This is typically going to be a smartphone platform function (e.g. something provided by Android or iOS itself), or potentially a smartphone app (in the case of vendors that do not own a smartphone platform). The bottom line of a commissioning agent is something that knows a set of network (e.g. PAN ID) and cryptographic parameters to allow the node to join a particular Thread network. The node is usually configured through some side-band channel (e.g. Bluetooth, a Wifi network broadcast by the device, a serial port, etc).

The Connectivity Standards Alliance (CSA) members have agreed to build methods to exchange these parameters in order to “merge” their up-to-now disparate Thread networks. For example, Apple added the THCredentials API. Note that this effectively has nothing to do with Thread or Matter per-se.

3 Likes

I have 2 Nanoleaf Essentials A19 bulbs, the first 1 paired using thread smoothly, I spent hours trying to pair the 2nd one but I always got the below error. I finally unscrewed the 1st (paired) bulb restarted HA and paired the 2nd one and it worked. Screwed back the 1st one and the same error is back, tried cycling power for the bulbs, homepods, HA, nothing works, consistently the same error, one of the bulbs works (whichever connects first) and the 2nd throws the below error:

Pair verify failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 386, in connect
    await self.do_pair_verify(pairing_data)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 334, in do_pair_verify
    coap_client = await Context.create_server_context(root, bind=("::", 0))
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 287, in create_server_context
    await self._append_tokenmanaged_transport(
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 156, in _append_tokenmanaged_transport
    transport = await token_interface_constructor(tman)
  File "/usr/local/lib/python3.10/site-packages/aiocoap/transports/ws.py", line 191, in create_transport
    server = await websockets.serve(
  File "/usr/local/lib/python3.10/site-packages/websockets/legacy/server.py", line 1117, in __await_impl__
    server = await self._create_server()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1505, in create_server
    raise OSError(err.errno, 'error while attempting '
OSError: [Errno 98] error while attempting to bind on address ('::', 3000, 0, 0): address in use

Any idea how can I fix this?

@06_ripcord.ogress I think that error was fixed in a recent HA release, 2022.9.X or 2022.10.X (aiocoap bug). What version are you on? If you are on a recent version, we’ll need to figure out what version of the aiocoap library you are using.

@lambdafunction Thanks for responding, I’m on the latest version 2022.11.x

@06_ripcord.ogress can you please sanity check a few things for me:

grep ^Version /usr/local/lib/python3.10/site-packages/aiocoap-*/METADATA

grep -n -C 3 3000 /usr/local/lib/python3.10/site-packages/aiocoap/transports/ws.py

Are you using HA OS?

I didn’t implement your fix but just wanted to add that I experience the same problem. I’ve added 6 Thread Eve Thermos (which worked without any problems) and they all work for a few days. Then suddenly some or all of them aren’t available anymore in homeassistant. It’s not always the same device. Sometimes rebooting Home Assistant or the Home Pod mini/ Apple TV fixes it for a few hours/ some thermostat or even a few days but that’s not reliable. I didn’t get all of them to reconnect today - as of now only one device is reachable.

I tried to ping the devices and they are pingable (at least 5/6 are - although I can only control 1/6 at the moment)

Logger: aiohomekit.controller.coap.connection
Source: /usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py:391 
First occurred: 18:03:44 (90 occurrences) 
Last logged: 18:14:46

Pair verify failed
OSError: [Errno 113] received through errqueue

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 386, in connect
    await self.do_pair_verify(pairing_data)
  File "/usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py", line 345, in do_pair_verify
    response = await asyncio.wait_for(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 597, in _run_outer
    await cls._run(app_request, response, weak_observation, protocol, log)
  File "/usr/local/lib/python3.10/site-packages/aiocoap/protocol.py", line 656, in _run
    blockresponse = await blockrequest.response
aiocoap.error.NetworkError: [Errno 113] received through errqueue

@DaJonas @setomerza can both of you try a quick change, let’s extend the amount of time pair verify has to finish. Please edit /usr/local/lib/python3.10/site-packages/aiohomekit/controller/coap/connection.py around line 347. You can search for do_pair_verify and then down ~19 lines should be the wait_for call with timeout=8.0. Can you please change that timeout to 45.0?

Background: aiocoap retries unanswered packets automatically with an exponential backoff (1s, 2s, 4s, 8s, …). Perhaps we aren’t giving busy/lossy networks enough time…

@lambdafunction

@06_ripcord.ogress that’s really weird… you should be on aiocoap 0.4.4. Let me look around, maybe some other component is pulling in 0.4.1.

EDIT: if you want to fix this now, change the else: line on 187 of ws.py to elif port != 0: being careful to preserve the spacing and restart HA.

@setomerza @DaJonas it is possible you are having the same issue if you are on 2022.11.x. Can you please check your aiocoap version? And please make this change. It is a fix I pushed into aiocoap months ago to solve this problem. I’m not sure why it has downgraded.