Thread device going unavailable - How to debug

So I have an Aqara FP300 running Thread firmware, connected via a Sonoff Dongle-M, OTBR and HA Matter Server. This was mainly to test Thread out, and the FP300 sits 40cm away from the dongle.

The device just keeps dropping offline, but I can’t find anything useful to help figure out why. It does appear to be worse when HA is restarted, but that might just be my perception.

So, full disclosure: I am running via Docker and using VLANs, but I have Matter all set up and working, and I have some WiFi devices that work just fine. I have built my own image from ownbee/hass-otbr-docker on GitHub (a stand-alone Home Assistant OpenThread Border Router Docker container) to get the latest commits to the HA add-on. The FP300 also works; it was quite reliable for a month but now seems to stay available for only a few hours.

Whenever it goes offline, I look in Matter Server and see it flagged as offline there, but if I go into the OTBR web interface the topology suggests it is still there (there isn’t anything to explain what the topology view is actually showing).

If I restart both OTBR and Matter Server, it all connects again and works until, at some random point, it goes offline again.

The only thing I see in the OTBR logs around the time it became unavailable is:

2026-01-04T13:46:02.691109563Z 00:41:59.562 [I] MeshForwarder-:     src:[fe80:0:0:0:801f:9f96:e5ca:706d]:19788
2026-01-04T13:46:02.691125956Z 00:41:59.562 [I] MeshForwarder-:     dst:[ff02:0:0:0:0:0:0:1]:19788
2026-01-04T13:46:05.278461552Z 00:42:02.150 [I] DataPollHandlr: Rx data poll, src:0x9005, qed_msgs:24, rss:-47, ack-fp:0
2026-01-04T13:46:10.280720707Z 00:42:07.152 [I] DataPollHandlr: Rx data poll, src:0x9005, qed_msgs:24, rss:-47, ack-fp:0
2026-01-04T13:46:11.489449965Z 00:42:08.361 [I] Mle-----------: Send Advertisement (ff02:0:0:0:0:0:0:1)
2026-01-04T13:46:11.572330333Z 00:42:08.444 [I] MeshForwarder-: Sent IPv6 UDP msg, len:90, chksum:5821, ecn:no, to:0xffff, sec:no, prio:net, radio:all

and the same kind of entries appear both before and after it.
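
To check whether the poll cadence or the queued-message count changes around a dropout, the data-poll lines can be pulled out of a longer log with a small script. This is only a minimal Python sketch: the regex assumes the exact line format shown above, the function names are made up for illustration, and reading `qed_msgs` as "messages queued for that child" is my interpretation of the field name, not something confirmed here. I also don't know for certain that `0x9005` is the FP300.

```python
import re
from datetime import datetime

# Assumed pattern for the "Rx data poll" lines pasted above.
# Other OTBR builds or log levels may format these differently.
POLL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+Z .*"
    r"Rx data poll, src:(?P<src>0x[0-9a-fA-F]+), qed_msgs:(?P<queued>\d+), "
    r"rss:(?P<rss>-?\d+)"
)


def parse_data_polls(lines):
    """Return one dict per data-poll line; non-matching lines are skipped."""
    polls = []
    for line in lines:
        m = POLL_RE.search(line)
        if m:
            polls.append({
                # Fractional seconds are dropped: the container timestamp has
                # nanoseconds, which strptime's %f cannot parse.
                "time": datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S"),
                "src": m["src"],
                "queued": int(m["queued"]),
                "rss": int(m["rss"]),
            })
    return polls


def poll_gaps(polls):
    """Seconds between consecutive polls (whole-second resolution)."""
    return [
        (b["time"] - a["time"]).total_seconds()
        for a, b in zip(polls, polls[1:])
    ]


# The two data-poll lines from the excerpt above.
sample = [
    "2026-01-04T13:46:05.278461552Z 00:42:02.150 [I] DataPollHandlr: Rx data poll, src:0x9005, qed_msgs:24, rss:-47, ack-fp:0",
    "2026-01-04T13:46:10.280720707Z 00:42:07.152 [I] DataPollHandlr: Rx data poll, src:0x9005, qed_msgs:24, rss:-47, ack-fp:0",
]

polls = parse_data_polls(sample)
print("gaps (s):", poll_gaps(polls))
print("queued:", [p["queued"] for p in polls])
print("rss:", [p["rss"] for p in polls])
```

Run against a full log, a sudden jump in the gaps or a steadily growing queue count just before the Matter Server errors would at least narrow down which side stops talking first.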

Matter Server appears to log that it has lost connectivity:

2026-01-04T13:04:53.273464476Z 2026-01-04 13:04:53.272 (MainThread) INFO [matter_server.server.device_controller.mdns] <Node:46> Discovered on mDNS
2026-01-04T13:04:53.273674412Z 2026-01-04 13:04:53.273 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Setting-up node...
2026-01-04T13:04:58.934519542Z 2026-01-04 13:04:58.934 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Setting up attributes and events subscription.
2026-01-04T13:05:04.950986676Z 2026-01-04 13:05:04.948 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Subscription succeeded with report interval [0, 600]
2026-01-04T13:33:34.589368146Z 2026-01-04 13:33:34.588 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:46639r with Node: <000000000000002E, 1> S:52249 M:171159196] (S) Msg Retransmission to 1:000000000000002E failure (max retries:4)
2026-01-04T13:43:45.524101361Z 2026-01-04 13:43:45.523 (Dummy-2) CHIP_ERROR [chip.native.DMG] Subscription Liveness timeout with SubscriptionID = 0xef60c9a8, Peer = 01:000000000000002E
2026-01-04T13:43:45.537085620Z 2026-01-04 13:43:45.536 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Subscription failed with CHIP Error 0x00000032: Timeout, resubscription attempt 0
2026-01-04T13:44:58.507833354Z 2026-01-04 13:44:58.507 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:51309i with Node: <0000000000000000, 0> S:0 M:81747620] (U) Msg Retransmission to 0:0000000000000000 failure (max retries:4)
2026-01-04T13:45:16.528924726Z 2026-01-04 13:45:16.528 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from peer <000000000000002E, 1>. Current state was 4
2026-01-04T13:45:16.529387140Z 2026-01-04 13:45:16.529 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/protocols/secure_channel/CASESession.cpp:594: CHIP Error 0x00000032: Timeout'

followed by several retries and then

2026-01-04T14:14:35.428627406Z 2026-01-04 14:14:35.428 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Subscription failed with CHIP Error 0x00000032: Timeout, resubscription attempt 11
2026-01-04T14:14:35.428884038Z 2026-01-04 14:14:35.428 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Node considered offline, shutdown subscription
2026-01-04T14:14:46.526244880Z 2026-01-04 14:14:46.525 (MainThread) INFO [matter_server.server.device_controller.mdns] <Node:46> Discovered on mDNS
2026-01-04T14:14:46.526380336Z 2026-01-04 14:14:46.526 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Setting-up node...
2026-01-04T14:14:47.662247827Z 2026-01-04 14:14:47.661 (MainThread) INFO [matter_server.server.device_controller.mdns] <Node:46> Discovered on mDNS
2026-01-04T14:16:01.164036452Z 2026-01-04 14:16:01.163 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:51322i with Node: <0000000000000000, 0> S:0 M:81747633] (U) Msg Retransmission to 0:0000000000000000 failure (max retries:4)
2026-01-04T14:16:17.538407652Z 2026-01-04 14:16:17.537 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from peer <000000000000002E, 1>. Current state was 4
2026-01-04T14:16:20.541382009Z 2026-01-04 14:16:20.540 (MainThread) INFO [matter_server.server.sdk] <Node:46> Attempting to establish CASE session... (attempt 2 of 2)
2026-01-04T14:17:42.217070948Z 2026-01-04 14:17:42.216 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:51323i with Node: <0000000000000000, 0> S:0 M:81747634] (U) Msg Retransmission to 0:0000000000000000 failure (max retries:4)
2026-01-04T14:17:51.547437522Z 2026-01-04 14:17:51.547 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from peer <000000000000002E, 1>. Current state was 4
2026-01-04T14:17:51.548251919Z 2026-01-04 14:17:51.547 (MainThread) WARNING [matter_server.server.device_controller] <Node:46> Setup for node failed: Unable to establish CASE session with Node 46
2026-01-04T14:17:51.548391995Z 2026-01-04 14:17:51.548 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Retrying node setup in 60 seconds...

This actually seems useful, but I can’t find much detail on what it really means or how to fix it (though it is a great improvement on other issues where there is simply nothing logged :face_with_symbols_over_mouth:).
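
To make the sequence easier to read, the Matter Server lines can be reduced to a timeline of events with the gaps between them. The sketch below is only that, a sketch: the marker substrings are copied from the log lines above, while the short labels and function names are my own and not part of Matter Server.

```python
import re
from datetime import datetime

# Substrings taken from the log above, mapped to labels of my own choosing.
MARKERS = {
    "Subscription succeeded": "subscribed",
    "Msg Retransmission to": "retransmit_failed",
    "Subscription Liveness timeout": "liveness_timeout",
    "CASESession timed out": "case_timeout",
    "Node considered offline": "marked_offline",
    "Discovered on mDNS": "mdns_seen",
}

# Matches the second (application) timestamp, e.g. "2026-01-04 13:05:04.948 (".
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \(")


def timeline(lines):
    """Return (datetime, label) pairs for lines containing a known marker."""
    events = []
    for line in lines:
        ts = TS_RE.search(line)
        if not ts:
            continue
        for needle, label in MARKERS.items():
            if needle in line:
                when = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S.%f")
                events.append((when, label))
                break
    return events


def seconds_between(events, first, second):
    """Seconds from the first `first` event to the next `second` event, or None."""
    start = next((t for t, label in events if label == first), None)
    if start is None:
        return None
    end = next((t for t, label in events if label == second and t > start), None)
    return None if end is None else (end - start).total_seconds()


# Three lines copied from the Matter Server excerpt above.
sample = [
    "2026-01-04T13:05:04.950986676Z 2026-01-04 13:05:04.948 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Subscription succeeded with report interval [0, 600]",
    "2026-01-04T13:33:34.589368146Z 2026-01-04 13:33:34.588 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:46639r with Node: <000000000000002E, 1> S:52249 M:171159196] (S) Msg Retransmission to 1:000000000000002E failure (max retries:4)",
    "2026-01-04T13:43:45.524101361Z 2026-01-04 13:43:45.523 (Dummy-2) CHIP_ERROR [chip.native.DMG] Subscription Liveness timeout with SubscriptionID = 0xef60c9a8, Peer = 01:000000000000002E",
]

events = timeline(sample)
for when, label in events:
    print(when.time(), label)
print(seconds_between(events, "subscribed", "retransmit_failed"))
print(seconds_between(events, "retransmit_failed", "liveness_timeout"))
```

On these three lines, the liveness timeout comes roughly 611 seconds after the first failed retransmission, which is close to the 600 second upper bound in the "report interval [0, 600]" line. Whether that is cause or coincidence I can't tell from these logs alone.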

What is going on here?

I’ve been seeing some of the same errors with both of my new Yale Matter locks. They also have connectivity issues in Google Home during the same periods. I have at least two eeros and a Google Home Max display as border routers.
I had an unlock automation running reliably for maybe a week or two, and now the locks are periodically offline for multiple days.

Any solution to this?

I have a heimann fire alarm doing this, and no other devices on the Matter network…