Thread device going unavailable - How to debug

So I have an Aqara FP300 running Thread firmware, connected via a Sonoff Dongle-M, OTBR and HA Matter Server. This was mainly to test Thread out, and the FP300 sits 40 cm away from the Dongle.

The device just keeps dropping offline, but I can’t seem to find anything useful to help figure out why. It does appear to be worse when HA is restarted, but that might just be my perception.

Full disclosure: I am running via Docker and using VLANs, but I have Matter all set up and working, and I have some WiFi devices that work just fine. I have built my own image from ownbee/hass-otbr-docker on GitHub (a stand-alone Home Assistant OpenThread Border Router Docker container) to get the latest commits to the HA add-on. The FP300 also works (and was working quite reliably for a month, but now seems to only stay available for a few hours).

Whenever it goes offline, I look in Matter server and see it flagged as offline there

but if I go into the OTBR web interface, the topology suggests it is still there (although nothing explains what the topology is actually showing).

If I restart both OTBR and Matter Server, it all connects again and works until, at some random point, it goes offline again.

The only thing I see in the OTBR logs around the time it became unavailable is

2026-01-04T13:46:02.691109563Z 00:41:59.562 [I] MeshForwarder-:     src:[fe80:0:0:0:801f:9f96:e5ca:706d]:19788
2026-01-04T13:46:02.691125956Z 00:41:59.562 [I] MeshForwarder-:     dst:[ff02:0:0:0:0:0:0:1]:19788
2026-01-04T13:46:05.278461552Z 00:42:02.150 [I] DataPollHandlr: Rx data poll, src:0x9005, qed_msgs:24, rss:-47, ack-fp:0
2026-01-04T13:46:10.280720707Z 00:42:07.152 [I] DataPollHandlr: Rx data poll, src:0x9005, qed_msgs:24, rss:-47, ack-fp:0
2026-01-04T13:46:11.489449965Z 00:42:08.361 [I] Mle-----------: Send Advertisement (ff02:0:0:0:0:0:0:1)
2026-01-04T13:46:11.572330333Z 00:42:08.444 [I] MeshForwarder-: Sent IPv6 UDP msg, len:90, chksum:5821, ecn:no, to:0xffff, sec:no, prio:net, radio:all

and the same kind of entries appear both before and after it.
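For anyone wanting to compare their own logs, here is a minimal Python sketch that pulls the data poll lines out of an OTBR log and summarises them per child address. The function names (`parse_polls`, `summarise`) are my own, and my reading of the fields is an assumption rather than something I have confirmed: I take `qed_msgs` to be the number of messages the border router has queued for the sleepy child, and `ack-fp` to be whether the ack was sent with the frame-pending bit set.

```python
import re
from datetime import datetime

# Two data poll lines copied verbatim from the OTBR log above.
LOG = """\
2026-01-04T13:46:05.278461552Z 00:42:02.150 [I] DataPollHandlr: Rx data poll, src:0x9005, qed_msgs:24, rss:-47, ack-fp:0
2026-01-04T13:46:10.280720707Z 00:42:07.152 [I] DataPollHandlr: Rx data poll, src:0x9005, qed_msgs:24, rss:-47, ack-fp:0
"""

# Docker prints nanoseconds; strptime only handles microseconds,
# so only the first six fractional digits are captured.
POLL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6})\d*Z .*"
    r"Rx data poll, src:(?P<src>0x[0-9a-fA-F]+), qed_msgs:(?P<queued>\d+), "
    r"rss:(?P<rss>-?\d+), ack-fp:(?P<fp>\d+)"
)


def parse_polls(text):
    """Return one dict per 'Rx data poll' line found in the log text."""
    polls = []
    for line in text.splitlines():
        m = POLL_RE.match(line)
        if m:
            polls.append({
                "time": datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f"),
                "src": m["src"],
                "queued": int(m["queued"]),
                "rss": int(m["rss"]),
                "frame_pending": int(m["fp"]),
            })
    return polls


def summarise(polls):
    """Group polls by child address and compute the gaps between polls."""
    by_src = {}
    for poll in polls:
        by_src.setdefault(poll["src"], []).append(poll)
    summary = {}
    for src, items in by_src.items():
        times = [p["time"] for p in items]
        summary[src] = {
            "count": len(items),
            "intervals_s": [
                (b - a).total_seconds() for a, b in zip(times, times[1:])
            ],
            "queued": [p["queued"] for p in items],
            "frame_pending": [p["frame_pending"] for p in items],
        }
    return summary


if __name__ == "__main__":
    for src, info in summarise(parse_polls(LOG)).items():
        print(src, info)
```

Run against a longer capture, this should show whether the poll interval stays steady and whether the queued count ever drains, which seems like the first thing to check when the child is still polling but Matter traffic is timing out.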

Matter server appears to log that it has lost connectivity

2026-01-04T13:04:53.273464476Z 2026-01-04 13:04:53.272 (MainThread) INFO [matter_server.server.device_controller.mdns] <Node:46> Discovered on mDNS
2026-01-04T13:04:53.273674412Z 2026-01-04 13:04:53.273 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Setting-up node...
2026-01-04T13:04:58.934519542Z 2026-01-04 13:04:58.934 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Setting up attributes and events subscription.
2026-01-04T13:05:04.950986676Z 2026-01-04 13:05:04.948 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Subscription succeeded with report interval [0, 600]
2026-01-04T13:33:34.589368146Z 2026-01-04 13:33:34.588 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:46639r with Node: <000000000000002E, 1> S:52249 M:171159196] (S) Msg Retransmission to 1:000000000000002E failure (max retries:4)
2026-01-04T13:43:45.524101361Z 2026-01-04 13:43:45.523 (Dummy-2) CHIP_ERROR [chip.native.DMG] Subscription Liveness timeout with SubscriptionID = 0xef60c9a8, Peer = 01:000000000000002E
2026-01-04T13:43:45.537085620Z 2026-01-04 13:43:45.536 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Subscription failed with CHIP Error 0x00000032: Timeout, resubscription attempt 0
2026-01-04T13:44:58.507833354Z 2026-01-04 13:44:58.507 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:51309i with Node: <0000000000000000, 0> S:0 M:81747620] (U) Msg Retransmission to 0:0000000000000000 failure (max retries:4)
2026-01-04T13:45:16.528924726Z 2026-01-04 13:45:16.528 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from peer <000000000000002E, 1>. Current state was 4
2026-01-04T13:45:16.529387140Z 2026-01-04 13:45:16.529 (Dummy-2) CHIP_ERROR [chip.native.DMG] Failed to establish CASE for re-subscription with error 'src/protocols/secure_channel/CASESession.cpp:594: CHIP Error 0x00000032: Timeout'

followed by several retries and then

2026-01-04T14:14:35.428627406Z 2026-01-04 14:14:35.428 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Subscription failed with CHIP Error 0x00000032: Timeout, resubscription attempt 11
2026-01-04T14:14:35.428884038Z 2026-01-04 14:14:35.428 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Node considered offline, shutdown subscription
2026-01-04T14:14:46.526244880Z 2026-01-04 14:14:46.525 (MainThread) INFO [matter_server.server.device_controller.mdns] <Node:46> Discovered on mDNS
2026-01-04T14:14:46.526380336Z 2026-01-04 14:14:46.526 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Setting-up node...
2026-01-04T14:14:47.662247827Z 2026-01-04 14:14:47.661 (MainThread) INFO [matter_server.server.device_controller.mdns] <Node:46> Discovered on mDNS
2026-01-04T14:16:01.164036452Z 2026-01-04 14:16:01.163 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:51322i with Node: <0000000000000000, 0> S:0 M:81747633] (U) Msg Retransmission to 0:0000000000000000 failure (max retries:4)
2026-01-04T14:16:17.538407652Z 2026-01-04 14:16:17.537 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from peer <000000000000002E, 1>. Current state was 4
2026-01-04T14:16:20.541382009Z 2026-01-04 14:16:20.540 (MainThread) INFO [matter_server.server.sdk] <Node:46> Attempting to establish CASE session... (attempt 2 of 2)
2026-01-04T14:17:42.217070948Z 2026-01-04 14:17:42.216 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:51323i with Node: <0000000000000000, 0> S:0 M:81747634] (U) Msg Retransmission to 0:0000000000000000 failure (max retries:4)
2026-01-04T14:17:51.547437522Z 2026-01-04 14:17:51.547 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from peer <000000000000002E, 1>. Current state was 4
2026-01-04T14:17:51.548251919Z 2026-01-04 14:17:51.547 (MainThread) WARNING [matter_server.server.device_controller] <Node:46> Setup for node failed: Unable to establish CASE session with Node 46
2026-01-04T14:17:51.548391995Z 2026-01-04 14:17:51.548 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Retrying node setup in 60 seconds...
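To make the sequence easier to follow, here is a rough Python sketch that reduces the Matter Server log to a timeline of the key events and measures each one against the last successful subscription. The event labels and function names are hypothetical (my own), and the substrings it matches are simply the ones that appear in the log lines above, so they may need adjusting for other versions.

```python
import re
from datetime import datetime

# Lines copied verbatim from the Matter Server log above.
SAMPLE = """\
2026-01-04T13:05:04.950986676Z 2026-01-04 13:05:04.948 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Subscription succeeded with report interval [0, 600]
2026-01-04T13:33:34.589368146Z 2026-01-04 13:33:34.588 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:46639r with Node: <000000000000002E, 1> S:52249 M:171159196] (S) Msg Retransmission to 1:000000000000002E failure (max retries:4)
2026-01-04T13:43:45.524101361Z 2026-01-04 13:43:45.523 (Dummy-2) CHIP_ERROR [chip.native.DMG] Subscription Liveness timeout with SubscriptionID = 0xef60c9a8, Peer = 01:000000000000002E
2026-01-04T14:14:35.428884038Z 2026-01-04 14:14:35.428 (MainThread) INFO [matter_server.server.device_controller] <Node:46> Node considered offline, shutdown subscription
"""

# (label, substring) pairs; the first match wins for each line.
EVENTS = [
    ("subscribed", "Subscription succeeded"),
    ("retransmit_failed", "Msg Retransmission to"),
    ("liveness_timeout", "Subscription Liveness timeout"),
    ("case_timeout", "CASESession timed out"),
    ("offline", "Node considered offline"),
]

# Leading Docker timestamp, truncated to microseconds for strptime.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6})\d*Z ")


def timeline(text):
    """Return (timestamp, label) for every recognised log line."""
    events = []
    for line in text.splitlines():
        m = TS_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        for label, needle in EVENTS:
            if needle in line:
                events.append((when, label))
                break
    return events


def seconds_since_subscribe(events):
    """Pair each label with the seconds elapsed since the last 'subscribed'."""
    start = None
    result = []
    for when, label in events:
        if label == "subscribed":
            start = when
        delta = (when - start).total_seconds() if start is not None else None
        result.append((label, delta))
    return result


if __name__ == "__main__":
    for label, delta in seconds_since_subscribe(timeline(SAMPLE)):
        print(f"{label:20s} {delta}")
```

On the excerpt above, the first retransmission failure comes roughly 28 minutes after the subscription succeeded, and the liveness timeout roughly 10 minutes after that, so the device had apparently stopped answering well before Matter Server flagged anything. Lining these timestamps up against the OTBR log for the same window seems like the obvious next step.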

This actually seems useful, but I can’t find much detail on what it really means or how to fix it (it is a great improvement on other issues where there is simply nothing logged :face_with_symbols_over_mouth:).

What is going on here?