Matter Server Add-on stops working at the same 4 times a day

  • I currently have 57 Matter devices connected to my Home Assistant Green, most of them via a SONOFF ZBDongle-E flashed with the most recent Thread firmware; only very few are Wi-Fi devices
  • Most of the Thread devices are Nanoleaf Essentials downlights flashed with the latest firmware 4.1.3 - they were paired not via the Nanoleaf app but directly from within Home Assistant
  • I also have a second SONOFF ZBDongle-E flashed with the latest Zigbee firmware, running Zigbee2MQTT with more than 100 devices connected, on a different channel

The Matter devices become unavailable at exactly the same 4 times each day (Central European Summer Time):

  • 05:35
  • 11:35
  • 17:35
  • 23:35

Most times the devices became available again after some minutes to hours, so I didn’t notice until digging deeper. Sometimes, however, the devices didn’t come back until I restarted the Matter Server add-on manually. My family and I have lived with this situation for almost a year, and we are fed up.

So a few days ago I created an automation that counts the unavailable devices and restarts the Matter Server add-on when more than 5 devices have been offline for more than 5 minutes.

Because of this I noticed that it happens at exactly the same times each day, since I now have the history of the helper holding the counts.

Plus, the problem is far less dramatic now: the devices are only unavailable for 6 to 30 minutes (it takes a while until all are back after a restart of the add-on), and I know when. Additionally, my light automations put the lights in the correct state when they become available.
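For anyone who wants to replicate this, a minimal sketch of the setup (helper sensor plus restart automation) could look like the YAML below. The sensor name and the 5-device/5-minute numbers are my own choices, and `core_matter_server` is the add-on slug on my install - verify yours before using this:

```yaml
# Template sensor counting Matter entities that are "unavailable".
template:
  - sensor:
      - name: "Unavailable Matter devices"
        state: >
          {{ expand(integration_entities('matter'))
             | selectattr('state', 'eq', 'unavailable')
             | list | count }}

# Restart the Matter Server add-on when more than 5 devices
# have been offline for more than 5 minutes.
automation:
  - alias: "Restart Matter Server on device dropout"
    trigger:
      - platform: numeric_state
        entity_id: sensor.unavailable_matter_devices
        above: 5
        for: "00:05:00"
    action:
      - service: hassio.addon_restart
        data:
          addon: core_matter_server
```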

My Zigbee network on the same hardware is absolutely stable. After a restart, devices become available within seconds. There is never any issue with device availability - the covered area is exactly the same. I’ve even moved all my Aqara switches and bulbs from Thread to Zigbee and they’ve never had any issues since.

I recommend that everyone stay away from Matter on Home Assistant for now.

With the automation described above, my hope is returning that this can be solved.

Any ideas?

This has been an issue since the beginning of using Home Assistant almost a year ago - through all versions. Everything is updated and the Sonoff sticks have the latest firmware.

Here’s a graph of the helper I’ve created counting unavailable Matter devices over time (it maxes out at 57, which is the number of Matter devices in my network; this includes devices from Nanoleaf and Eve on Thread, plus Tapo on Wi-Fi):

Here’s an example of the error messages I’m seeing in the logs of the Matter Add-On when the devices are unavailable:

2025-10-02 11:37:00.533 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:5919i with Node: <000000000000005A, 1> S:21294 M:179480598] (S) Msg Retransmission to 1:000000000000005A failure (max retries:4)
2025-10-02 11:37:03.581 (Dummy-2) CHIP_ERROR [chip.native.DMG] Subscription Liveness timeout with SubscriptionID = 0x84c91fb8, Peer = 01:000000000000002C
2025-10-02 11:37:03.588 (MainThread) INFO [matter_server.server.device_controller] <Node:44> Subscription failed with CHIP Error 0x00000032: Timeout, resubscription attempt 0
2025-10-02 11:37:03.833 (Dummy-2) CHIP_ERROR [chip.native.EM] <<5 [E:5923i with Node: <0000000000000000, 0> S:0 M:238866805] (U) Msg Retransmission to 0:0000000000000000 failure (max retries:4)

I’m not sure the Matter addon is the issue.

I have several Matter over Wi-Fi devices that are solid, and I’ve never noticed them dropping out of Home Assistant.

I’d be curious to see if there are any other interesting items in your logs around these times. For example, I have the Studio Code Server add-on running that I restart daily because it’s got a terrible memory leak in it.

Also check your house for noisy devices that run at those hours.
If neighbors are close by, they might be the source too.

Moving devices from Thread or Wi-Fi to Zigbee will probably change the channel being used, which is why you are seeing an improvement on Zigbee.


May want to check the OTBR logs at around those times to see if it’s getting a bunch of log entries like Dropping rx frag frame, error:Drop
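As a quick way to quantify that, a grep over a saved copy of the OTBR log counts the drop lines. The `ha` CLI call and the add-on slug below are assumptions from my setup, and the sample log lines are stand-ins so the snippet is self-contained:

```shell
# Normally you would capture the real log first, e.g.:
#   ha addons logs core_openthread_border_router > otbr.log
# (slug is an assumption - list yours with `ha addons`).
# Stand-in sample data:
cat > otbr.log <<'EOF'
21d.23:04:53.194 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0xd90b, dst:0x8c00
21d.23:04:53.258 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:77, src:0xd90b, dst:0x8c00
21d.23:04:52.980 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:244, error:NoAck
EOF

# Count the dropped-fragment lines
grep -c "Dropping rx frag frame" otbr.log   # prints 2
```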

The standard telco test for RF interference is a battery-powered AM radio tuned between stations. Walk about and listen for the static changing to rhythmic interference.

My telco used to issue radios to field techs to find stuff as obscure as faulty streetlights and Christmas decorations so noisy they hit nearby WIRED xDSL connections!

No, that’s not it. I’ve changed channels back and forth. Zigbee stays stable, Thread is unstable on any channel.

Definitely have those in my logs. I will check live during one of these windows again. Unfortunately, OTBR doesn’t show timestamps in the logs.

I have only very few add-ons running. Studio Code Server is actually one of them; I’ve stopped it now - I indeed only need it when making changes to files, so I will start it, use it and stop it. Thanks for the tip. That said, I already had these problems before installing this add-on, so I guess that alone won’t fix it.

Same feedback from me, stay away from Matter.

I do not have any issues with Zigbee (even with 2 separate networks: a Hue bridge and a Z2M dongle) with 100+ devices, and I have one TBR for Matter with 10 devices.
If I add the Nanoleaf Elements Matter device into the picture, then Matter on HA is completely broken.

For your issue, also check CPU usage at that moment in time. Try to see what is happening via the History / Logbook (activity) tabs.

I resolved the issue, somehow. First I unplugged the Sonoff stick for an hour or so; after plugging it back in, the issue remained, same 4 times each day. Then I unplugged the Sonoff stick for a night. The result was that the 6-hour rhythm shifted, restarting 6 hours after I plugged it in again. I repeated this once more and the 6-hour rhythm shifted again. Then I cut power to all Thread devices for a night. That didn’t work either.

Then I switched the channel to 26 (from 15, which is otherwise not used; my Zigbee is on 20), removed power from all devices again, and slowly turned them back on room by room via their respective breakers. I’m pretty certain I had it on 26 before, but maybe I hadn’t - I have definitely switched to different channels a few times already. However, it’s now been working fine without any interruption for more than 24 hours.

So my best advice to anyone having problems with Matter via Thread is to move to channel 26.

Check my understanding (has to do with Thread Dataset migration):

  • Before changing channels, all the devices were powered down
  • You then changed channels (from the Thread Integration) to channel 26
  • Then powered up the Thread devices and they successfully rejoined the Thread network

Is the Thread integration still showing on Channel 26?

It has been over a year since that article was published. We have since gained the ability to “Share” a device to another TBR, but afaik we’re still nowhere near the suggested “unity mesh under a single network” solution.

Any news on this? I’m most curious, even though I only have a single OTBR right now, I’d like to add others down the line (maybe on different floors of the house or one outside or something).

I am not sure what you describe is even possible.

You are talking about a broken-up mesh, not a single united mesh.
The extra TBRs are backups for the main TBR, not for running a mesh in different places.

Oh, they are? We’re already at that point? So I don’t have to share my OTBR Nanoleaf light bulb with, say, Alexa to make it exposable to it? Gone are the days of each TBR having its own Thread network? I’ve been living under a rock if that’s the case.

Well, you can have plenty of separate Matter meshes, but having a single Matter mesh network where there is no mesh connection between some of its parts is not a valid setup, AFAIK.

That’s exactly the point of the article as I understand it. To the best of my knowledge, every border router currently creates its own mesh network. You can “share” a device so it can be used by multiple routers, but it’s still not a unified network - it’s multiple networks sharing a device.

What the article is describing is a scenario where multiple routers would join the same, single network under the “main” router and act as an actual part of the mesh.

We’re not at that point yet, are we?

We are some of the way there.
Some TBRs cannot work with other TBRs, but some can.

This is exactly what a classic memory leak looks like.

Watch the RAM usage in the add-on, and I’ll bet you see it go up over 6 hours.
Turn off the watchdog and, if I am right, the add-on will stop in about six hours and not restart.
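A rough sketch of how one might watch for that, assuming the Supervisor API’s per-add-on stats endpoint (`/addons/<slug>/stats`, reachable with a `SUPERVISOR_TOKEN`) plus a simple monotonic-growth heuristic. The slug, the 50 MB threshold, and the sample numbers are all made up:

```python
import json
import os
import urllib.request


def fetch_addon_memory_mb(slug: str) -> float:
    """Query the Supervisor stats endpoint for an add-on's memory usage.

    Assumes this runs where SUPERVISOR_TOKEN is set (e.g. inside an
    add-on container); endpoint path is an assumption from the
    Supervisor REST API.
    """
    req = urllib.request.Request(
        f"http://supervisor/addons/{slug}/stats",
        headers={"Authorization": f"Bearer {os.environ['SUPERVISOR_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    return data["memory_usage"] / (1024 * 1024)


def looks_like_leak(samples_mb: list[float], growth_mb: float = 50.0) -> bool:
    """Heuristic: memory rose monotonically and by more than growth_mb."""
    rising = all(b >= a for a, b in zip(samples_mb, samples_mb[1:]))
    return rising and (samples_mb[-1] - samples_mb[0]) > growth_mb


# Example: hypothetical hourly readings (in MB) over six hours
print(looks_like_leak([120, 150, 190, 260, 340, 450]))  # True  - classic leak shape
print(looks_like_leak([120, 118, 121, 119, 120, 122]))  # False - flat usage
```

In practice you would call `fetch_addon_memory_mb("core_matter_server")` (slug assumed) on a timer and feed the readings into `looks_like_leak`.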

I’m seeing this too. All my Matter devices connected to my OTBR go offline every 6 hours as well. The system is an HA Yellow.

This is exactly what a classic memory leak looks like.

And it’s way more precise than a memory leak: it goes down precisely every 6 hours.

Though sometimes recovery onto the network takes extra time for some devices.

I checked, and I don’t even have the watchdog turned on, and the memory is barely in use: add-on CPU usage is at 1.3% and memory at 0.1%.

My system has about 90 nodes, and the behavior is specific to the 75 or so nodes that are on the HA Yellow’s OpenThread Border Router.

I have ~15 nodes on a HomePod in a detached structure that is outside of radio range, and they don’t go offline with this cadence. Since I’ve read about different Thread border routers colliding, I tried unplugging the HomePod for a night, taking it and all the devices connected to it offline - and there were still outages on schedule while the OTBR was the only active Thread network.

In the Thread border router logs I have found that the event starts with two failures to send, then thousands of lines of dropped packets:

21d.23:04:52.980 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:244, chksum:b3b8, ecn:no, to:0x4000, sec:yes, error:NoAck, prio:low, radio:15.4
21d.23:04:52.980 [N] MeshForwarder-:     src:[xxxx:xxxx:xxxx:x:xxxx:xxxx:xxxx:xxxx]:33004
21d.23:04:52.980 [N] MeshForwarder-:     dst:[yyyy:yyyy:yyyy:y:yyyy:yyyy:yyyy:yyyy]:5540
21d.23:04:53.172 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:90, chksum:c351, ecn:no, to:0xd800, sec:yes, error:NoAck, prio:low, radio:15.4
21d.23:04:52.980 [N] MeshForwarder-:     src:[xxxx:xxxx:xxxx:x:xxxx:xxxx:xxxx:xxxx]:33004
21d.23:04:52.980 [N] MeshForwarder-:     dst:[zzzz:zzzz:zzzz:z:zzzz:zzzz:zzzz:zzzz]:5540
21d.23:04:53.194 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0xd90b, dst:0x8c00, sec:yes, tag:63326, offset:1080, dglen:1245
21d.23:04:53.258 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:77, src:0xd90b, dst:0x8c00, sec:yes, tag:63326, offset:1168, dglen:1245
21d.23:04:53.465 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0xd90b, dst:0x8c00, sec:yes, tag:63327, offset:112, dglen:1249
21d.23:04:53.472 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0xd90b, dst:0x8c00, sec:yes, tag:63327, offset:200, dglen:1249
21d.23:04:53.517 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0xe841, dst:0x8c00, sec:yes, tag:42307, offset:112, dglen:1245
21d.23:04:53.593 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0xe841, dst:0x8c00, sec:yes, tag:42307, offset:200, dglen:1245

It goes on for a few minutes and then just stops:

21d.23:06:41.815 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0x2842, dst:0x8c00, sec:yes, tag:17996, offset:288, dglen:685
21d.23:06:41.979 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0x2842, dst:0x8c00, sec:yes, tag:17996, offset:464, dglen:685
21d.23:06:41.988 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:21, src:0x2843, dst:0x8c00, sec:yes, tag:51903, offset:112, dglen:133
21d.23:06:42.009 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:0x2842, dst:0x8c00, sec:yes, tag:17996, offset:552, dglen:685

There are a few other errors mixed in. It appears that approximately once a second there’s this one:

21d.23:06:40.720 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1251, chksum:2724, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-85.5, radio:15.4

There are more Failed to send errors interspersed with the high rates of dropping.

And I found

21d.23:06:22.907 [W] P-RadioSpinel-: Handle transmit done failed: ChannelAccessFailure

near some of the repeated failures to send.

And occasionally:
21d.23:05:44.092 [N] MeshForwarder-: Dropping IPv6 UDP msg, len:282, chksum:a0db, ecn:no, sec:yes, error:Drop, prio:low, radio:all

There was another thread here (Channel Access Failure - #23 by agners) which suggests that ChannelAccessFailure implies radio interference.

I’ve included the first and last errors; the whole event takes about 2 minutes of many errors and then drops back to nominal operation.

With my large network of integrated switches, I’m very hesitant to jump channels if it might not transition everything. And that still doesn’t explain what underlying issue would actually have been resolved.

Do you have your radio stick as a USB device, and is it directly connected to the Yellow?
Try adding a powered USB hub between the Yellow and the USB radio stick.