Most of the time, I have a stable Matter over Thread network. But occasionally, boom: a large chunk of devices just goes offline. Sometimes the devices come back over an hour or so. Sometimes the Matter Server churns for more than an hour without really making progress, so I have to start rebooting stuff before I can get everything back online.
What seemed to work this time was turning off the Matter Server and then restarting HA, which also turns the Matter Server back on.
So, I have two problems:
- Spontaneous moments where a big chunk of devices goes offline for no apparent reason.
- When a chunk goes offline, they don’t always come back online unless I start rebooting stuff.
Similarly, this behavior causes problems when I need to update my Apple TVs or HA software – both of which cause the devices to go offline.
I’m running HAOS on a Beelink mini PC connected via Ethernet, with all the latest software. I have two Ethernet-connected Apple TVs in the house and two HomePod minis in an outbuilding to connect a few Matter over Thread devices out there.
I have a UniFi network. I’ve done all of the usual stuff: avoided channel 11 and made sure multicast traffic can flow smoothly. I’ve looked at this every which way and I’m not seeing any network errors or problems. And like I said, things can work for days without issue.
My devices are about 72 Inovelli White switches, about 8 Eve smart plugs, two Aqara smart locks, and 20 Sunricher RGB controllers. I suspected the RGB controllers might be the problem, but the network is no more stable with all of them completely off.
I’ve scoured the logs and begged ChatGPT for advice, but so far I have not been able to pinpoint what makes a chunk of devices go offline, and why they don’t come back without rebooting. (ChatGPT is like having a very smart friend who’s also very drunk every time you call him.)
Questions:
How do you recommend troubleshooting this?
Anyone else out there with about 100 MoT devices? Are you seeing the same thing?
Anything happening soon with Apple or Home Assistant that might help with this problem?
Any other advice?
Thanks!