Thread stability issues running OTBR inside Proxmox? Try disabling multicast snooping on the bridge interface

I’ve had a few Matter over Thread devices for a while, and recently I significantly expanded my network thanks to the new IKEA devices. I’ve been using the new OHF Matter(.js) Server available as an option in the current Matter Server App, along with the OpenThread Border Router App. Things were very good (the new Matter server is looking great!), but I was still running into frustrating issues where Thread devices would just go offline for a while with no apparent rhyme or reason.

I tried moving devices around, re-commissioning problematic devices, and reviewing my UniFi configuration to make sure everything was set up correctly, but none of it seemed to do any good.

What seems to have done the trick for me was looking a bit more into how running HA inside of Proxmox VE impacts things. By default, VMs are connected to the physical network via a Linux bridge (essentially a virtual switch), usually named vmbr0. Just like any of your physical network infrastructure, the bridge has options and features that can affect how network services traverse the interface.

The tweak I made was to disable IGMP/MLD snooping on the bridge interface. You can test this out from the Proxmox host shell:

echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping
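
Before and after flipping the switch, it can help to confirm what the bridge is actually set to. Here is a minimal sketch, assuming the bridge is named vmbr0. The snooping_state function and the SYSFS_ROOT override are hypothetical helpers for illustration, not something Proxmox ships:

```shell
# Hypothetical helper: report the multicast snooping state of a Linux bridge.
# SYSFS_ROOT is an assumed override so the function can be pointed at a
# different directory; by default it reads the real sysfs tree.
snooping_state() {
    bridge="${1:-vmbr0}"
    file="${SYSFS_ROOT:-/sys/class/net}/$bridge/bridge/multicast_snooping"

    # No such bridge (or no permission to read it): say so and fail.
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi

    # The kernel exposes the setting as 0 (off) or 1 (on).
    case "$(cat "$file")" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

# Typical use on the Proxmox host:
#   snooping_state vmbr0
```

After the echo command above, calling snooping_state vmbr0 on the host should print disabled. A value written through sysfs is a runtime setting only, so it will not survive a reboot on its own.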

You can make this setting persist by editing /etc/network/interfaces on the host as well:

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    gateway 192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
# Add the line below to disable multicast snooping on this bridge
    bridge-mcsnoop no
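
To double-check the edit before applying it, something like the following could work. This is only a sketch under assumptions: has_mcsnoop_off is a hypothetical helper, and it only understands the simple stanza layout shown above (an iface line followed by its options).

```shell
# Hypothetical helper: succeed if the given bridge's stanza in an
# interfaces file contains "bridge-mcsnoop no" (or "bridge-mcsnoop 0").
has_mcsnoop_off() {
    bridge="$1"
    file="${2:-/etc/network/interfaces}"

    awk -v br="$bridge" '
        # Every "iface" line starts a new stanza; remember if it is ours.
        $1 == "iface" { in_stanza = ($2 == br) }
        # Commented-out lines do not match because their first field differs.
        in_stanza && $1 == "bridge-mcsnoop" && ($2 == "no" || $2 == "0") { found = 1 }
        END { exit !found }
    ' "$file"
}

# Typical use on the Proxmox host:
#   has_mcsnoop_off vmbr0 && echo "snooping will be disabled on vmbr0"
```

If the host uses ifupdown2 (which, as far as I know, is the default on recent Proxmox VE releases), running ifreload -a afterwards should apply the change without a reboot.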

Your mileage may vary, but for me this was the difference between a mostly-working (which means completely untrustworthy) Thread network and a functional, responsive, and most importantly reliable one.

I’ll be curious to hear if this helps anyone else. I waited a few days before posting to see if any ghosts or gremlins popped up but so far I’ve had nothing but smooth sailing since making the change.

I don’t use Proxmox (I use QEMU/KVM/Libvirt), but I find it interesting that IGMP snooping is enabled by default on a kernel bridge. In fact, it never occurred to me that this function was even available in a kernel bridge, so I checked my own systems, and sure enough, it is there and enabled by default. I’ve never had an issue with Layer 2 multicast, so I wondered whether something else was making things work. I don’t have an IGMP querier running, which should mean that dynamically created multicast entries time out and multicast then gets blocked. What I found was that there are several permanent entries in the kernel bridge for “well-known” multicast groups (both IPv4 and IPv6), which explains why things work as well as they do.
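
For anyone who wants to repeat that check, here is a rough sketch. It assumes iproute2 is installed and that the bridge is named vmbr0 (substitute your own bridge name). mcast_flags is a hypothetical helper that just picks two values out of the detailed link output:

```shell
# Hypothetical helper: read `ip -d link show <bridge>` output on stdin and
# print the snooping and querier flags, one per line.
mcast_flags() {
    awk '{
        # The detailed output is a flat list of "name value" pairs,
        # so look for the names and print the field that follows.
        for (i = 1; i < NF; i++) {
            if ($i == "mcast_snooping") print "snooping=" $(i + 1)
            if ($i == "mcast_querier")  print "querier=" $(i + 1)
        }
    }'
}

# Typical use on the host (needs a real bridge, so left commented out):
#   ip -d link show vmbr0 | mcast_flags
#   bridge mdb show dev vmbr0    # lists the bridge's multicast group entries
```

The first command shows whether snooping and the querier are switched on, and the second dumps the multicast database where those entries live.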

I haven’t really had much of an issue with Thread devices going offline, but I will disable IGMP/MLD snooping anyway. Thanks.