NetworkManager gets connected to WiFi but no internet

I suspect there’s something odd about the network setup that supervised HASS produces, and I hope someone can chime in and point me in the right direction. I’m running supervised HASS on top of Raspberry Pi OS (buster).

My problem is that from time to time my RPi 4 loses its WiFi connection, and after it reconnects it doesn’t get internet access back (even though I can still ssh to it over WiFi). At least not until a reboot.

It’s unclear to me why it disconnects from WiFi in the first place, but anyway, I have a cron job running every 5 minutes that restarts the NetworkManager service when the connection is lost. After the restart it successfully reconnects to WiFi, but the device still can’t reach the internet. I see nothing suspicious in syslog that could trigger the disconnect, nor any errors while connecting.
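For reference, a minimal sketch of what such a cron-driven check could look like. The original post does not show the actual job, so the function name `restart_nm_if_offline`, the ping-based test, and the script path in the crontab comment are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: restart NetworkManager when a probe host stops answering.
# The probe host is passed as the first argument (an assumption; the original
# cron job may detect a lost connection differently).
restart_nm_if_offline() {
    host="$1"
    # One ping with a 5 second timeout; success means we are still online.
    if ping -c 1 -W 5 "$host" >/dev/null 2>&1; then
        return 0
    fi
    # No reply: restart the service (needs root, e.g. from root's crontab).
    systemctl restart NetworkManager
}

# Example crontab entry (hypothetical path), running every 5 minutes:
#   */5 * * * * /usr/local/bin/check-wifi.sh
```

Pinging the local router instead of an internet host would only confirm the WiFi link, which in this case comes back even though the internet does not.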

The only thing I managed to diagnose is that the routing table changes, and my device probably defaults to one of the virtual networks HASS created rather than using wlan0.

Before reboot

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                                                U     207    0        0 veth47c24e9
                                                UG    **600**    0        0 wlan0
                                                U     207    0        0 veth47c24e9
                                                U     209    0        0 veth66c4199
                                                U     211    0        0 vethae8c724
                                                U     213    0        0 veth1ee51c7
                                                U     215    0        0 vetha757eae
                                                U     217    0        0 veth126ec60
                                                U     219    0        0 veth2bf023a
                                                U     221    0        0 veth0c3cdc3
                                                U     **0**      0        0 docker0
                                                U     **0**      0        0 hassio
                                                U     600    0        0 wlan0

After reboot

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                                                UG    **0**      0        0 wlan0
                                                U     207    0        0 veth5e60ad6
                                                UG    600    0        0 wlan0
                                                U     207    0        0 veth5e60ad6
                                                U     209    0        0 veth1c8a93b
                                                U     211    0        0 veth8a99575
                                                U     213    0        0 veth483f273
                                                U     215    0        0 vetha5e5c7a
                                                U     217    0        0 vethdecaa67
                                                U     219    0        0 vethb122bf8
                                                U     221    0        0 veth61d0906
                                                U     0      0        0 docker0
                                                U     0      0        0 hassio
                                                U     600    0        0 wlan0

Could anyone give me a hint on how to debug or fix this?

As far as I can tell from “wifi - Howto migrate from networking to systemd-networkd with dynamic failover” (Raspberry Pi Stack Exchange), Home Assistant creates a second default gateway, and when wlan0 fails the system switches to that other one, created in Docker. Because it now has a lower metric than the wlan0 route, it is used by default.
So the question is: should the network created by Home Assistant be a default gateway at all? If so, how do I ensure it never takes precedence over wlan0?
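To see which default route actually wins, it can help to list the default routes and pick the one with the lowest metric, since that is the one the kernel prefers. The sketch below is only an illustration: `lowest_metric_default` is a hypothetical helper name, and it uses `ip route show default` from iproute2 rather than the `route -n` shown earlier.

```shell
# Print the default route with the lowest metric from `ip route show default`
# output read on stdin. A route without a "metric" field counts as metric 0.
lowest_metric_default() {
    awk '
        $1 == "default" {
            metric = 0
            # Find the value that follows the "metric" keyword, if present.
            for (i = 2; i < NF; i++) if ($i == "metric") metric = $(i + 1)
            if (best == "" || metric + 0 < best_metric) {
                best = $0
                best_metric = metric + 0
            }
        }
        END { if (best != "") print best }
    '
}

# Usage:
#   ip route show default | lowest_metric_default
```

If the line printed names a veth interface instead of wlan0, that would match the suspicion that a route created for the containers is taking precedence.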

Looking at the router logs, I discovered that sometimes a radar is detected, which causes a channel change. I’ve switched to a fixed channel, so that should reduce the number of WiFi failures, but it’d still be nice to have a mechanism that restores networking whenever it fails. (I wonder why NetworkManager doesn’t handle the channel change. Is it RPi specific?)

OK, never mind, I’ve added a few cron jobs, and it seems that running this line after WiFi reconnects “solves” it:

/usr/sbin/route add -net default gw netmask dev wlan0 metric 0

It would be nice, though, if the Home Assistant setup didn’t require it.
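As an alternative to cron, the same route fix could in principle be hooked into NetworkManager’s dispatcher, which runs scripts from /etc/NetworkManager/dispatcher.d/ with the interface name as the first argument and the action as the second, and exports `IP4_GATEWAY` for the connection. This is an untested sketch, not something from this setup: the filename and the function name `restore_default_route` are hypothetical, and it uses `ip route replace` instead of `route add`.

```shell
#!/bin/sh
# Hypothetical dispatcher hook, e.g.
#   /etc/NetworkManager/dispatcher.d/90-wlan0-default-route
# (must be owned by root and executable to be run by NetworkManager).

restore_default_route() {
    iface="$1"
    action="$2"
    # Only act when wlan0 has just come up and a gateway is known.
    [ "$iface" = "wlan0" ] || return 0
    [ "$action" = "up" ] || return 0
    [ -n "${IP4_GATEWAY:-}" ] || return 0
    # "replace" adds the route, or overwrites an existing matching one.
    ip route replace default via "$IP4_GATEWAY" dev "$iface" metric 0
}

restore_default_route "$1" "$2"
```

Taking the gateway from `IP4_GATEWAY` avoids hard-coding the router address in the script.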

The bug is still there.
Another workaround is:

sudo nmcli connection modify "Supervisor wlan0" ipv4.route-metric 1
sudo nmcli connection up "Supervisor wlan0"