HA can't reach devices on a different subnet from itself - no firewall rules in between subnets

someinternetguy · January 24, 2021, 11:51pm

TL;DR: HA cannot reach a different (routed) subnet to communicate with IoT devices… while other hosts on the same subnet can, even one with the same IP. What gives?

Hi all… I have a curious networking issue, and I have not been able to find anything about it in the forums or by googling around. I am running a fresh build of HAOS on fully updated Proxmox. Home Assistant (10.7.7.7) is unable to access any devices on my IoT subnet (they are both /24 networks, with NO blocking firewall rules in between right now). I can’t even ping the IoT gateway at 10.7.2.1 using the SSH add-on for HA. I don’t understand this. Right now, for troubleshooting, I have a completely open network with routing allowed on any port between any subnets. Everything else is able to route and communicate between subnets, except HA itself! For further proof that this is an HA issue, I spun up an Ubuntu VM on the same Proxmox host and gave it the same IP as my HAOS VM (10.7.7.7, after shutting HAOS down of course), and this test VM has no issue pinging and getting responses from the IoT subnet. I’m using Unifi gear, btw. I have verified that HAOS networking settings show the correct gateway, etc. I have also restarted many times, even restarting the Proxmox host.

To lay out the issue simply…

HAOS @ 10.7.7.7 on proxmox

ping 10.7.7.1 = OK (Local gateway)
ping 10.7.2.1 = FAIL (IoT Gateway)
ping 10.7.2.118 = FAIL (ESPHome Node)

Ubuntu Test VM @ 10.7.7.7 on same Proxmox host

ping 10.7.7.1 = OK (Local gateway)
ping 10.7.2.1 = OK (IoT Gateway)
ping 10.7.2.118 = OK (ESPHome Node)

I don’t see any reason for this to happen unless there is some wonky networking in the docker area of HAOS… honestly, I feel like it’s something simple I’m missing, but I can’t figure it out and I don’t even know what else to do to troubleshoot… any pushes in the right direction would be appreciated.

PS: this all used to be working fine when I was running HASSIO on top of an unsupported version of Ubuntu. Then back in December one of the Hass updates messed it up and the problem I describe above started happening. I could not figure it out and knew I was running unsupported, so I did what any good lad would do and spun up a fresh, supported HAOS install on the same Proxmox host where the old one was running… killed the old version and gave the new one the old one’s config and IP. But even this fresh install has this routing issue. I have not been able to figure it out (even after rebuilding my Unifi network, just in case that was the issue)… so here I am, on the forums begging for help… please help

wmaker · January 25, 2021, 5:10pm

Not sure how helpful I can be, but it sorta seems like hassos doesn’t know what its default route/gateway is. I checked my system using $ ip route and it’s showing up empty even though I can ping across subnets. I did configure a static config for my gateway via the Supervisor, so hassos seems to use that setting somewhere/somehow. Another way to check is to enter $ nmcli con show at the hassos linux console. You should see a profile named Supervisor enp0s3 or something like it. Then enter $ nmcli con show 'Supervisor enp0s3' and see what the entry ipv4.gateway shows (it should have the IP address of your router).

Vlad · January 25, 2021, 7:59pm

You may add a static route to the routing table for each subnet (1.x/24 and 2.x/24).
Or you can enable routing on the router and just run a basic routing protocol like RIP to let the two subnets talk to each other.

someinternetguy · January 26, 2021, 3:16pm

Thank you @wmaker for the reply! I ran your suggested commands on HassOS over SSH and got the following; can you (or someone) please help me decipher and suggest next steps…?

➜  ~ ip route
default via 10.7.7.1 dev enp0s18  metric 100
10.0.0.0/8 via 172.22.22.3 dev ztmjfabzvk
10.7.7.0/24 dev enp0s18 scope link  src 10.7.7.7  metric 100
172.17.0.0/16 dev docker0 scope link  src 172.17.0.1
172.22.22.0/24 dev ztmjfabzvk scope link  src 172.22.22.143
172.30.32.0/23 dev hassio scope link  src 172.30.32.1

It appears there is a default route correctly pointing to my router @ 10.7.7.1,
then a route for all of 10.0.0.0/8 via some internal docker IP (172.22.22.3) and a device called ztmjfabzvk.
- I’m not sure what to make of that, but it seems as though that might be what is NOT allowing traffic destined for my other 10.x subnets to reach my router at 10.7.7.1 for proper routing…? Any ideas here?
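One way to test that suspicion is to list which entries in the table actually cover an address on the IoT subnet. Below is a minimal Python sketch, assuming the `ip route` output is pasted in as text. The helper name `routes_covering` is made up for illustration, and the parser only handles the simple line shapes seen in this output, not everything `ip route` can print.

```python
import ipaddress

# Output of `ip route` pasted from the post above.
IP_ROUTE_OUTPUT = """\
default via 10.7.7.1 dev enp0s18  metric 100
10.0.0.0/8 via 172.22.22.3 dev ztmjfabzvk
10.7.7.0/24 dev enp0s18 scope link  src 10.7.7.7  metric 100
172.17.0.0/16 dev docker0 scope link  src 172.17.0.1
172.22.22.0/24 dev ztmjfabzvk scope link  src 172.22.22.143
172.30.32.0/23 dev hassio scope link  src 172.30.32.1
"""


def routes_covering(route_text, destination):
    """Return (network, device) pairs for every route whose prefix contains destination.

    Hypothetical helper for illustration; not part of any HAOS tooling.
    """
    dest = ipaddress.ip_address(destination)
    matches = []
    for line in route_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # `ip route` prints the catch-all IPv4 route as the word "default".
        prefix = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        network = ipaddress.ip_network(prefix)
        # The interface name follows the "dev" keyword.
        device = fields[fields.index("dev") + 1]
        if dest in network:
            matches.append((str(network), device))
    return matches


print(routes_covering(IP_ROUTE_OUTPUT, "10.7.2.1"))
```

For the IoT gateway at 10.7.2.1 this lists two candidates: the default route on enp0s18 and the 10.0.0.0/8 route on ztmjfabzvk. The local 10.7.7.0/24 entry does not cover that address.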

➜  ~ nmcli con show
NAME                UUID                                  TYPE      DEVICE
Supervisor enp0s18  a25fd7a0-77a1-4378-ae1f-050bcdbae407  ethernet  enp0s18
HassOS default      f62bf7c2-e565-49ff-bbfc-a4cf791e6add  ethernet  --

➜  ~ nmcli con show 'Supervisor emp0s18'
Error: Supervisor emp0s18 - no such connection profile.

Again, not sure what to make of this last command failing to show a connection profile… any further help is appreciated…

someinternetguy · January 26, 2021, 3:21pm

Thanks for the reply @Vlad… are you talking about adding routes in HassOS, or at my router? My router already has the routes between subnets and is working as designed… so if we’re talking about adding them to HassOS, that makes sense I suppose, but if you’ll see my reply to @wmaker above, there is a route in the table (the 10.0.0.0/8 one) that to me looks like it’s messing things up, but maybe it’s there for some HASSy reason that I don’t understand, lol?!

Do you have the commands needed, or do you know of some documentation on the subject? Thank you.

wmaker · January 26, 2021, 5:50pm

Glad to see your ip route command works for hassos… mine doesn’t seem to work for some odd reason. But yes to most of your questions/statements. If you try to ping 10.7.2.1, the longest-prefix match on the route lookup will hit the route entry 10.0.0.0/8 via 172.22.22.3 dev ztmjfabzvk, because /8 is more specific than the default route and your 10.7.7.0/24 entry does not cover that address. So your packet will be sent to device ztmjfabzvk for processing. This device has an address of 172.22.22.143, and 172.x addresses are usually used by docker containers. Are you using an add-on that does some kind of network processing?
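To make the longest-prefix idea concrete, here is a rough Python sketch using the routes from the `ip route` output earlier in the thread. It is a simplification and rests on assumptions: the real kernel lookup also weighs metrics and policy rules, and `pick_route` is a made-up helper name, not a real tool.

```python
import ipaddress

# Routes copied from the `ip route` output in this thread: (prefix, device).
ROUTES = [
    ("0.0.0.0/0", "enp0s18"),       # "default via 10.7.7.1"
    ("10.0.0.0/8", "ztmjfabzvk"),
    ("10.7.7.0/24", "enp0s18"),
    ("172.17.0.0/16", "docker0"),
    ("172.22.22.0/24", "ztmjfabzvk"),
    ("172.30.32.0/23", "hassio"),
]


def pick_route(destination, routes=ROUTES):
    """Return the (prefix, device) entry with the longest prefix containing destination.

    Simplified model of a route lookup: metrics and policy routing are ignored.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), device)
        for prefix, device in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    # The most specific match is the one with the largest prefix length.
    network, device = max(candidates, key=lambda c: c[0].prefixlen)
    return str(network), device


for target in ("10.7.7.1", "10.7.2.1", "10.7.2.118"):
    print(target, "->", pick_route(target))
```

In this model the local gateway 10.7.7.1 resolves to the 10.7.7.0/24 entry on enp0s18, while both IoT addresses resolve to 10.0.0.0/8 on ztmjfabzvk and never reach the default route. That matches the ping results in the first post.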

BTW, regarding your nmcli con show, there is a typo… enp instead of emp…

someinternetguy · January 26, 2021, 9:58pm

Aha, a typo, thank you for catching that. When I run it now I get the correct gateway:

➜  ~ nmcli con show 'Supervisor enp0s18'
...
ipv4.gateway:                           10.7.7.1

… back a few minutes later… OK, I figured it out… well, you figured it out! On your advice, I took a look at which add-ons are routing network traffic, and the only one I had running was ZeroTier! As soon as I stopped that, it deleted the bad routing entry, and boom, HA can now reach my other subnets! I believe this was because I had a managed route for 10.0.0.0/8 defined in ZeroTier Central, which was then being carried down by the add-on.
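For anyone wanting to sanity-check a managed route before pushing it to clients, a quick overlap test against the local subnets can catch this kind of clash. This is only a sketch under assumptions: the IoT subnet is written as 10.7.2.0/24, inferred from the 10.7.2.1 gateway and the /24 mentioned in the first post, and `overlapping_subnets` is a hypothetical helper name.

```python
import ipaddress

# Local subnets from this thread. 10.7.2.0/24 is inferred from the IoT
# gateway address (10.7.2.1) and the statement that both networks are /24.
LOCAL_SUBNETS = ["10.7.7.0/24", "10.7.2.0/24"]


def overlapping_subnets(managed_route, local_subnets=LOCAL_SUBNETS):
    """Return the local subnets that a VPN-pushed route would overlap."""
    route = ipaddress.ip_network(managed_route)
    return [s for s in local_subnets if route.overlaps(ipaddress.ip_network(s))]


print(overlapping_subnets("10.0.0.0/8"))      # the managed route from this thread
print(overlapping_subnets("172.22.22.0/24"))  # the subnet on the ztmjfabzvk interface
```

The 10.0.0.0/8 route overlaps both local subnets, while 172.22.22.0/24 overlaps neither. The overlap with 10.7.7.0/24 was harmless here because the directly connected /24 is more specific and still wins; the trouble was the IoT subnet, which HA only reaches through the router.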

THANK YOU @wmaker, and hopefully, this little exercise will help others who may run into this or a similar issue…

wmaker · January 26, 2021, 10:48pm

Out of curiosity, do you like ZeroTier services?

someinternetguy · January 28, 2021, 3:33pm

In fact, I do, aside from this little issue (which was of my own doing). It works very well on Linux and Windows and is super easy to set up and manage, and the best part is there are no port forwards needed in your firewall. I was using it as a way to stay connected to home on my lappy and iPhone when out of the house, but I don’t get out much any more so I really don’t use it. I was thinking about trying WireGuard since it is not dependent on any cloud… but I really don’t want to put holes in the firewall… so I’m not really sure which to use.

T81 · May 17, 2022, 8:01am

I have the same issue with pinging other subnets. Running HAOS on a VM, latest version as of writing. I do not have any network-manipulating add-ons installed. I started investigating when I couldn’t discover services across different subnets. Anyone having the same issue?