WiFi based devices keep becoming unavailable for random amounts of time and HA network issues

I’m having trouble working out what exactly the issue is or how to solve it. I’m not even sure whether it’s an HA issue or something to do with the computer or router, or how to find out. It’s been going on for at least a few weeks, but I can’t remember when it started. I just haven’t had the energy to deal with it.

I’ve noticed that WiFi-based lights, fans and curtains keep becoming unavailable. They’re a combination of Tuya and Yeelight devices. I’ve also gotten more downnotifier emails lately saying that HA is down for random lengths of time.
I’m using supervised Home Assistant on UTM on a Mac mini. It was working fine and I haven’t made any changes to the configuration.

I have NGINX Home Assistant SSL proxy set up with the following in my configuration file:

http:
  use_x_forwarded_for: true     # To ensure HA understands that client requests come via reverse proxy
  trusted_proxies:
    # - 173.239.203.0/24
    - 172.30.32.0/23            # In Hass.io we need to add the Docker subnet
    - 127.0.0.1                 # Add the localhost IPv4 address
    - ::1
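A quick way to sanity-check which addresses that trusted_proxies list actually covers is a few lines of Python. This is only a minimal sketch using the standard ipaddress module; the helper name is_trusted is made up for illustration and is not part of Home Assistant. It shows that 172.30.32.0/23 spans 172.30.32.0 through 172.30.33.255.

```python
import ipaddress

# Entries copied from the trusted_proxies list above (the commented-out one is skipped).
TRUSTED_PROXIES = ["172.30.32.0/23", "127.0.0.1", "::1"]


def is_trusted(address, trusted=TRUSTED_PROXIES):
    """Return True if `address` falls inside any trusted_proxies entry.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(address)
    for entry in trusted:
        # A bare address such as 127.0.0.1 becomes a single-host network.
        network = ipaddress.ip_network(entry, strict=False)
        if ip.version == network.version and ip in network:
            return True
    return False
```

For example, is_trusted("172.30.33.3") is True because the /23 covers both 172.30.32.x and 172.30.33.x, while is_trusted("172.30.34.1") is False.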

I’ve checked the logs and under Host there’s the following. I just don’t understand what it means, whether it points to the issue, or how to fix it.

2024-05-20 00:01:01.965 ha systemd[1]: fstrim.service: Deactivated successfully.
2024-05-20 00:01:01.965 ha systemd[1]: Finished Discard unused blocks on filesystems from /etc/fstab.
2024-05-20 00:23:32.529 ha dockerd[458]: time="2024-05-20T00:23:32.528917843Z" level=error msg="[resolver] failed to query DNS server: 172.30.32.3:53, query: ;www.duckdns.org.\tIN\t AAAA" error="read udp 172.30.33.3:59018->172.30.32.3:53: i/o timeout"
2024-05-20 00:23:32.530 ha dockerd[458]: time="2024-05-20T00:23:32.528863177Z" level=error msg="[resolver] failed to query DNS server: 172.30.32.3:53, query: ;www.duckdns.org.\tIN\t A" error="read udp 172.30.33.3:50809->172.30.32.3:53: i/o timeout"
2024-05-20 00:23:34.534 ha dockerd[458]: time="2024-05-20T00:23:34.534089978Z" level=error msg="[resolver] failed to query DNS server: 172.30.32.3:53, query: ;www.duckdns.org.\tIN\t AAAA" error="read udp 172.30.33.3:48140->172.30.32.3:53: i/o timeout"
2024-05-20 00:23:34.535 ha dockerd[458]: time="2024-05-20T00:23:34.533845900Z" level=error msg="[resolver] failed to query DNS server: 172.30.32.3:53, query: ;www.duckdns.org.\tIN\t A" error="read udp 172.30.33.3:36764->172.30.32.3:53: i/o timeout"
2024-05-20 00:23:36.534 ha dockerd[458]: time="2024-05-20T00:23:36.534106901Z" level=error msg="[resolver] failed to query DNS server: 172.30.32.3:53, query: ;www.duckdns.org.\tIN\t A" error="read udp 172.30.33.3:36258->172.30.32.3:53: i/o timeout"
2024-05-20 00:23:36.535 ha dockerd[458]: time="2024-05-20T00:23:36.534432476Z" level=error msg="[resolver] failed to query DNS server: 172.30.32.3:53, query: ;www.duckdns.org.\tIN\t AAAA" error="read udp 172.30.33.3:38218->172.30.32.3:53: i/o timeout"
2024-05-20 00:28:32.788 ha kernel: audit: type=1334 audit(1716164912.784:486): prog-id=120 op=LOAD
2024-05-20 00:28:32.788 ha kernel: audit: type=1334 audit(1716164912.784:487): prog-id=121 op=LOAD
2024-05-20 00:28:32.788 ha kernel: audit: type=1334 audit(1716164912.784:488): prog-id=122 op=LOAD
2024-05-20 00:28:32.802 ha systemd[1]: Starting Hostname Service...
2024-05-20 00:28:32.919 ha systemd[1]: Started Hostname Service.
2024-05-20 00:28:32.924 ha kernel: audit: type=1334 audit(1716164912.920:489): prog-id=123 op=LOAD
2024-05-20 00:28:32.924 ha kernel: audit: type=1334 audit(1716164912.920:490): prog-id=124 op=LOAD
2024-05-20 00:28:32.924 ha kernel: audit: type=1334 audit(1716164912.920:491): prog-id=125 op=LOAD
2024-05-20 00:28:32.946 ha systemd[1]: Starting Time & Date Service...
2024-05-20 00:28:33.023 ha systemd[1]: Started Time & Date Service.

I’d really appreciate help figuring this out. If there’s anything else I need to be looking at to see the issue, or other info needed to solve this, please let me know.
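For anyone reading the host log above: the dockerd lines are DNS lookups for www.duckdns.org (A and AAAA records) timing out against 172.30.32.3, which appears to be a DNS server on the internal Docker network rather than the router. To see how often this happens over a longer log, something like the following Python sketch could tally the timeouts. The function name and the idea of pasting the log into a list of lines are assumptions for illustration; the pattern is only written against the line format shown above.

```python
import re
from collections import Counter

# Matches the dockerd "[resolver] failed to query DNS server" lines shown above.
# The log prints the separators as a literal backslash-t, so both that and
# real whitespace are accepted.
RESOLVER_ERROR = re.compile(
    r"failed to query DNS server: (?P<server>[\d.]+):\d+, "
    r"query: ;(?P<name>\S+?)\.(?:\\t|\s)+IN(?:\\t|\s)+(?P<rtype>[A-Z]+)"
)


def summarize_dns_failures(lines):
    """Count resolver timeouts per (DNS server, queried name, record type).

    Hypothetical helper; `lines` is any iterable of host log lines.
    """
    counts = Counter()
    for line in lines:
        match = RESOLVER_ERROR.search(line)
        if match and "i/o timeout" in line:
            counts[(match["server"], match["name"], match["rtype"])] += 1
    return counts
```

If the counts climb at the same times the devices drop out, that would hint that the problem is on the HA/Docker side and not only WiFi. It would not prove it on its own.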

First, it would be good to check whether the devices are losing their WiFi connection or just their connection to HA. So I would find the IPs of all the problematic devices first. Then, when one is unavailable, check whether you can ping it. If you can, it’s an HA problem; if you can’t, it’s a router problem.
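The ping check described above could be scripted roughly like this. It is a minimal Python sketch, not a definitive tool: the function names, the device names and the IP addresses are all placeholders you would replace with your own, and it relies on the system ping command being available. Note that some devices ignore ping, so "not reachable" is only a hint.

```python
import platform
import subprocess


def ping(host, timeout_s=2):
    """Send one ICMP echo request with the system ping command."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", host]
    else:
        cmd = ["ping", "-c", "1", host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # give up if ping itself hangs
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def diagnose(ha_available, ping_ok):
    """Apply the rule of thumb from the post above."""
    if ha_available:
        return "ok"
    if ping_ok:
        return "reachable on the network: look at Home Assistant or the integration"
    return "not reachable: look at WiFi or the router"


def check_devices(devices, ha_states, ping_fn=ping):
    """devices: name -> IP; ha_states: name -> True if HA shows it as available."""
    return {
        name: diagnose(ha_states.get(name, False), ping_fn(ip))
        for name, ip in devices.items()
    }
```

Usage would be something like check_devices({"bedroom_fan": "192.168.1.50"}, {"bedroom_fan": False}), run at a moment when HA shows the device as unavailable. The name and address here are made up.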

I did have similar problems, and it turned out to be the router’s settings: turn off all “non-standard” things. Asus, for instance, has the WiFi options “TurboQAM” and “NitroQAM” (among others), plus some other Asus-invented features, which must be disabled. So check whether there’s any info on the internet for your router model about settings for “Internet of Things” (that is, smart home) devices.

How many WiFi devices do you have connected per access point, and what sort of WiFi access point is it?

Most all-in-one home router/WiFi boxes can’t handle more than 20-30 devices.

Thanks for the tip. I’ll try to check next time they go down.

I have an Asus RT-AC88U. It says 31 clients at the moment but I know I’ve had more at once before and no issues.
Though, now you mention it, I’m trying to remember if this started before or after I got the Aqara doorbell. Tomorrow I’ll try removing it and seeing if that has any effect.

The Asus RT-AC88U is not an entry-level model, so it should easily handle more than 20-30 WiFi devices.
Here are two settings suggestions to check out:
https://www.asus.com/support/faq/1042475/

and from snb forums:

Here are the settings that resolved my issues with iOT devices and the RT-AX86U | SNBForums

I’ve found those settings and only the Target Wake Time was enabled and Wireless mode was on Auto. Changed them, but not seeing a difference.

Host Log is still showing the same as before. And I’ve checked the logbook and I’m seeing a repeating list of WiFi devices/entities becoming unavailable then having their state update like it just changed.
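To judge whether the dropouts in the logbook follow a pattern (for example, the same length each time or the same interval apart), the state changes for one entity could be turned into a list of outages. This is a rough Python sketch under a loud assumption: the input is a list of (timestamp, state) pairs you have collected yourself, for instance copied by hand from the History screen. It does not call any Home Assistant API, and the function name is made up.

```python
from datetime import datetime, timedelta


def unavailable_periods(history):
    """Return (start, duration) for each run of 'unavailable' states.

    `history` is assumed to be chronologically sorted (datetime, state) pairs
    for a single entity. An outage still open at the end is ignored.
    """
    periods = []
    start = None
    for when, state in history:
        if state == "unavailable":
            if start is None:
                start = when  # first 'unavailable' of this outage
        elif start is not None:
            periods.append((start, when - start))
            start = None
    return periods
```

Comparing the start times across a Tuya device and a Yeelight device would show whether they drop at the same moment, which would point at something shared (network, DNS, the HA host) instead of the individual devices.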

But now that I’m looking at that, I’ve realised that the Lifx lights aren’t doing the same thing. They’re available the whole time. So it’s the Tuya-based devices (except the Cat room Curtain), the HomePod mini, the Yeelight lights and the Wake On Lan switches for my Mac mini and MacBook. I deleted the Wake On Lan ones for the mini just in case they were somehow doing something, but no change.

Becoming available again:

History screen with various affected entities:

Totally confused by this but it currently looks like it’s fixed. Yesterday I did firmware upgrades, downgrades, upgrades again, changed random settings. Nothing seemed to do anything. Gave up, watched tv and eventually went to work.
Just checked now to see how bad it currently is and the entities stopped going unavailable at around 16:40. Hopefully it’s fixed permanently and it’s not just a brief respite.