Multiple entities have flapping unavailable state

Hey there,

I can’t help myself anymore and I BET the first answer somebody throws in will be “It’s your Wi-Fi!”. But listen first :slight_smile:

I’ve been using HA for a few months now and I freaking love it, but one issue drives me F’ING CRAZY!!!
But here’s my setup first:

  • HA running using Hassio, using a Raspberry Pi 4 Model B, 4GB Ram (everything always kept up2date)
    ** The RPi is connected via LAN cable to a TP-Link DLAN adapter (see below). But I also tried connecting it directly to the FritzBox (via LAN) for around a week, which had no effect on the issue
  • SD Card: SanDisk Extreme microSDXC 64GB, Class A2
  • Router: Fritz!Box 7530
    ** The Fritz!Box is located pretty much in the middle of my 85 m² apartment
  • Two TP-Link D-LAN Adapters, newest Gen, 1300MBit (effectively doing only 300-600, but whatever…).

Wi-Fi setup:
Having had the issues described below ever since I started using HA, I thought: “Maybe IT IS your Wi-Fi…”. So I switched away from this:

  • Fritz!Box provides a 2.4GHz and a 5GHz Wi-Fi (with different SSIDs, but they join the same network, so devices can see each other)
  • Both TP-Link D-LANs act as mesh repeaters for both 2.4 and 5GHz Wi-Fi, but send the data over D-LAN to an adapter directly connected to the Fritz!Box’s LAN port. So the standard setup for D-LAN range extending.

To this:

  • Fritz!Box still provides a 2.4GHz and a 5GHz Wi-Fi,
  • But the D-LAN adapters only repeat the 5GHz Wi-Fi.
    So all 2.4GHz devices (so 90% of all smart home devices) connect directly to the Fritz!Box. With that, I was hoping to eliminate the issue described below, but it still persists. This is also a reason why I think it is not my Wi-Fi, as it happened when they had a super strong / excellent signal (because of the D-LAN repeaters) and still happens now with a great to excellent signal.

My actual issue:
Ever since I started using HA, I have had devices / entities that keep changing to “unavailable”, just to come back after 25-30 seconds. I see this across many (not all) Wi-Fi devices, with 2.4GHz devices affected more than 5GHz ones. But overall, they all have this at least once per day or every few days.
The devices affected most by this are my Yeelight Bedside Lamps (obviously located in the bedroom). They change to unavailable 50-70 times a day(!!) / around 5-7 times per hour, just to come back 25-30 seconds later. I also have a couple of Yeelight Smart Color Bulbs and Ceiling lights. They have that issue too, but way less frequently. Even the Ceiling lamp in the same room as the Bedside lamps has this waaay less often.
The next most frequently affected device is my Chromecast Ultra (located in the living room, connected via 5GHz Wi-Fi to a DLAN repeater right next to it). It also does that a couple of times per hour. Actually it got worse in the last week: Before, I didn’t even notice it doing this. Now I see it happening more often, as I have an automation that reacts to the Chromecast changing inputs or changing to “playing” or “not playing”. And I see this automation now triggering 2-3 times per hour, as it changes the color of a few lights…
Another device that does this pretty often is an ESP running WLED (located in the office). It also becomes unavailable very often, just to come back ~30 seconds later.

So what I actually see in the log book is this:


But for a lot of entities / devices. If I look into the actual log file, I see this for the bedside lamps most of the time:

2020-09-16 20:43:29 ERROR (SyncWorker_49) [homeassistant.components.yeelight] Unable to update device 192.168.178.34, [Bedroom] Bedside Lamp Andy: Bulb closed the connection.
2020-09-16 20:43:29 ERROR (SyncWorker_49) [homeassistant.components.yeelight] Unable to update device 192.168.178.35, [Bedroom] Bedside Lamp Kristina: Bulb closed the connection.
2020-09-16 20:58:54 ERROR (SyncWorker_50) [homeassistant.components.yeelight] Unable to update device 192.168.178.34, [Bedroom] Bedside Lamp Andy: Bulb closed the connection.
2020-09-16 20:58:54 ERROR (SyncWorker_50) [homeassistant.components.yeelight] Unable to update device 192.168.178.35, [Bedroom] Bedside Lamp Kristina: Bulb closed the connection.

But sometimes also this:

2020-09-16 20:37:41 ERROR (SyncWorker_16) [homeassistant.components.yeelight] Unable to update device 192.168.178.40, [Living Room] Big Lamp: A socket error occurred when sending the command.
2020-09-16 20:37:41 ERROR (SyncWorker_16) [homeassistant.components.yeelight] Unable to update device 192.168.178.44, [Living Room] Small Lamp: A socket error occurred when sending the command.
2020-09-16 20:37:54 ERROR (SyncWorker_16) [homeassistant.components.yeelight] Unable to update device 192.168.178.34, [Bedroom] Bedside Lamp Andy: A socket error occurred when sending the command.
2020-09-16 20:37:54 ERROR (SyncWorker_16) [homeassistant.components.yeelight] Unable to update device 192.168.178.35, [Bedroom] Bedside Lamp Kristina: A socket error occurred when sending the command.

That’s the WLED error:

2020-09-16 20:37:56 ERROR (MainThread) [homeassistant.components.wled] Error fetching wled data: Invalid response from API: Timeout occurred while connecting to WLED device.
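
To see which devices drop out most often, the log lines quoted above can be tallied per component and device. The following is only a rough sketch that assumes the log format shown in this post; the function name `count_errors` and the sample list are made up for illustration and are not part of Home Assistant.

```python
import re
from collections import Counter

# Matches ERROR lines in the format quoted above:
# "<date> <time> ERROR (<thread>) [<component>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR "
    r"\((?P<thread>[^)]+)\) \[(?P<component>[^\]]+)\] (?P<message>.*)$"
)
# The yeelight messages above name the device by IP address.
DEVICE_IP = re.compile(r"Unable to update device (\d+\.\d+\.\d+\.\d+)")


def count_errors(lines):
    """Count ERROR lines per (component, device IP or "-")."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # not an ERROR line in the expected format
        ip = DEVICE_IP.search(match.group("message"))
        counts[(match.group("component"), ip.group(1) if ip else "-")] += 1
    return counts


# A few of the lines from this post, used as sample input.
SAMPLE = [
    "2020-09-16 20:43:29 ERROR (SyncWorker_49) [homeassistant.components.yeelight] Unable to update device 192.168.178.34, [Bedroom] Bedside Lamp Andy: Bulb closed the connection.",
    "2020-09-16 20:43:29 ERROR (SyncWorker_49) [homeassistant.components.yeelight] Unable to update device 192.168.178.35, [Bedroom] Bedside Lamp Kristina: Bulb closed the connection.",
    "2020-09-16 20:58:54 ERROR (SyncWorker_50) [homeassistant.components.yeelight] Unable to update device 192.168.178.34, [Bedroom] Bedside Lamp Andy: Bulb closed the connection.",
    "2020-09-16 20:37:56 ERROR (MainThread) [homeassistant.components.wled] Error fetching wled data: Invalid response from API: Timeout occurred while connecting to WLED device.",
]

for key, n in count_errors(SAMPLE).most_common():
    print(key, n)
```

Feeding it the whole log file instead of `SAMPLE` (for example `count_errors(open(path))`, with `path` pointing at your own log) would show whether the dropouts cluster on a few devices or are spread across everything.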

I also see other kinds of connection issues for various devices (but maybe like ~5 times a day), like

  • the tado bridge (for my room thermostats),
  • the Reolink E1 Zoom camera
  • My Shelly 2.5s (controlling my roller blinds) and a Shelly 1
  • I even see things like this for the NabuCasa cloud I’ve been testing since this month:
2020-09-16 02:51:12 ERROR (MainThread) [snitun.multiplexer.core] Ping fails, no response from peer
2020-09-16 02:51:17 ERROR (MainThread) [snitun.client.client_peer] Can't connect to SniTun server eu-central-1.ui.nabu.casa:443
2020-09-16 02:51:17 ERROR (MainThread) [hass_nabucasa.remote] Connection problem to snitun server

or:

2020-09-16 03:11:10 ERROR (MainThread) [snitun.multiplexer.core] Ping fails, no response from peer

While I agree this absolutely smells like my Wi-Fi is crap, I have to say: I have zero issues with my network at home, and neither does my Chromecast, nor any lamp I’m using (when I was controlling it via the Yeelight app), nor any PC connected via Wi-Fi or via LAN to one of the D-LAN adapters. I have even been working from home since April, constantly having a VPN connection open that tunnels all my PC traffic, sometimes including a Twitch stream, and I have ZERO issues.
I also tried pinging both bedside lamps simultaneously from my PC and from the console within HA: 2000 packets sent, ZERO packet loss. But HA kept saying the bulb is unavailable and comes back…
It’s only HA having issues reaching all of these devices quite often and I’m really out of ideas.
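
A ping only shows that ICMP gets through, while the errors above ("Bulb closed the connection", "socket error") concern the TCP connection to the bulb. A TCP-level probe may therefore be a closer test. The sketch below assumes the bulbs listen on TCP port 55443 (the port used by Yeelight's LAN control protocol) and only opens and closes a connection without sending any command. The helper names `tcp_probe` and `watch` are made up for this example.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Return the TCP connect time in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return None


def watch(host, port, interval=5.0, rounds=10):
    """Probe repeatedly; return a list of (wall-clock time, result) tuples."""
    results = []
    for _ in range(rounds):
        results.append((time.strftime("%H:%M:%S"), tcp_probe(host, port)))
        time.sleep(interval)
    return results


# Example call (IP taken from the log above, port is the assumption named
# in the text); uncomment to run it on your own network:
# for stamp, result in watch("192.168.178.34", 55443):
#     print(stamp, "FAILED" if result is None else f"{result * 1000:.0f} ms")
```

If the probe fails at the same moments HA reports the lamp as unavailable while ping keeps working, that would point at the bulb's TCP handling rather than at the Wi-Fi link itself. It is not conclusive either way.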

Do you have any suggestions where I could still try to find what the issue is?

  • If you really still think it’s the Wi-Fi: How would I reliably test / verify it and find the cause?
  • Is the RPi 4B with 4GB RAM too underpowered maybe?

Please help me. I am so overwhelmed with HA and full of joy. But on the other hand, this issue makes me soooooo frustrated and now more and more often interferes with my automation logic, or things just happen randomly, as an automation gets triggered because a Shelly or the Chromecast comes back…

Thanks and greetings,

Andy!

Ehm, just 2 cents here… is the network connection from Home Assistant to your LAN/WLAN reliable?
I presume you are not pinging the Wi-Fi devices from Home Assistant but from another PC…

As written above, I tried pinging the Bedside lamps from a PC and the terminal inside HA: zero packet loss.
And as said: The RPi was also connected directly to the router via a LAN cable, standing right next to it: same issues.

Ok. Give me back my 2 cents then :blush:

Naaaah, spent them already at the next best poker table, sorry!!!

One other thing… during the problems (unavailability of devices), have you also pinged the devices at the same time?

Since you are talking about 5GHz, it “might” have to do with DFS…

Once I had also MAJOR issues due to that…

I manually changed my 5GHz Wi-Fi to only use non-DFS channels and it has been rock solid since.

Don’t know where you live but might try
Look in the 5GHz table for the “DFS” line and do not use these channels…
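
As a rough illustration of the non-DFS advice: in the usual 20MHz channel numbering, channels 52-64 and 100-144 are the ones that require DFS, while 36-48 do not. The exact list depends on your country's regulations, so treat the set below as an assumption and check it against the table for your region. The function name is made up for this example.

```python
# 5GHz channels that typically require DFS (assumption: common 20MHz
# numbering; some regions stop at 140 or handle the upper channels differently).
DFS_CHANNELS = set(range(52, 65, 4)) | set(range(100, 145, 4))


def is_dfs_channel(channel):
    """Return True if the 5GHz channel number is in the assumed DFS set."""
    return channel in DFS_CHANNELS


for ch in (36, 44, 52, 100, 149):
    print(ch, "DFS" if is_dfs_channel(ch) else "non-DFS")
```

On a router that picks the 5GHz channel automatically, fixing it to one of the non-DFS channels is the change described above.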

Good read, will check that later / tomorrow, as it’s already midnight here :wink:

But: This would only affect the 5GHz devices, right? It would not have an effect on the 2.4GHz devices, which are the ones that have the most issues for me

I found reliability increased when I created a separate 2.4GHz-only guest network and moved all the annoying devices over to it.

But how can your HA then see these devices? Is it also in that guest network?

As written above, my 2.4GHz is separate in the sense that it has its own SSID and channel (obviously), but the Fritz!Box makes it so that all devices (2.4GHz, 5GHz, LAN) can see each other

No - my router has an option to isolate the networks, but I don’t use that. Therefore each device on the main wifi can see each device on the guest network, and vice versa. The two wireless networks have separate SSIDs. And in my case, HA is on the wired network. And they’re all using the same DHCP, so there’s actually really only one network (ie. one address range). You could argue it’s a guest network in name only. But I’m no network engineer, so I’ve likely got some of the terminology wrong.

Then I have exactly the same setup :slight_smile:

I chased these types of issues for months. My house is 10,000 sq ft including the basement, patios and garages. 7 access points with the same SSID were placed evenly around the house to attempt to get good signal everywhere. In my case, it was because my WiFi devices would randomly decide to switch to an access point with a much lower signal. All my ESP8266 based devices would do this constantly. I ended up moving to an Eero mesh and have had almost no issues since. I still had issues with one esp8266 in my basement going unavailable several times a day. It was reporting a signal of -77dBm which is not good. I still need 1 more Eero to cover my basement devices. For now, I installed one of my TP-Link Omada Access Points down there with a different SSID and it’s been rock solid ever since. Not a single unavailable reported. The other issue turned out to be a setting on the nic driver on my wife’s laptop. Channel width was hard set to 20MHz instead of auto. After changing to auto, all problems disappeared.

I’d recommend using a wifi analyzer app on a phone or tablet and go to the areas where your problematic devices are located and see if you have a low signal issue (under -70 dBm). Also check to see if there’s another access point broadcasting on the same frequency that could be drowning out your signal. Watch it for several minutes.
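
If you note down the readings from the analyzer app per room, a few lines are enough to flag the weak spots. This is only a sketch using the -70 dBm rule of thumb from above. The readings are hypothetical, except for the -77 dBm basement value mentioned earlier, and the function name is made up.

```python
WEAK_SIGNAL_DBM = -70  # rule-of-thumb threshold from the post above


def weak_spots(readings, threshold=WEAK_SIGNAL_DBM):
    """Return the locations whose RSSI is below the threshold, weakest first."""
    weak = {place: rssi for place, rssi in readings.items() if rssi < threshold}
    return sorted(weak, key=weak.get)


# Hypothetical readings in dBm, keyed by location.
readings = {"bedroom": -58, "office": -71, "basement": -77}
print(weak_spots(readings))
```

Remember that dBm values are negative, so "under -70" means numbers like -75 or -80.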

I had major dropped packets with my laptop and esp8266 devices located near my master bedroom. It would cycle every minute. I could see this hidden SSID starting out with a low signal then increasing slowly until it was so strong it was killing my other devices nearby. I tracked it back to the Roku in my master bedroom. It was originally attached to an SSID that I had retired not too long prior but it was using ethernet, not wifi. Apparently, even if it’s not using wifi, it still tries to phone home every minute and increases its signal until giving up and trying again. The solution was to attach it to my new SSID and then switch back to ethernet. Problem solved.

Probably not your issue since it’s not 2.4GHz, but some Rokus also use 5GHz WiFi Direct to talk to their remote controls. If they are on the same channel as your WiFi network, they will increase their WiFi signal and drown out your WiFi network. Here’s more info on how to fix that: How to prevent Roku Wifi Direct from breaking 5ghz devices

Thanks for the very very detailed answer, lots of info there I have to test and work through. Currently on vacation, but I will test the whole of next week and see if I can find something.

However: Devices jumping between APs was also one of my first ideas for what the issue could be. But it’s this “wonderful mesh network”, where this shouldn’t be an issue… Still, as written above: The 2.4GHz Wi-Fi is only broadcast by the router/Fritz!Box, so there is no AP jumping. So that can’t be it.

Signal should also be strong and good, but will see / test.

Channels: Fritz!Box shows you the usage of all channels and I already switched to a not very common / used channel in my house. But I can try that again, it’s worth a try.

Thanks again!!

On another note: Dude, wtf: 10,000 square feet!? That’s nearly 1,000 m² O_o

4,700 sq ft main house. 3,300 sq ft basement. 1,000 sq ft two attached garages. 1,000 sq ft three covered patios. Trying to get good wifi coverage everywhere was a pain. I was very surprised as to how much better the signal strength was with just three Eero devices.

I’ve worked in houses and toured houses at home shows in my area that make mine look like one of those “Tiny Houses” you see on TV shows.

pretty impressive.

One more question: On what hardware is your HA running? RPi, or something else?

I used to run it on an older raspberry pi model 2b over WiFi but it wasn’t powerful enough to handle some ffmpeg processing I do for my security camera system. I switched to a fanless motherboard. ASRock J3355M Intel Dual-Core Processor J3355. It’s not super powerful but considerably more cpu power and memory than the raspberry pi model 2b. I have some significantly more powerful motherboards/cpus but decided to use this because it’s silent and uses very little energy.

Ok, but it at least sounds like the RPi 4 with 4GB should be enough to handle all of what HA does. I was a bit worried that it’s maybe the RPi reaching its limits causing my issues

Do you have a solution for your issue? I’m facing the same issue with nearly all of my Shelly devices. Same FritzBox and using the FritzMesh…

Using unicast as described on the integration page solved my issues.