Device going unavailable

I do not know ESPHome, but I have played a bit with ESP32-S2 devices, and I found that I often had to increase the timeout in my applications, because the ESP was in deep sleep and did not fully wake up before the connection timed out.
If I tried to wake them with a ping command on a Windows machine, which sends 4 pings with a 3-second timeout, it was usually only the last ping that actually got a reply, which means more than 9 seconds to wake up from deep sleep.
You might not use deep sleep, but the other sleep modes have wake-up times too and might still be an issue.
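The arithmetic behind that 9-second estimate can be written down as a small Python sketch. It assumes each unanswered ping waits the full timeout before the next one is sent, so the index of the first reply multiplied by the timeout gives a lower bound on the wake-up time; `min_wake_time` is a purely illustrative name.

```python
from __future__ import annotations


def min_wake_time(replies: list[bool], timeout_s: float) -> float | None:
    """Lower bound, in seconds, on how long the device took to answer.

    Assumption: every failed ping waited the full timeout before the next
    request went out. Real ping tools may add a short interval on top,
    which would only make the true wake-up time longer.
    Returns None if no ping was answered at all.
    """
    for index, answered in enumerate(replies):
        if answered:
            # All earlier pings timed out, so at least index * timeout_s passed.
            return index * timeout_s
    return None


# The case from the post: 4 pings, 3 s timeout, only the last one answered.
print(min_wake_time([False, False, False, True], 3.0))  # 9.0
```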

I’m not using deep sleep anywhere, yet I am seeing this problem.
Anyone else?

At one point I had random problems with nodes dropping offline, as well as some other devices. Aside from setting static IPs, I broke them out onto a separate VLAN.

I know this may not be an option for everyone, but what you can usually do is put them on a separate SSID. Set that SSID to 2.4 GHz only.

If you have access points whose coverage overlaps a node, only broadcast that SSID from the AP you want the node to connect to.

In general, ISP-supplied or even gaming routers can only handle ~30 active clients before devices start falling off the network.

I’m not suggesting that this is where any of your problems lie; I’m just throwing it out as food for thought. In my experience, as my network became overloaded, the ESPs were where it started to show.


Mikefila’s post above prompted me to shift my gaze from ESPHome onto the WiFi access point.
I’ve seen ESP nodes “get flaky” out of the blue, where one suddenly starts disconnecting frequently. Restarting the node doesn’t help. Powering it down for 8+ hours seems to help, but that is not a practical workaround.
But restarting the AP does help.
All nodes are staying connected now. But, as I’ve observed, in a few days one will go flaky again, and restarting the AP will fix it.
I don’t know the innards of 802.11 well enough to even begin to speculate what might be botching up the link for one node so badly that it requires restarting the AP.
FWIW, my AP is running the latest OpenWrt (21.02.1) on the ath79 platform; it’s a TP-Link Archer A7 v5.
If this continues, I’ll probably flash it back to factory firmware and see how it fares.

24 hours or more so far on factory firmware on the WiFi AP (a TP-Link Archer A7 v5), and I can report that not a single ESP has mysteriously disconnected. It’s never been this stable.
There’s another case on here where someone had the same AP, also running OpenWrt, same problem.


Any update? I find that the connection appears to degrade over time: it works great for a few days, and then the disconnects get worse and worse.

I saw the same kind of degradation over time when the AP was running OpenWrt.
Since changing the AP back to factory firmware, stable as can be. No degradation, no dropouts, no mysterious disconnects.

Interesting, since I have 3 Eero mesh routers. I am installing a firmware update and rebooting the network; we’ll see what happens.

I can’t place the blame on OpenWrt for this. It seems to serve all other WiFi clients just fine, but it eventually stops handling ESPHome nodes well.
My guess is that they are both slightly out of spec, but in opposite directions. Neither one alone is bad enough to cause trouble when connected to another product, but together they just don’t stay happy.
However, the fact that you are seeing the degradation on WiFi that’s NOT OpenWrt tells me the fault is very likely somewhere in ESPHome (or its Arduino-sourced WiFi library) code.

I’m 90% certain this is a problem with the integration on the Home Assistant side, because I’ve experienced this a few times in the last month. Every time, restarting the device itself made no difference, but reloading the integration in Home Assistant magically brought it back to life.


That may be, but there’s one other symptom that isn’t explained by the problem being on the HA side.

When a node is being prone to going unavailable (and in my case it seems to jump from node to node at random, not always the same ones), pinging that node from my desktop shows that about 5-20% of the packets are being dropped. The node is effectively flapping on and off the net. That will of course also affect the quality of its link to HA.
If a node is dropping packets like that, it’s no surprise that HA would have a hard time staying ‘connected’ to it.
And again, simply by replacing the firmware in my AP, all (!) the ESP nodes remain rock-stable on the net, no dropped pings, and no HA disconnects.
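A rough way to quantify the flapping described above is to parse the summary line that `ping` prints and compute the loss from the sent and received counts. This is a minimal Python sketch: the regexes assume the iputils (Linux) and Windows `ping.exe` summary formats, and the 5 % flapping threshold is a hypothetical value taken from the low end of the 5-20 % loss reported in the post.

```python
import re

# Summary formats assumed: iputils/macOS ("N packets transmitted, M received")
# and Windows ping.exe ("Sent = N, Received = M"). Other tools may differ.
UNIX_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
WINDOWS_RE = re.compile(r"Sent = (\d+), Received = (\d+)")


def packet_loss(ping_output: str) -> float:
    """Return packet loss in percent, computed from the sent/received counts."""
    for pattern in (UNIX_RE, WINDOWS_RE):
        match = pattern.search(ping_output)
        if match:
            sent, received = (int(g) for g in match.groups())
            if sent == 0:
                raise ValueError("no packets were sent")
            return 100.0 * (sent - received) / sent
    raise ValueError("no ping summary line found")


def is_flapping(loss_percent: float, threshold: float = 5.0) -> bool:
    # Hypothetical threshold: the post saw 5-20 % loss on misbehaving nodes.
    return loss_percent >= threshold


linux_sample = "20 packets transmitted, 17 received, 15% packet loss, time 19030ms"
loss = packet_loss(linux_sample)
print(loss, is_flapping(loss))  # 15.0 True
```

On Linux you could feed it the output of something like `ping -c 20 <node-ip>` captured with `subprocess.run(..., capture_output=True, text=True)`.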

Agreed, when the device is unavailable in HA, it is because of network issues: I can’t ping it, it isn’t seen in LAN scans, etc.

Of course the rest of my network is stable, so very likely the ESPHome/network interaction is what’s flaky.

Whose router firmware are you using?

+1, when it becomes unavailable in HA I cannot ping it at all. Recently, this problem has become more frequent. It would be great to check via the serial console while it is unavailable.

My current experience:
I have a Sonoff Basic running ESPHome that randomly becomes unavailable (often after a power loss).
I have 3 OpenWrt routers and I’ve tried pretty much everything others have tried in this thread.
The curious thing is that I can see the device in the ESPHome web interface, but it is still unavailable in Home Assistant.
What I’ve done today:

  • pinged the device: the first 3 pings failed, then the device responded
  • checked the log through the ESPHome web interface: I can’t see any error in the log
  • reloaded the ESPHome integration (Integrations → ESPHome → the device → three dots → Reload)… and that worked! I could see the blue log line about the connection to the HA API, and the device became available.

Still don’t know why it randomly fails but maybe this can help someone.
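If you want to script that workaround instead of clicking through the UI, Home Assistant’s REST API can call the `homeassistant.reload_config_entry` service, which should be roughly equivalent to the Reload menu entry. The sketch below only builds the request with the Python standard library; the base URL, the long-lived access token and the config entry ID are placeholders you would need to fill in, and `build_reload_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_reload_request(base_url: str, token: str, entry_id: str) -> urllib.request.Request:
    """Build a POST that asks Home Assistant to reload one config entry."""
    url = f"{base_url.rstrip('/')}/api/services/homeassistant/reload_config_entry"
    body = json.dumps({"entry_id": entry_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
    )


# Placeholders: substitute your own HA URL, token and the ESPHome entry ID.
req = build_reload_request(
    "http://homeassistant.local:8123", "YOUR_LONG_LIVED_TOKEN", "YOUR_ENTRY_ID"
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req, timeout=10)  # uncomment to actually send it
```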


Here is my chart for one day; every outage lasts 16-17 minutes.
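The chart itself did not survive, but outage lengths like these can be computed from an entity’s state history. A minimal Python sketch, assuming you already have chronologically sorted `(timestamp, state)` pairs (for example exported from Home Assistant’s history); the sample timestamps below are made up for illustration and are not the poster’s data.

```python
from datetime import datetime, timedelta


def outage_durations(events: list[tuple[datetime, str]]) -> list[timedelta]:
    """Durations of each 'unavailable' run, ended by the next other state.

    events must be sorted by timestamp. An outage still open at the end
    of the list is ignored, because its length is not known yet.
    """
    durations = []
    started = None
    for ts, state in events:
        if state == "unavailable":
            if started is None:  # first sample of a new outage
                started = ts
        elif started is not None:  # device came back
            durations.append(ts - started)
            started = None
    return durations


# Hypothetical history for one entity.
events = [
    (datetime(2022, 2, 18, 8, 0), "on"),
    (datetime(2022, 2, 18, 8, 10), "unavailable"),
    (datetime(2022, 2, 18, 8, 26, 30), "on"),
    (datetime(2022, 2, 18, 9, 40), "unavailable"),
    (datetime(2022, 2, 18, 9, 57), "off"),
]
print([d.total_seconds() / 60 for d in outage_durations(events)])  # [16.5, 17.0]
```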

Just to add to this, I’m seeing the same problem with all of my flashed Tuya ESP8266 devices and DIY ESP32 devices. The devices I have running WLED don’t exhibit the same problem.

The AP is a UniFi AC Pro running OpenWrt. Some devices use the ESPHome API and some use MQTT; both are going unavailable multiple times an hour.

From ESPHome logs:

[09:19:20][D][sensor:124]: 'Eufy WiFi Signal Sensor': Sending state -53.00000 dBm with 0 decimals of accuracy
INFO 10.15.57.37: Ping timed out!
INFO Disconnected from ESPHome API for 10.15.57.37
WARNING Disconnected from API
INFO Successfully connected to 10.15.57.37
[09:20:02][D][tuya:439]: Sending local time
[09:20:20][D][sensor:124]: 'Eufy WiFi Signal Sensor': Sending state -54.00000 dBm with 0 decimals of accuracy
....................
[09:24:30][D][tuya:439]: Sending local time
[09:24:36][W][mqtt:260]: MQTT Disconnected: TCP disconnected.
[09:24:36][D][mqtt:116]: Resolving MQTT broker IP address...
[09:24:36][D][mqtt:149]: Resolved broker IP address to 10.11.5.10
[09:24:41][I][mqtt:175]: Connecting to MQTT...
[09:24:41][I][mqtt:215]: MQTT Connected!
[09:25:02][D][tuya:439]: Sending local time

From OpenWrt logs:

Fri Feb 18 08:00:56 2022 daemon.notice hostapd: wlan1-1: AP-STA-DISCONNECTED ec:xx:xx:xx:xx:xx
Fri Feb 18 08:00:56 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: disassociated
Fri Feb 18 08:00:56 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: disassociated
Fri Feb 18 08:00:56 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: disassociated
Fri Feb 18 08:00:57 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: disassociated
Fri Feb 18 08:00:57 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: disassociated
Fri Feb 18 08:00:57 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Fri Feb 18 08:01:04 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: authenticated
Fri Feb 18 08:01:04 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 13)
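To line these hostapd events up with the ESPHome side, you can measure how long each station stayed off the AP. This Python sketch assumes the log format shown above (English day and month names, `AP-STA-DISCONNECTED` followed later by `IEEE 802.11: associated`) and treats the time between the two as the gap; `reconnect_gaps` is an illustrative name, not part of any OpenWrt tool. For the excerpt above it reports an 8-second gap.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) .*?hostapd: "
    r"(?P<iface>\S+): (?P<msg>.*)$"
)
DISCONNECT_RE = re.compile(r"AP-STA-DISCONNECTED (?P<mac>\S+)")
# "disassociated" does not match, because it does not contain ": associated".
ASSOC_RE = re.compile(r"STA (?P<mac>\S+) IEEE 802\.11: associated")


def reconnect_gaps(lines):
    """Return (mac, seconds) from each disconnect to the next association."""
    down_since = {}
    gaps = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # %a/%b assume English day/month names (C locale).
        ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
        msg = m["msg"]
        if (d := DISCONNECT_RE.search(msg)):
            down_since.setdefault(d["mac"], ts)
        elif (a := ASSOC_RE.search(msg)) and a["mac"] in down_since:
            gaps.append((a["mac"], (ts - down_since.pop(a["mac"])).total_seconds()))
    return gaps


SAMPLE = """\
Fri Feb 18 08:00:56 2022 daemon.notice hostapd: wlan1-1: AP-STA-DISCONNECTED ec:xx:xx:xx:xx:xx
Fri Feb 18 08:00:56 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: disassociated
Fri Feb 18 08:01:04 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: authenticated
Fri Feb 18 08:01:04 2022 daemon.info hostapd: wlan1-1: STA ec:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 13)
""".splitlines()
print(reconnect_gaps(SAMPLE))  # [('ec:xx:xx:xx:xx:xx', 8.0)]
```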

It looks like the devices go unavailable more frequently than I get disconnects in OpenWrt.

I may be way off base here, but nowhere did I see anyone address the RSSI of the devices that drop offline with no obvious reason.

I have a Sonoff Mini in a wall switch box that was frequently “unavailable”, and whenever it was, the RSSI was -78 dBm or worse. I put a wireless access point in the vicinity of the device, and when I changed the WiFi credentials in the device (using ESPHome) to point it at that WAP, the RSSI went to -56 dBm.

It has been rock steady for the past week.
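To check whether weak signal could be the culprit on your own nodes, you can pull the WiFi signal readings out of the ESPHome log. A minimal Python sketch, assuming log lines in the `Sending state -53.00000 dBm` format shown in the log excerpt earlier in the thread; the -70 dBm cut-off is a hypothetical value chosen between the -78 dBm (flaky) and -56 dBm (stable) readings mentioned above, not an official limit.

```python
import re

# Matches any sensor that reports in dBm, e.g. an ESPHome WiFi signal sensor.
RSSI_RE = re.compile(r"Sending state (-?\d+(?:\.\d+)?) dBm")


def rssi_readings(log_lines):
    """Extract all dBm readings from ESPHome log lines."""
    return [float(m.group(1)) for line in log_lines if (m := RSSI_RE.search(line))]


def weak_signal(readings, threshold_dbm=-70.0):
    """True if the average reading is below the (hypothetical) threshold."""
    return bool(readings) and sum(readings) / len(readings) < threshold_dbm


log = [
    "[09:19:20][D][sensor:124]: 'Eufy WiFi Signal Sensor': Sending state -53.00000 dBm with 0 decimals of accuracy",
    "[09:20:02][D][tuya:439]: Sending local time",
    "[09:20:20][D][sensor:124]: 'Eufy WiFi Signal Sensor': Sending state -54.00000 dBm with 0 decimals of accuracy",
]
readings = rssi_readings(log)
print(readings, weak_signal(readings))  # [-53.0, -54.0] False
```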

Also, there is little discussion of mixing devices of drastically different WiFi capabilities. WiFi can be finicky and mixing devices of low WiFi capabilities (ESP8266, ESP32, other IoT devices) with those of high (cell phones, tablets, laptops, etc.) creates many complications.

When reliability is a high priority, connecting devices with disparate WiFi capabilities to the same access point is generally not recommended, and relying on WiFi for near-100% reliability with consumer-grade APs is not realistic.

Low RSSI, reliance on mDNS, a network full of discovery packets (broadcasts), other devices using the same frequency (neighbors’ access points, Bluetooth, Zigbee), too many WiFi devices and low-grade APs can all lead to WiFi-related headaches.

Since upgrading to 2022.2.3, my nodes show red/offline more than online. My conclusion is that my nodes are not offline, but ESPHome running on Docker is ‘confused’. Even when a node shows offline, I can click on logs and connect.

My test node is an actual Wemos D1 mini that is steps away from a ceiling AP. It has a static IP, so mDNS is not necessary. Signal strength does not seem to be an issue: the RSSI shows about -45 dBm, and my Omada controller, which manages my access points, shows it at -37 dBm and connected continuously for more than 5 days.
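To separate a dashboard that is “confused” from a node that is really offline, you can probe the node’s native API port directly, independently of whatever the dashboard uses to decide its status. This Python sketch assumes the node uses the default ESPHome API port 6053; `port_open` is an illustrative helper, not part of ESPHome.

```python
import socket

ESPHOME_API_PORT = 6053  # assumed: default port of the ESPHome native API


def port_open(host: str, port: int = ESPHOME_API_PORT, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

For example, with a hypothetical static IP, `port_open("192.168.1.50")` returning True while the dashboard shows the node offline would point at the dashboard rather than the node.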