ESPHome devices lose their connection

Sorry, I can’t help with this: it goes beyond what I know 🙂

No problem!! Thanks for replying to my message anyway. Going to try MQTT now as the issue persists even on a fresh install!

I would suggest a test, if you can:

  • Turn off all your devices connected to WiFi/LAN
  • Leave only the Home Assistant box (RPi4) and one ESPHome device turned on
  • Turn the other devices back on one at a time and see which one causes the issue
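
For a test like this it helps to write down when each device was switched back on and when the ESPHome node dropped out, then compare the two. A minimal sketch of that bookkeeping in Python, assuming you have collected (timestamp, online) samples for the node yourself; the function names, the sample format and the default 10-minute window are illustrative choices, not part of Home Assistant or ESPHome:

```python
from typing import Dict, Iterable, List, Optional, Tuple

Outage = Tuple[float, Optional[float]]


def find_outages(samples: Iterable[Tuple[float, bool]]) -> List[Outage]:
    """Collapse chronological (timestamp, online) samples into outages.

    Each outage is (went_offline, came_back); came_back is None when the
    node was still offline at the last sample.
    """
    outages: List[Outage] = []
    offline_since: Optional[float] = None
    for ts, online in samples:
        if not online and offline_since is None:
            offline_since = ts  # first offline sample starts an outage
        elif online and offline_since is not None:
            outages.append((offline_since, ts))  # node came back
            offline_since = None
    if offline_since is not None:
        outages.append((offline_since, None))
    return outages


def suspects(outages: List[Outage],
             power_on_times: Dict[str, float],
             window: float = 600.0) -> List[str]:
    """Names of devices switched on at most `window` seconds before an outage began."""
    return [
        name
        for name, t_on in power_on_times.items()
        if any(0 <= start - t_on <= window for start, _ in outages)
    ]
```

A device that shows up as a suspect once proves nothing; it is only worth a closer look if it keeps appearing across several rounds of the test.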

For information, do you have a mesh setup? Do you have multiple APs?

Just tried MQTT with no success.

Alright will try that. So strange as it was working fine before!

Single AP - Archer A7 with latest OpenWRT, not a huge flat, about 100sqm all over one floor.

It’s quite strange: after restarting different bits, I end up with a different set of devices going offline each time. It’s always two or three that go unavailable at the same time, but always a different set!

I’ll let you know.

Okay, so I’m getting closer to figuring it out.

I think it’s the AP. With one client connected, it would be fine, but after about an hour of two or more clients being connected, I’d get the same issue. Didn’t matter which one it was.

So I set up my RPi 3B+ as an access point nearer to some of the switches. Overnight, only those connected to the OpenWRT router had issues. I read somewhere that it could be WMM mode, so I’ve turned that off and left it to see if it returns to its previous behaviours.

I’ll probably order a small AP to go where the RPi is now until I make the switch to Zigbee at some point in the future!

What is WMM mode?

From the OpenWRT Wiki: ‘Enables WMM (802.11e) support. Required for 802.11n support’… Whatever that means.

Okay, that made no difference, same thing started happening after a while. Something else is causing this but I have absolutely no idea what. Very close to completely reinstalling everything and going from there, this is incredibly infuriating.

Hi Dan,

I think reinstalling will make no difference.
Regarding this:

I think it’s the AP. With one client connected, it would be fine, but after about an hour of two or more clients being connected, I’d get the same issue. Didn’t matter which one it was.

Can you please explain exactly what your setup was, including wired devices?

Don’t reinstall everything, it won’t help.
It’s a known issue, there’s not a whole lot you can do about it.

I use the restore_from_flash option to mitigate the effects of this bug.

Hey both, sorry for the late response.

I haven’t reinstalled everything. Thanks for letting me know that it won’t help.

So for a few days, everything was rock solid. Haven’t changed anything and one of the switches has started with this again.

@SimonPth, thanks, I read over that before but didn’t think it applied since the devices are not losing WiFi connection. Only the connection to Home Assistant. This is verified by the fact I can stay connected to the Logger over WiFi, watch hostapd on my OpenWRT router and not see any WiFi disconnects. I’m also doing all the things that it tells me to in that post. The issue is also occurring far more frequently than the auto reboot time.
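
One way to tell a WiFi drop from an API drop on the Home Assistant side is to check whether the node still accepts TCP connections on its API port while Home Assistant shows it as unavailable. A minimal sketch, assuming the node uses the default ESPHome native API port 6053; the function name is made up for illustration, and this only tests that the port can be reached, not that the API handshake succeeds:

```python
import socket

ESPHOME_API_PORT = 6053  # default port of the ESPHome native API


def port_open(host: str, port: int = ESPHOME_API_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # The connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Run from the Home Assistant host with the node's address, for example port_open("192.168.1.50") (a hypothetical address). If it returns True while the entities are unavailable, the network path is probably fine and the problem is more likely in the API session itself.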

I’ll just give it time and see. So annoying though! Very close to just going all in on Zigbee ¯\_(ツ)_/¯

Oh, I don’t think it means losing wifi connection, I think it’s when it loses connection to the Home Assistant API. (At least that’s the way I interpreted it)

Mine still does it intermittently, but doesn’t really affect operation, so I’ve learnt to live with it until they fix it.

Ahh okay, it was the last bullet point that must’ve made me interpret it that way. For what it’s worth, it has all been stable for the last hour since I posted; before this, one node was reconnecting every 2 minutes or so!

What did you do to make it more stable? (If it is confirmed to be more stable 🙂)

My setup is Windows-based, and I have found that the asyncio keepalive setting in
https://github.com/esphome/aioesphomeapi/blob/892ee4d80650a2e822e30577336fa9812cd39f8a/aioesphomeapi/connection.py (line 50, await asyncio.sleep(self._params.keepalive)) has an impact on the timeouts. I experimented with different values of the self._params.keepalive parameter and found 300 ms to be optimal; higher or lower values made the timeouts worse. Before, I had timeouts regularly every 2 to 4 hours, but after tuning this parameter they almost disappeared. This parameter affects how asyncio runs operations in parallel. What do you think?
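
To make the discussion concrete, here is a minimal sketch of how a keepalive loop built around asyncio.sleep generally works. It is not the aioesphomeapi code; keepalive_loop, send_ping and ping_timeout are hypothetical names used only for illustration. Note that asyncio.sleep takes its argument in seconds, so the unit of any value you try matters:

```python
import asyncio
from typing import Awaitable, Callable


async def keepalive_loop(send_ping: Callable[[], Awaitable[None]],
                         keepalive: float,
                         ping_timeout: float) -> int:
    """Ping every `keepalive` seconds until a ping fails.

    Returns the number of pings attempted, including the failed one.
    """
    attempts = 0
    while True:
        # Other tasks on the event loop run while this coroutine sleeps.
        await asyncio.sleep(keepalive)  # seconds, not milliseconds
        attempts += 1
        try:
            await asyncio.wait_for(send_ping(), timeout=ping_timeout)
        except (asyncio.TimeoutError, OSError):
            # A real client would mark the connection as lost and reconnect here.
            return attempts
```

In a loop like this, a shorter interval notices a dead connection sooner but sends more traffic, while a longer one leaves a silent connection undetected for longer. Whether that explains the behaviour described above is an open question.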

Interesting!
It would be worth raising this with an ESPHome developer: did you open a feature request or GitHub issue about it?

I too have been suffering from this and so far no permanent fix.

I have found that reflashing my ESP32 (which seems to suffer most) with the same ESPHome build does solve the problem for a considerable time. Stopping and restarting it does not, however. All will be fine until I have a network outage. Then I’m likely to get the problem again, though it’s not guaranteed.

I use Home-Dashboard as a simple HA user interface (which, IMHO, is excellent) and I’ve noticed that it too seems to periodically have the same problem connecting to the HA API, which I assume it uses.

I’m just throwing this out there as a possibility that the issue isn’t the ESPHome devices but the HA API.

I don’t have the skills to investigate this or suggest solutions but thought I’d add my experience into the mix as I, like others here, really need a solution.

asyncio is complex and I am still learning it. There are several videos on YouTube describing asyncio functions, and the ESP OS has similar functions for handling asynchronous operations. It is about how the system manages parallel tasks. I wrote to the designer, Otto W, about it but have had no response yet. I think the functions need more investigation, since it is a complex problem.

This is a long thread, so apologies if this has been mentioned already, but I fixed my BLE connection issues by compiling for an earlier version of Arduino, viz.:

esphome:
  name: ble_lounge
  platform: ESP32
  board: wemos_d1_mini32
  arduino_version: 1.0.3

Hopefully 1.0.5 will fix the problem.

My ESPHome NodeMCU 1.0 board is rather far from the APs and only gets about 60-70% signal (according to UniFi). Some days it’s all stable; other days it disconnects and reconnects every 10-15 minutes. I was using DHCP in the YAML, with a fixed IP assigned in UniFi, so I set a static IP in the YAML as a test. Now the disconnects are (mostly) only 1 second long.

The funny thing is, I have only 1 ESPHome device but about 4 Sonoffs running Tasmota in my installation; whenever the ESPHome device starts having disconnection problems, all the Tasmota Sonoffs start behaving the same way. Before I used ESPHome, the Sonoffs were very stable, with not even one disconnect. By the way, the Sonoffs and the ESPHome device connect to two different UniFi APs with the same SSID, on the same network.

Now I wonder, could it be something related to mDNS? I guess Tasmota uses mDNS by default too. From the ESPHome wiki:

ESPHome uses mDNS to show online/offline state in the dashboard view. So for that feature to work you need to enable host networking mode

On Tasmota, "SetOption55=Off" turns mDNS off. I think I am going to try this on the Tasmotas, at least to understand whether the real problem is with mDNS. Feature Request: mDNS default off. And I guess this won’t be possible with ESPHome.

UPDATE: Urrrggh. I only recently learned about the Fingbox mDNS flooding problem, which makes IoT devices disconnect. I have now unplugged my Fingbox and will see how it goes. I also had a lot of disconnection problems with AirPlay on an Apple AirPort Express; I hope removing the Fingbox will solve both problems. Fingers crossed…
