ESPHome device disconnected in Home Assistant and won't reconnect automatically

Got a bit of a puzzling issue here, let me describe it:

I have a number of ESPHome devices, some ESP8266 (Sonoffs and NodeMCUs) and a couple of ESP32s. Every now and then Home Assistant loses its connection to one of them and does not reconnect, even though the device is reachable. This is the situation I’m describing here:

A device called irrigation_pump (a Sonoff Basic with this config) is showing unavailable for about 17h now. I’ve power-cycled it and confirmed it is accessible in the network, yet HA will not reconnect:

If I try to get the logs, you can see the device is responding just fine (and if I run esphome dashboard . it shows up as online):

When I SSH into the HA SSH addon and try to ping it, you can see that the name (irrigation_pump.local) is resolved to the right IP address and the ping returns, so HA is able to “see” the device, from a networking standpoint at least:
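A side note on this check: ping only exercises ICMP, while the integration talks to the device over TCP on port 6053 (the port that appears in the log entries further down). A closer test is to try opening a TCP connection to that port from the same host. The following is a minimal sketch, not part of the original troubleshooting; the function name api_port_open is made up for illustration, and a successful connect only shows that the port accepts connections, not that the API handshake completes.

```python
import socket


def api_port_open(host: str, port: int = 6053, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    6053 is the ESPHome native API port seen in the Home Assistant log.
    This only proves the port accepts connections; it does not perform
    the API "Hello" handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling api_port_open("irrigation_pump.local") from the machine running Home Assistant would then report whether the API port is reachable at the TCP level.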

Finally, looking at the logs in HA, the only log which has anything about this device is “Home Assistant Core”, and then only these entries:

Note the timestamps: these are from around the time the device was disconnected (yesterday), and there were none more recent, which makes me think that HA is not trying to reconnect at all.

2022-10-13 13:55:31.628 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Error while reading data: [Errno 104] Connection reset by peer
2022-10-13 14:02:27.265 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Hello timed out
2022-10-13 14:04:12.925 WARNING (MainThread) [homeassistant.components.esphome] Error getting initial data for Connection not done for irrigation_pump @!
2022-10-13 14:05:18.491 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Hello timed out
2022-10-13 14:20:10.400 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Hello timed out
2022-10-13 14:24:14.467 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Error while reading data: [Errno 104] Connection reset by peer
2022-10-13 14:25:23.058 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Timeout waiting for response for <class 'api_pb2.ConnectRequest'>
2022-10-13 14:25:36.889 WARNING (MainThread) [homeassistant.components.esphome] Error getting initial data for Connection not done for irrigation_pump @!
2022-10-13 14:26:13.622 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Timeout while connecting to ('', 6053)
2022-10-13 14:29:18.417 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Hello timed out
2022-10-13 14:33:28.248 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for irrigation_pump @ Error while reading data: 0 bytes read on a total of 1 expected bytes
2022-10-13 14:33:38.889 WARNING (MainThread) [homeassistant.components.esphome] Error getting initial data for Connection not done for irrigation_pump @!
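To check the "no more recent entries" observation without reading the log by eye, the timestamps can be extracted with a short script. This is only a sketch: the function name last_seen is invented for illustration, and the regular expression is written against the line format shown above, so it may need adjusting for other log formats.

```python
import re
from datetime import datetime

# Matches lines like:
# 2022-10-13 14:33:38.889 WARNING (MainThread) [logger.name] message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>\w+) \(\w+\) \[(?P<logger>[\w.]+)\] (?P<msg>.*)$"
)


def last_seen(lines, device):
    """Return the timestamp of the newest log line mentioning `device`, or None."""
    latest = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match or device not in match.group("msg"):
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        if latest is None or ts > latest:
            latest = ts
    return latest
```

Fed the lines above with device "irrigation_pump", it returns the 14:33:38 entry; comparing that with the current time shows how long the integration has been silent about the device. Where the log file lives on a given install (commonly home-assistant.log in the config directory) is an assumption to verify.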

QUESTION: is there anything else I can do to try and understand why HA will not reconnect to this device? I can try restarting HA or removing and re-adding the device, but I’d rather understand what causes this issue so it can be prevented in the future.

1 Have you tried setting manual_ip in esphome?
2 Try turning the logger down to INFO, as DEBUG uses a lot of resources
3 Try removing the binary sensor lines that use GPIO0 to see if it connects better.

Thanks @Spiro for jumping on this, but I’m afraid these won’t make much of a difference, as I explain below:

Not yet, I much prefer DHCP for these devices; but the assigned IP has not changed in quite some time and you can see from the logs that HA sees the correct IP address, so that’s not the issue, at least in this case.

This is a rather “quiet” device, so even at DEBUG level, there’s at most one new line of log every 10s or so, hardly breaking a sweat. The device is stable and does not restart (as seen by watching the logs over WiFi).

I cannot, as this is a Sonoff Basic WiFi relay, and GPIO0 is connected to a pushbutton in the device. It is used (in the Sonoff) to manually turn it on/off if desired, but for all practical purposes just sits there unused all the time.

At this point, if I remove this device from HA and re-add it, all will work fine for several months again. But I won’t do that, at least right now, as I’m more interested in learning why HA does not attempt to reconnect, as the ESP device is clearly behaving correctly and willing to connect.

And just because I could not afford to have my irrigation pump offline for the weekend, I restarted Home Assistant (Settings > System > Restart), and voilà, the same device shows up as “connected” again.

As I said, this is not a problem in the ESPHome device, it is a problem in the ESPHome API integration in Home Assistant, which gets into a state where it will not re-try connecting with a device.

I have almost the same problem. I updated my ESPs with the Bluetooth proxy, and they work for only a couple of hours; after that the ESPs become unavailable. Only an HA restart helps (for some time).

For the last ~12h, as far as I can see, the ESPs have been available (status is on).
But in the morning the Curtains device went offline again.

@heckler Just a quick note: Have you updated ESPHome & devices? I think it was a bug in the past that has been fixed.

If so I would remove them and reinstall them again. Maybe

I’m running ESPHome on my PC (not on Home Assistant), and I’m currently on version 2022.9.4

But that is an excellent tip, as I just checked and the particular device I reported on here hasn’t seen any changes in a long time; it is still running 1.16.2, built in Oct 2021.

I will update it later today and keep monitoring - this is a very sporadic issue, and since the HA restart a week ago it did not re-occur.

Update: did a round of updates on my ESPHome devices: no problem with the NodeMCU ones, but I had two Sonoff Basic devices which never came back after OTA; no matter how many times I power-cycle them, they don’t show-up on the network again. I’ve stopped with the update for now until I take them down and plug via serial to see what’s wrong with those two.

After updating a couple of Sonoff Basic devices to 2022.9.4, they stopped responding on the network. Taking them down and plugging in a serial monitor showed the device in a restart loop with the following error:

[I][logger:243]: Log initialized
[C][status_led:014]: Setting up Status LED...
[C][ota:465]: There have been 5 suspected unsuccessful boot attempts.
[I][app:029]: Running through setup()...
[C][switch.gpio:011]: Setting up GPIO Switch 'Irrigation Pump Relay'...
[D][switch:017]: 'Irrigation Pump Relay' Turning OFF.
[D][switch:037]: 'Irrigation Pump Relay': Sending state OFF
[D][switch:017]: 'Irrigation Pump Relay' Turning OFF.
[D][binary_sensor:034]: 'Irrigation Pump Button': Sending initial state OFF
[D][text_sensor:067]: 'Irrigation Pump Version': Sending state '2022.9.4 Oct 20 2022, 16:34:53'
[C][wifi:037]: Setting up WiFi...
[D][wifi:384]: Starting scan...
[D][sensor:126]: 'Irrigation Pump Uptime': Sending state 0.15500 s with 0 decimals of accuracy
[D][wifi:399]: Found networks:
[I][wifi:442]: - 'aria' (C4:E9:84:BC:D9:A0) ▂▄▆█
[D][wifi:444]:     Channel: 7
[D][wifi:445]:     RSSI: -39 dB
[I][wifi:442]: - 'ubnt' (FC:EC:DA:87:76:91) ▂▄▆█
[D][wifi:444]:     Channel: 11
[D][wifi:445]:     RSSI: -67 dB
[I][wifi:442]: - 'ubnt' (74:83:C2:41:97:E2) ▂▄▆█
[D][wifi:444]:     Channel: 11
[D][wifi:445]:     RSSI: -71 dB

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (28):
epc1=0x40223401 epc2=0x00000000 epc3=0x00000000 excvaddr=0x0ee60000 depc=0x00000000


ctx: cont
sp: 3ffffa10 end: 3fffffc0 offset: 0190
3ffffba0:  00000005 3ffe8c89 3fff06a4 402079bb  
3ffffbb0:  0ee60001 000001bf 00000004 3fff248c  
3ffffbc0:  3fff2374 3fff2414 00000000 3fff2324  
3ffffbd0:  3ffffc60 3ffe8c89 3fff0b34 40210978  
3ffffbe0:  3ffffc30 3ffffc20 00000010 001b7716  
3ffffbf0:  000003eb 000003eb 3ffe85dc 402109a4  
3ffffc00:  3ffffc30 3ffffc20 00000010 3ffffc20  
3ffffc10:  3ffffc30 3ffffc20 00000010 40100f96  
3ffffc20:  000000c8 3ffe8c89 0000007a 00000083  
3ffffc30:  3fff2334 3ffffc60 3fff22ac 4022dce4  
3ffffc40:  00000009 3ffe8c89 3fff0b34 4020d415  
3ffffc50:  4025dccc 000000e2 0000ea60 00000000  
3ffffc60:  383a4137 32433a33 3a31343a 453a3739  
3ffffc70:  40220032 3fff1650 40223870 00000002  
3ffffc80:  00004bc6 00000000 00000000 fffffffe  
3ffffc90:  00000000 4bc6a7f0 94fdf3b6 001b77d4  
3ffffca0:  00000000 00000000 4bc6a7f0 00000000  
3ffffcb0:  0000000a 3fff0b34 40100641 00000000  
3ffffcc0:  00000000 4bc6a7f0 99999999 001b77d9  
3ffffcd0:  00000000 4bc6a7f0 a0000000 001b77e1  
3ffffce0:  00000000 00000000 4bc6a7f0 00000000  
3ffffcf0:  000017c1 00000000 40100641 00000000  
3ffffd00:  3fff2414 3fff0c34 3fff2224 3ffe9154  
3ffffd10:  3fff230c 3fff2224 3fff0a34 00000102  
3ffffd20:  3fff0b34 0000000a 3fff0b34 00000102  
3ffffd30:  3fff0b34 000017c1 3fff0b34 4020d7e8  
3ffffd40:  3fff0b34 0000000a 3ffef4cc 4022387c  
3ffffd50:  3fff0160 3ffffd70 3ffef4cc 40223948  
3ffffd60:  3fff0b34 0000000a 3ffef4cc 4020fded  
3ffffd70:  0000000b 0000000b 3fff185c 3fffff68  
3ffffd80:  0000002c 00000004 00000001 40100fc0  
3ffffd90:  3fff1068 3ffef59c 00000008 3fffff68  
3ffffda0:  3fff1084 3ffef5bc 3ffef4cc 40212318  
3ffffdb0:  3fff0cb4 00000017 0000001e 00000000  
3ffffdc0:  00000000 00000000 00000000 00000000  
3ffffdd0:  3ffffdd8 00000008 78656c66 73676f64  
3ffffde0:  00000000 00000000 00000000 00000000  
3ffffdf0:  00000000 00000000 00000000 00000000  
3ffffe00:  00000000 00000000 3ffffe10 00000005 <
3ffffe10:  53544151 00000049 00000000 00000000  
3ffffe20:  00000000 00000000 3ffffe30 00000009 <
3ffffe30:  66696175 34376961 00000037 00000000  
3ffffe40:  00000000 00000000 00000000 00000000  
3ffffe50:  00000000 00000000 00000000 00000000  
3ffffe60:  3ffffe68 00000004 61697261 00000000  
3ffffe70:  00000000 00000000 00000000 00000000  
3ffffe80:  3ffffe88 00000009 66696175 34376961  
3ffffe90:  00000037 00000000 00000000 00000000  
3ffffea0:  00000000 00000000 00000000 00000000  
3ffffeb0:  00000000 00000000 3ffffec0 00000005 <
3ffffec0:  73746171 00000069 00000000 00000000  
3ffffed0:  00000000 00000000 3ffffee0 00000009 <
3ffffee0:  66696175 34376961 00000037 00000000  
3ffffef0:  00000000 00000000 00000000 00000000  
3fffff00:  00000000 00000000 00000000 00000000  
3fffff10:  3fffff18 00000004 746e6275 00000000  
3fffff20:  00000000 00000000 00000000 00000000  
3fffff30:  3fffff38 00000009 66696175 34376961  
3fffff40:  00000037 00000000 00000000 00000000  
3fffff50:  00000000 00000000 00000000 00000000  
3fffff60:  00000000 00000000 3fff11cc 0000000b  
3fffff70:  3a69646d 2d77656e 00786f62 00687467  
3fffff80:  3fff15ac 3ffef520 0000ea60 feefeffe  
3fffff90:  feefeffe feefeffe feefeffe 3ffef8ec  
3fffffa0:  3fffdad0 00000000 3ffef8d8 4021fcf8  
3fffffb0:  feefeffe feefeffe 3ffe85d8 40100489  

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

 ets Jan  8 2013,rst cause:1, boot mode:(3,7)

load 0x4010f000, len 3460, room 16 
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4 
tail 4
chksum 0xc9
csum 0xc9
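
Short of running the full exception decoder, one quick way to get a hint from a dump like this is to look for readable strings on the stack. The ESP8266 is little-endian, so each 32-bit word has to be byte-swapped before reading it as ASCII. The sketch below is only an illustration (the helper name words_to_ascii is made up); it decodes a few words copied from the dump above.

```python
import struct


def words_to_ascii(hex_words):
    """Decode 32-bit little-endian words (given as hex strings) to ASCII.

    Non-printable bytes are shown as '.'.
    """
    raw = b"".join(struct.pack("<I", int(word, 16)) for word in hex_words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Words copied from stack lines 3ffffe60, 3fffff10 and 3ffffc60 of the dump.
print(words_to_ascii(["61697261"]))  # aria
print(words_to_ascii(["746e6275"]))  # ubnt
print(words_to_ascii(["383a4137", "32433a33", "3a31343a", "453a3739"]))  # 7A:83:C2:41:97:E
```

The two SSIDs from the scan and a BSSID-like string are sitting on the stack, which is at least consistent with the crash happening while the WiFi scan results were being processed. That is an inference from the dump, not a confirmed root cause.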

Final update for now:

  • 2022.9.4 caused an exception on my Sonoff Basic devices (platform: ESP8266 / board: esp8285) when connecting to WiFi (see above), which put them in a restart loop. I had no issue with 2022.9.4 on any of my NodeMCU devices (platform: ESP8266 / board: nodemcuv2)
  • Updating (via serial, had to take them down!) the Sonoff Basic devices to version 2022.10.0 solved the issue and brought the devices back to the network
  • Could not determine from the change logs what could be the difference between 2022.9.4 and 2022.10.0 which caused this issue

@heckler Glad you have resolved your issue :+1: