ESPHome devices becoming ‘unavailable’ in Home Assistant

I have several ESPHome devices: some NodeMCU, some ESP32, some ESP01. Every now and then one of them starts showing as ‘unavailable’ in HA, but if I run esphome device.yaml logs on my PC, I get the logs just fine. The IP address reported on the command line is the same one I see in the HA error message.

When this happens, the device in HA never recovers: it remains unavailable until I force a restart of the HA core. There is no need to reboot the host, just the core, but without that restart the device never comes back from unavailable on its own.

These devices are configured with DHCP addresses, but that’s not the issue, as the IP has not changed, and HA has the correct value. Also, all of my devices already have power_save_mode: none in their configs.
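
As a minimal sketch, the Wi-Fi settings described above would look roughly like this in a device's YAML. The `!secret` names are placeholders, and the commented-out `manual_ip` block is only an optional way to rule DHCP out entirely; it is not part of the poster's configs, and the gateway and subnet shown are assumed values.

```yaml
wifi:
  ssid: !secret wifi_ssid            # placeholder secret names
  password: !secret wifi_password
  power_save_mode: none              # as already set on all devices in this post

  # Optional and NOT in the original configs: a fixed address instead of DHCP.
  # manual_ip:
  #   static_ip: 192.168.6.13        # the address seen in the HA log above
  #   gateway: 192.168.6.1           # assumed gateway
  #   subnet: 255.255.255.0          # assumed netmask
```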

For example, I have this on the HA logs:

Error getting initial data for 192.168.6.13: Connection not done for balcony_sensors @ 192.168.6.13!
1:27:15 PM – (WARNING) ESPHome - message first occurred at 1:27:10 PM and shows up 2 times

Logger: homeassistant.components.esphome
Source: components/esphome/__init__.py:297
Integration: ESPHome (documentation, issues)
First occurred: 1:27:10 PM (2 occurrences)
Last logged: 1:27:15 PM

Error getting initial data for 192.168.6.13: Connection not done for balcony_sensors @ 192.168.6.13!

And the device, as seen under Integrations, has all its sensors unavailable:

image

And the output from esphome logs is ok and reports the same IP that HA tried to connect to:

Any idea what could be the issue here?

(Note: this is possibly related to this other topic: ESPHOME ESP32 intermittent unavailability “Became unavailable”)

I am having the same issue here.

A bit of history:
This was happening 3-4 months ago for a few weeks, then it seemed to resolve itself. Now, in the past 3 days, it’s happened twice again. The first time, several devices went unavailable. It happened on the same day as my eero did a firmware update, so I thought maybe it was due to DHCP renewals (although the eero was only offline for about 30 seconds, so there is no reason it should have affected DHCP renewals).

Today it happened again with no reason, but only with 1 device.

No changes to the ESPHome devices (they’re Tuya-based switched plugs). They’re running 6-month-old firmware that’s been very stable.

I can also confirm that at the same time as HA says they’re unavailable, I can connect to “{device}.local” and access the switch.

If we can’t figure out why they’re disconnecting, can we at least get HA to retry connecting to them without manual intervention?

Or is there some setting/update in the esphome code that will fix this? That’s a big project for me as I have many of these switches and they’re in mission-critical locations (they run my grow-room lights). I can’t risk making things worse.

I have not had time yet to dive into the ESPHome integration to figure out how it handles retrying, but I can tell you this:

I’ve added an alert via Telegram anytime one of my ESPHome devices becomes unavailable for longer than 15min. When I get that alert, I reply (on telegram) with a restart command, which basically performs a core restart by invoking “homeassistant.restart”. After the HA restart, all devices are magically back to normal.

So that, and being able to do an “esphome xxx.yaml logs” while HA thinks the device is unavailable, makes me think that this is just something missing in the retry/reconnect logic.
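
A minimal sketch of automating that workaround on the Home Assistant side, assuming a hypothetical entity `sensor.balcony_sensors_temperature` that belongs to the affected device. Instead of restarting the whole core, it reloads only the config entry that owns the entity. Whether a reload is enough to re-establish the connection in this failure mode is untested here; swapping the action for `homeassistant.restart` would reproduce the manual workaround exactly.

```yaml
automation:
  - alias: "Reload ESPHome entry when device stays unavailable"
    trigger:
      - platform: state
        entity_id: sensor.balcony_sensors_temperature   # hypothetical entity
        to: "unavailable"
        for: "00:15:00"                                  # same 15 min as the alert
    action:
      # Reloads the integration entry that owns the entity.
      - service: homeassistant.reload_config_entry
        target:
          entity_id: sensor.balcony_sensors_temperature
```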

@heckler I found this with one of my ESP32 running WLED but you could try this

My suggestion would be to go into your HA settings / Devices & Services / find your ESPHome integration, click on it and go into the link that says “1 device”. Under configuration you should see a “restart” press button. Press it. Once I did that my ESP32 has worked perfectly and hasn’t missed a beat for days now. Only have to do it once.

Or maybe try adding the Restart Button into your Yaml and press it.

I don’t know if this next step works on ESPHome but it did work on WLED.

If you don’t see the “Restart” button, take note of the device’s IP address. Then remove the device from HA: go to Settings / Devices & Services, find your ESPHome integration, click on it and you should see “1 device and ?? entities”; click on the 3 dots and then the red delete.

Then go into “+ Add Integration”, type “ESPHome” and select it, enter the IP address you noted before and follow the steps. Go back into the device as above and you may see the “Restart” press button. When I added it manually instead of via auto discovery, 9 new entities appeared, go figure. One of them was the “Restart” press button. If you can see it now, press it. I don’t know why this fixed my ESP32, but it did every time I tested it (more than once).

Let us know if that fixes it, as we can then file a bug with HA. I am not 100% sure, but it could be related to the new restart push button.

@Blacky thanks for the response! I’m not sure I completely follow it though; on my ESPHome devices I do have the ‘restart’ switch exposed, as follows:

switch:
  - platform: restart

What that does is expose, through the API, a switch that instructs the firmware to restart the device. But to invoke that switch your client (HA in this case) needs to be connected to the ESPHome device via the API. When I run into connectivity issues, the ESP device is still running and connected to the network, but HA is not connected to the device, so it neither collects data nor can issue commands (such as a restart).
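
For a restart that does not depend on HA being connected, ESPHome has a firmware-side watchdog: the `reboot_timeout` options under `api:` and `wifi:`. A minimal sketch follows; the values shown are the documented defaults, written out only to make them visible. Note that any API client counts as a connection, and this only helps if a reboot of the device actually makes HA reconnect, which this thread does not establish.

```yaml
api:
  # Reboot the device if no API client has connected for this long.
  # 15min is the documented default; 0s disables the watchdog.
  reboot_timeout: 15min

wifi:
  # Merge into the existing wifi: block (ssid/password omitted here).
  # Reboot if the Wi-Fi connection itself has been down for this long.
  reboot_timeout: 15min
```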

One thing I did recently that made a positive difference with this issue: in the past I had 2.4GHz and 5GHz wifi networks using the same SSID, figuring each device would pick whichever had the better signal. The Pi running HA is connected via ethernet, so there is no issue there, but the ESP devices are all wireless and would connect only to 2.4GHz.

A few weeks back I decided to split it into two separate SSIDs, one for each frequency (channel allocation is ‘auto’ for both). Since doing this change I’ve only had a few sporadic issues with devices becoming unavailable. Still monitoring, but wanted to throw this here FYI.

@heckler thanks for the info!

I had the same problem… ESP connected to Wi-Fi but would become “unavailable” in HA. I don’t know why but when I pushed the Restart button once for that device I wouldn’t get a connection issue with HA ever again. This happened on a few tests and I have no idea why (like magic).

But note: the ESP32s in question were running WLED, so this could be a factor.

PS: My network will swap automatically from 2.4GHz to 5GHz on the same SSID if required, and my Pi is connected via Ethernet. My ESPs are only 2.4GHz capable.

FWIW, this syndrome appears to be very similar to several reports in this forum (and each ends up getting its own thread).
It has been happening for months/years, to random people, random devices, random times.
I’ve been following the cases because this problem bit me for a while.

One thing that seems common to all cases is that some change to the WiFi environment (which one would reasonably think should have nothing to do with ESPHome or HA) is coincident. e.g. the AP got new firmware, or a new member of the WiFi network is occasionally causing congestion and minor packet losses or latency.
The other thing the reports have in common is that the device’s log shows no problem, nor does the WiFi Access Point’s log.

My hypothesis is that either:
the protocol library used by ESPHome and HA to communicate (protobuf/gRPC)
or
transporting in a session-layer socket that doesn’t ride TCP
or
the TCP implementation used by ESPHome (Arduino’s?)
is making the link weak.
That weakness makes the link very intolerant of lost or overdue packets. So, a moderately congested or mildly-lossy WiFi link (that other apps won’t consider a problem) will cause HA to believe the device has gone unavailable, even though its web interface remains accessible and MQTT links are unaffected.

Nothing gets logged from these dropouts because a lost packet here and there does not represent a serious problem to most IP protocol stacks - they are designed to deal with such things.
But (I’m hypothesizing) ‘protobuf’ isn’t/can’t.

If you search in this forum for ‘unavailable’ you may find that many of the other cases sound similar to your own.

In my case, replacing the WiFi AP resulted in zero ‘unavailable’ problems. YMMV, but I’d start with ensuring that your WiFi link is as loss-free as possible. Other systems won’t notice the losses/latency, but HA/ESPHome will.

Hey! I had the same problem and I found this on the ESPHome website:

This has solved my problem. I removed all the underscores ( _ ) from my titles in ESPHome and now they are always available in HA!
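
A minimal sketch of that change, using the `balcony_sensors` name from the first post purely as an illustration. Underscores are not valid in hostnames, so the device name becomes the hyphenated form. The `use_address` line is an assumption about how to reach a device that is still running under its old name for one OTA upload; remove it afterwards.

```yaml
esphome:
  name: balcony-sensors        # was: balcony_sensors

wifi:
  # Merge into the existing wifi: block (ssid/password omitted here).
  # Temporary, for the first upload only: the device still answers
  # on its old address until the new firmware is flashed.
  use_address: 192.168.6.13
```

After a rename, HA may see the device under a new hostname, so it might need to be re-added to the ESPHome integration.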

Hope this can help :smiley:

@glyndon I have noticed you have invested a lot of time on the forums trying to solve this, but I have never seen you try this, so check it out.

POSSIBLE SOLUTION for others:
I have tried many things to solve the disconnections and “becoming unavailable”. Maybe some of them had an impact as well (using ping instead of mDNS, software offload in the OpenWrt router, …), but even though I tried everything I could set, it always happened again sooner or later, just less often. However, one day I finally decided to add this to my ESP firmware:

sensor:
  - platform: wifi_signal
    name: Mini sonoff Wifi Signal Strength
    update_interval: 60s
  - platform: uptime
    name: Mini sonoff Uptime

To my surprise the signal strength oscillated around -85 dBm, and you can clearly see that it is bad here:

In my case I had a Sonoff Mini R2, which I had put behind a light switch. What was surprising was that when I measured with my phone right at the switch, the signal was quite a lot better. Not the best, but I thought it should work. Only when I actually measured the signal with the Sonoff itself did I realise how much was blocked by the switch and the wall around it.

So I tried wiggling my antennas a little bit and got it from oscillating around -85 to touching -80, and that change by itself made an incredible improvement and may have solved it. Later, by directing the antennas even better, I was able to get to -75, and there has been no problem ever since.

Conclusion:
I don’t want to say that other “solutions”, like changing the channel width to 20 instead of 40 and so on, don’t solve it, but if you still keep having disconnections, try what I did. I have read many times that these could be network/wifi problems, but I thought that if download and upload were OK there, it should be fine. Nowhere did I read how to make sure that the signal is OK (the how-to is above). If wiggling the antennas doesn’t make a difference, try a wifi extender.

Important note:
Measuring download and upload speed is not enough! Nor is measuring the signal at your ESPHome device’s location with another device! It is the signal strength inside the appliance that matters for these issues!
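
To keep an eye on this without watching graphs, a Home Assistant automation along these lines could warn when the on-device reading stays low. This is a sketch under assumptions: the entity ID is guessed from the sensor name in the YAML above, and the -80 dBm threshold and 10-minute window are illustrative values taken loosely from the numbers in this post.

```yaml
automation:
  - alias: "Warn on weak ESPHome Wi-Fi signal"
    trigger:
      - platform: numeric_state
        entity_id: sensor.mini_sonoff_wifi_signal_strength   # assumed entity ID
        below: -80              # dBm, illustrative threshold
        for: "00:10:00"         # ignore short dips
    action:
      - service: persistent_notification.create
        data:
          title: "Weak Wi-Fi signal"
          message: "Mini sonoff has reported less than -80 dBm for 10 minutes."
```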

I did fiddle with various ways of improving the signal strength, and a weak signal would definitely cause unpredictable behaviours. With almost any WiFi device.

In the context of the problems I was seeing with ESPHome devices, one of the first things I noticed is that proximity and signal strength made no difference. They’d still drop off at random times.
For me, the only lasting solution was to replace my A7V5 TP-Link AP with one that used a different chipset.

Late to this thread, but the solution is to add an empty will_message: line to the ‘mqtt:’ section in your ESPHome device YAML.

mqtt:
    broker: 192.xxx.xxx.xxx
    username: xx
    password: xx
    will_message:

When a timeout occurs because an MQTT keepalive message was missed, the broker sends a disconnect message to HA. This is controlled by the ‘Last Will and Testament’ MQTT configuration in ESPHome. That’s a stupid name, but it means that if the broker misses a keepalive, it tells HA that the device is disconnected, and HA doesn’t reconnect it when the next message is received. WTF.

Supplying an empty will_message prevents this from happening. I struggled with this for ages, despite my ESPHome device being right next to a WiFi mesh node. I think the mesh node must have had occasional dropouts, and this caused a permanent MQTT disconnection for the device connected to it. I’ve had zero problems since adding an empty will_message.

Hope this helps someone.

Any updates on this topic? It keeps occurring from time to time…

Well, I solved my problem by isolating all of my ESP devices (mostly Sonoff) to a 2.4GHz WiFi network. Previously I had them on a network which had 2.4 and 5 GHz sharing the same SSID, and they’d constantly drop connection. Once I separated the frequencies and had the ESPs all on 2.4 GHz, never had a problem anymore. YMMV.

I have had this problem too with one of my ESP8266 devices. I fixed it by adding a ping sensor which keeps the device alive:

esphome:
  name: test_device
  platform: ESP8266
  board: esp01_1m
  libraries:
    - ESP8266WiFi
    - https://github.com/akaJes/AsyncPing#95ac7e4

external_components:
  - source:
      type: git
      url: https://github.com/trombik/esphome-component-ping
      ref: main

sensor:
  - platform: ping
    ip_address: 192.168.1.1 ## Router IP
    num_attempts: 5
    timeout: 1sec
    loss:
      name: Packet loss
    latency:
      name: Latency
      accuracy_decimals: 3
    update_interval: 30s

That looks like a very good hint, thank you.