OTA updates stalling out

chkaloon · September 4, 2021, 11:38am

OTA updates to my nodes consistently fail about 30% of the time. The firmware compiles fine, the package starts uploading, then slows down, then the node finally times out (with a broken pipe or other error, probably from a watchdog timeout?). I go back and retry those nodes a few times and it eventually gets there.

Anything I can look at? I should have good WiFi all through the affected areas (according to UniFi and all my testing). It is a consistent problem with my nodemcu devices.

Typical log from one that fails. Percentage complete will vary:

Retrieving maximum program size /data/main_house_garage/.pioenvs/main_house_garage/firmware.elf
Checking size /data/main_house_garage/.pioenvs/main_house_garage/firmware.elf
RAM:   [====      ]  39.6% (used 32408 bytes from 81920 bytes)
Flash: [====      ]  38.1% (used 398308 bytes from 1044464 bytes)
========================= [SUCCESS] Took 7.47 seconds =========================
INFO Successfully compiled program.
INFO Connecting to 192.168.0.196
INFO Uploading /data/main_house_garage/.pioenvs/main_house_garage/firmware.bin (402464 bytes)
Uploading: [====================================================        ] 87% 
ERROR Error sending data: [Errno 32] Broken pipe

petsie · September 4, 2021, 2:53pm

@chkaloon

Same problem here sometimes. Try building the firmware with the dashboard's “Manual download” option, then install the new firmware through the device web server's “OTA Update” form.
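
For anyone who wants to script the web-server route instead of clicking through the browser form, here is a minimal sketch. It assumes the device has `web_server` enabled and accepts the firmware as a multipart POST to `/update` with a form field named `update`, which is what the browser form appears to do. Treat the endpoint and field name as assumptions and check them against your ESPHome version. The helper names `build_multipart` and `upload_firmware` are made up for this example.

```python
import urllib.request
import uuid


def build_multipart(field, filename, payload, boundary):
    """Encode one file as a multipart/form-data body.

    Returns a (body_bytes, content_type_header) tuple.
    """
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_firmware(host, firmware_path, timeout=120):
    """POST a manually downloaded firmware.bin to the device web server.

    Returns the HTTP status code of the response.
    """
    with open(firmware_path, "rb") as firmware_file:
        payload = firmware_file.read()
    # Field name "update" and path "/update" are assumptions (see note above).
    body, content_type = build_multipart(
        "update", "firmware.bin", payload, uuid.uuid4().hex
    )
    request = urllib.request.Request(
        f"http://{host}/update",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage would be something like `upload_firmware("192.168.0.196", "firmware.bin")`, with the IP taken from the log above. If the web server is password protected, this sketch will not work as written.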

chkaloon · September 5, 2021, 8:04pm

Thanks, worked like a charm! Of course, I needed to enable the web_server in the code and install with a USB cable, but once I did that, further updates went in smooth as silk. Seems like the stock HA OTA method needs some work.

asmcavr · September 6, 2021, 10:07am

You may have a problem with the Wi-Fi network.
This problem occurs when the signal level is very low.
Your wireless connection may be unstable. Look at the signal strength.

chkaloon · September 6, 2021, 12:32pm

Yes, the WiFi in those locations is not ideal, but it is -75 dB or better, which is not tragic. If the web interface can upload without fail, there should be a way for the OTA to work the same way.

VijayS · November 28, 2021, 7:36am

I also have this issue, even for nodes which have good WiFi connectivity. I just can't upload via OTA. Is this related to how ESPHome uploads the code? My device is in a position where it is not easy to take out, and I have been struggling with this.
The issue is understandable for nodes which do not have a very good WiFi connection, but for nodes near the router it shouldn't happen.

glyndon · November 28, 2021, 7:45am

This is, FWIW, one of a handful of threads on here about (or seemingly about) this problem.
I’ve been wrestling with it for months.
In each case, the common culprit seems to be WiFi, and not necessarily signal strength (although that seems to be a factor too), but some kind of compatibility issue between ESP’s WiFi stack and some Access Points where the link degrades over time to where packets are being dropped.
To see this quickly and easily, just let ping run against one of the ESP nodes for 30 seconds, and see how many packets are lost.
For a quick test/workaround, try setting up an alternative AP in place of your normal one (same SSID&passwd), and see if reliability doesn’t jump up considerably.
In my case, if I run OpenWrt firmware on my AP, the problem appears after a day or so. But if I revert it to factory firmware, the problem never appears.
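
The ping test described above can be scripted so it is easy to repeat across several nodes. This is a minimal sketch, assuming a Linux or macOS `ping` that accepts `-c` and prints an "X% packet loss" summary line (Windows `ping` uses different flags and wording). The function names are made up for this example.

```python
import re
import subprocess

# Matches the summary line of Linux/macOS ping, e.g. "10% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output):
    """Return the packet-loss percentage from ping's summary, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def measure_loss(host, count=30):
    """Send `count` pings (one per second by default) and return the loss percentage."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_packet_loss(result.stdout)
```

Calling `measure_loss("192.168.0.196")` runs for roughly 30 seconds and returns the loss as a float, or `None` if ping printed no summary. Running it once against the normal AP and once against the alternative AP gives a direct comparison.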

sjude68 · February 11, 2022, 5:07pm

Can you help me with the code I need to add to enable web server OTA updates for a D1 Mini?

petsie · February 12, 2022, 7:09am

@sjude68
I have massive problems when I use an access point that serves both 2.4 GHz and 5 GHz WLAN (Ubiquiti UniFi-AP). If I have an access point that only has 2.4 GHz, it works as long as the device is not too far away from the access point.

OTA Update see: OTA Update Component — ESPHome

sjude68 · February 14, 2022, 5:50am

I have also noticed that on a dual-band router the upload works once the router has been rebooted. And when the ESPHome device is connected to the 2.4 GHz-only access point, the upload works fine.

So does that mean the upload problem lies with the 2.4 GHz + 5 GHz access point and not the ESPHome device?

glyndon · August 20, 2022, 10:19pm

I believe the cause is within or closely related to the driver software for the AP’s WiFi chipset.
Some chipsets’ drivers are more prone to this degradation of link quality than are others.
e.g. I switched from a Qualcomm/Atheros-based AP to a MediaTek-based one, and it’s been like night and day in terms of stability for all connected devices.

DonLuigi · September 24, 2023, 5:58pm

What helped in my case was cycling power to the ESP32 right before doing the OTA.
Without that, the OTA would always time out somewhere between 1% and 10%.
With a power cycle, the OTA has always succeeded so far.
This is true for some older ESP32 DevKit v1 boards. The new ones I bought don't time out.

bbccdd · December 23, 2023, 12:52pm

Similar problems here with 3 Shelly 1 devices (in bedrooms). Tested after a power cycle (circuit breaker): OTA from the ESPHome dashboard fails after +/- 30%. With the web server enabled and a manually downloaded firmware, the upload reaches between 80% and 90% before the device becomes unresponsive (input events such as button presses no longer reach Home Assistant). Sometimes it crashes completely, sometimes it only shows a very basic webpage (device name only, no relay control or log). But after a power cycle the device is usable again.

mightybosstone · December 23, 2023, 3:17pm

I was struggling with 2 ESP32s that would randomly drop off the network. Even moving them to within a few feet of the AP didn't seem to help. I could start a continuous ping and see them stop responding at random intervals. I finally isolated it by figuring out that they worked perfectly if they were connected to any other AP in my house except the one where I needed them. I was going to swap 2 APs to see if the problem followed the “bad” AP, but decided to just power cycle the suspect AP (UniFi U6 Lite), and immediately both ESPs stopped dropping packets.

I can’t say for sure if it was related to the driver/chipset, because I have been doing a lot of tweaking with WiFi channels, power settings, band steering, etc…, and I’m guessing it’s possible that AP just got ‘confused’.