OTA updates to my nodes consistently fail about 30% of the time. The firmware compiles fine, the package starts uploading, then slows down, then the node finally times out (with a broken pipe or other error, probably from a watchdog timeout?). I go back and retry those nodes a few times and it eventually gets there.
Anything I can look at? I should have good WiFi all through the affected areas (according to UniFi and all my testing). It is a consistent problem with my NodeMCU devices.
Typical log from one that fails. Percentage complete will vary:
Retrieving maximum program size /data/main_house_garage/.pioenvs/main_house_garage/firmware.elf
Checking size /data/main_house_garage/.pioenvs/main_house_garage/firmware.elf
RAM: [==== ] 39.6% (used 32408 bytes from 81920 bytes)
Flash: [==== ] 38.1% (used 398308 bytes from 1044464 bytes)
========================= [SUCCESS] Took 7.47 seconds =========================
INFO Successfully compiled program.
INFO Connecting to 192.168.0.196
INFO Uploading /data/main_house_garage/.pioenvs/main_house_garage/firmware.bin (402464 bytes)
Uploading: [==================================================== ] 87%
ERROR Error sending data: [Errno 32] Broken pipe
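The manual "retry until it gets there" routine described above could be scripted. The following is only a sketch: it assumes the `esphome` command-line tool is installed and that `esphome upload <config> --device <ip>` pushes the already compiled firmware. The helper names `retry` and `upload` and the config file name in the usage note are hypothetical, and retrying does not address whatever causes the broken pipe.

```python
import subprocess
import time
from typing import Callable


def retry(action: Callable[[], bool], attempts: int = 5, delay_s: float = 10.0) -> int:
    """Call `action` until it returns True.

    Returns the 1-based attempt number that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            # Give the node time to reboot and rejoin WiFi before trying again.
            time.sleep(delay_s)
    return 0


def upload(config: str, device: str) -> bool:
    """Run one OTA upload; True when the esphome CLI exits with status 0.

    Assumption: `esphome upload CONFIG --device ADDRESS` is available on this machine.
    """
    result = subprocess.run(
        ["esphome", "upload", config, "--device", device],
        check=False,
    )
    return result.returncode == 0
```

For the node in the log above this might be called as `retry(lambda: upload("main_house_garage.yaml", "192.168.0.196"))`, where the YAML file name is a guess based on the build path.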
Same problem sometimes here. Try to build the firmware with the dashboard "Manual download" and
install the new firmware with the device webserver "OTA Update".
Thanks, worked like a charm! Of course, I needed to enable the web_server in the code and install with a USB cable, but once I did that, further updates went in smooth as silk. Seems like the stock HA OTA method needs some work.
You may have a problem with the wi-fi network.
This problem occurs when the level is very low.
Your wireless connection may be unstable. Look at the signal strength.
Yes, the WiFi in those locations is not ideal, but it is -75 dBm or better, which is not tragic. If the web interface can upload without fail, there should be a way for the OTA to work the same way.
I also have this issue, even for nodes which have good WiFi connectivity. I just can't upload via OTA. Is this related to how ESPHome uploads the code? My device is in a position where it is not easy to take out, and I have been struggling with this.
I do have this issue with nodes which do not have a very good WiFi connection, which is understandable for them, but for nodes near the router it shouldn't happen.
This is, FWIW, one of a handful of threads on here about (or seemingly about) this problem.
I've been wrestling with it for months.
In each case, the common culprit seems to be WiFi, and not necessarily signal strength (although that seems to be a factor too), but some kind of compatibility issue between the ESP's WiFi stack and some Access Points where the link degrades over time to where packets are being dropped.
To see this quickly and easily, just let ping run against one of the ESP nodes for 30 seconds, and see how many packets are lost.
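The 30-second ping check can be scripted so the loss figure is easy to compare across nodes and access points. This is a minimal sketch that assumes a Linux or macOS `ping` (which takes `-c` for the count and prints a "% packet loss" summary line). The function names are hypothetical.

```python
import re
import subprocess


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet-loss percentage from ping's summary line."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def measure_packet_loss(host: str, count: int = 30) -> float:
    """Send `count` pings (one per second by default) and return the loss percentage."""
    # Assumption: Linux/macOS ping, where -c sets the count. Windows uses -n instead.
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_packet_loss(result.stdout)
```

Calling `measure_packet_loss("192.168.0.196")` takes about 30 seconds and returns the loss as a percentage. Running it once right after an AP reboot and again a day later would show whether the link degrades over time.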
For a quick test/workaround, try setting up an alternative AP in place of your normal one (same SSID and password), and see if reliability doesn't jump up considerably.
In my case, if I run OpenWrt firmware on my AP, the problem appears after a day or so. But if I revert it to factory firmware, the problem never appears.
@sjude68
I have massive problems when I use a 2.4 GHz as well as the 5 GHz WLAN (Ubiquiti UniFi-AP). If I have an access point that only has 2.4 GHz, it works if the device is not too far away from the access point.
I have also noticed that the upload works on a dual-band router right after the router is rebooted. And when the ESPHome device is connected to a 2.4 GHz-only access point, the upload works fine.
So does that mean the upload problem lies with the 2.4 GHz + 5 GHz access point and not the ESPHome device?
I believe the cause is within or closely related to the driver software for the AP's WiFi chipset.
Some chipsets' drivers are more prone to this degradation of link quality than are others.
e.g. I switched from a Qualcomm/Atheros-based AP to a MediaTek-based one, and it's been like night and day in terms of stability for all connected devices.
What helped in my case was cycling power to the ESP32 right before doing OTA.
Without it, OTA would always time out somewhere between 1% and 10%.
With a power cycle, OTA has always succeeded so far.
This is true for some older ESP32 DevKit v1 boards. The new ones I bought don't time out.
Similar problems here with 3 Shelly 1 devices (in bedrooms). Tested after a power cycle (circuit breaker): OTA from the ESPHome dashboard fails after +/- 30%. With the web server enabled, a manually downloaded firmware reaches between 80% and 90% before the device becomes unresponsive (input events such as button presses no longer reach Home Assistant). Sometimes it crashes completely, sometimes it only shows a very basic webpage (device name only, no relay control or log). But after a power cycle the device is usable again.
I was struggling with 2 ESP32s that would randomly drop off the network. Even moving them to within a few feet of the AP didn't seem to help. I could start a continuous ping and see them stop replying at random intervals. I finally isolated it by figuring out that they worked perfectly if they were connected to any other AP in my house except the one where I needed them. I was going to swap 2 APs to see if the problem followed the "bad" AP, but decided to just power cycle the suspect AP (UniFi U6 Lite) and immediately both ESPs stopped dropping packets.
I can't say for sure if it was related to the driver/chipset, because I have been doing a lot of tweaking with WiFi channels, power settings, band steering, etc., and I'm guessing it's possible that AP just got "confused".