ESPHome + OpenWrt (ath10k radio) = ESP8266 OTA timed out

I can corroborate (and know other cases of) the coincidence of ath10k with esphome network issues.
Threw an access point away because of it. So frustrating.

And what about the hotspot? Did you test?

I’m not really sure, but I think I changed/updated the drivers below.
While that didn’t solve my problem at the time, maybe each change contributed to the stability I have today.

ath10k-board-qca988x 20211216-1 ~1.5 KB ath10k qca988x board firmware
ath10k-firmware-qca988x 20211216-1 ~219.7 KB ath10k qca988x firmware
kmod-ath10k 5.4.154+5.10.68-1-1 ~207.3 KB This module adds support for wireless adapters based on…

At least for C7, many people solved some connection issues (not related to esphome), using non-ct drivers.
Have you tested non-ct drivers?
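
A quick way to see which variant is currently installed, as a rough sketch meant to be run over SSH on the router. The helper names `is_ct_package` and `label_packages` are made up for illustration, and it assumes that the CT packages can be recognised by a `-ct` suffix or `-ct-` infix in the package name:

```shell
# Sketch: label installed ath10k packages as ct or non-ct.

# Succeed if the package name looks like a -ct variant (assumed naming rule).
is_ct_package() {
    case "$1" in
        *-ct|*-ct-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read "name - version" lines on stdin and label each package name.
label_packages() {
    while read -r pkg _; do
        [ -n "$pkg" ] || continue
        if is_ct_package "$pkg"; then
            echo "$pkg ct"
        else
            echo "$pkg non-ct"
        fi
    done
}

# Only query opkg where it exists, so the snippet is harmless elsewhere.
if command -v opkg >/dev/null 2>&1; then
    opkg list-installed | grep ath10k | label_packages
fi
```

If both a `ct` and a `non-ct` kmod or firmware package show up, the swap was probably only half done.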

Yes, I did. At first I only replaced the firmware package, not kmod-ath10k,
but now I have replaced the kmod as well.

This gave me a 100% failure rate on ESP8266 OTA, so unfortunately this doesn’t seem to be the solution for me.

This is pure network and not even ESP related but I’m really hoping to find a solution.

I guess you already tried to actively get into safe mode on an ESP before initiating the OTA update? :thinking:

Actually, safe mode is implied during OTA, so no button is needed, but I did indeed try that too.
I actually found that disabling safe mode altogether works better.
On the other hand I now see this “reboot_timeout” and during my tests I didn’t wait 5 minutes.
So that would mean if an update fails, after 5 minutes the device is always accessible again.

The issue is more “when a sufficient wifi signal is present”, so that’s what I need to work on.
I found that when the device freezes, disconnecting it from the router makes the device fully operational again, except for the status LED, which keeps blinking until a full install has finished. Like it knows OTA started or something.

So I think safe mode will work but only because it reboots ESPHome after 5 minutes.
When I have the time I want to try something like this:

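ota:
  safe_mode: false
  password: !secret ota_password
  on_error:
    then:
      # x is the OTA error code that ESPHome passes to this trigger
      - logger.log:
          format: "OTA update error %d"
          args: ["x"]
      # assumes the wifi: component is declared with id: wifiId
      - lambda: id(wifiId).retry_connect();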

The plan is to reconnect wifi when something fails; at that point the device should stay fully operational, with only the status LED still blinking.

What I wonder is whether any of you ATH10k radio :radio: owners have already tried the non-CT drivers with OpenWrt. I just did some 3-minute “(re)search” and it looks like many users have trouble with the default (open source?) CT drivers shipped in OpenWrt…

this is what @walberjunior suggested,
but for some reason this failed for me.
On the other hand, I will retry this one more time as it has helped so many others.
Maybe after updating the driver I should restart ESPHome before changes are fully applied.

OK,
I tried the non-CT driver again,
but with non-CT the issue seems to be worse:

======================== [SUCCESS] Took 119.39 seconds ========================
INFO Successfully compiled program.
INFO Resolving IP address of bijkeuken-light.local
INFO  -> 192.168.180.140
INFO Uploading /data/bijkeuken-light/.pioenvs/bijkeuken-light/firmware.bin (537616 bytes)
INFO Compressed to 372789 bytes
Uploading: [=========                                                   ] 15% 
ERROR Error sending data: timed out

And from the ESP device’s point of view:

19:24:31	[D]	[ota:143]	Starting OTA Update from 192.168.180.13...
19:24:36	[D]	[ota:312]	OTA in progress: 0.3%
19:24:36	[D]	[ota:312]	OTA in progress: 9.9%
19:24:49	[D]	[ota:312]	OTA in progress: 12.0%
19:25:42	[D]	[ota:312]	OTA in progress: 13.6%

It’s like the device doesn’t even know that it has timed out yet.

What do the esphome node logs show when you increase the log level (verbose or higher :mag:)?

Also, if you do OTA logging, are you sure the connection isn’t interrupted? Best would be to try with serial logs - they sometimes spit out more useful stuff (especially when debugging connection issues) :bulb:

With VERY_VERBOSE logging, it adds 2 lines to the web log every time I open the website in another tab:

11:21:58	[D]	[ota:312]	OTA in progress: 15.2%
11:22:01	[D]	[ota:312]	OTA in progress: 15.5%
11:22:34	[D]	[ota:312]	OTA in progress: 15.8%
11:24:18	[V]	[json:031]	Attempting to allocate 512 bytes for JSON serialization
11:24:20	[V]	[json:051]	Size after shrink 60 bytes
11:25:13	[V]	[json:031]	Attempting to allocate 512 bytes for JSON serialization
11:25:13	[V]	[json:051]	Size after shrink 60 bytes

but that’s it.

I didn’t try serial logging, as most of my devices are mounted in a wall, and I’m not really sure how to read the serial signal.
But I do not expect there to be any other information.

I would very much expect more information between your timestamps 11:22:34 and 11:24:18. Almost two minutes without any lines with the log level set to VERY_VERBOSE just doesn’t really sound right :thinking:
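
To make silent stretches like this easier to spot in a longer log, here is a small sketch that prints the gap in seconds before each timestamped line. It assumes every relevant line contains an `HH:MM:SS` timestamp, it does not handle logs that cross midnight, and the function name `log_gaps` is made up for illustration:

```shell
# Sketch: print "<seconds since previous timestamped line><TAB><line>".
log_gaps() {
    awk '
        # Only look at lines that contain an HH:MM:SS timestamp.
        match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            split(substr($0, RSTART, RLENGTH), t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) printf "%d\t%s\n", now - prev, $0
            prev = now
            seen = 1
        }
    '
}
```

With the log saved to a file (say `ota.log`, a hypothetical name), `log_gaps < ota.log | sort -n | tail` shows the longest pauses; the jump from 11:22:34 to 11:24:18 comes out as 104 seconds.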

Since you write that it’s reproducible, you should be perfectly fine quickly deploying another ESPHome node on your bench for testing. :man_factory_worker:

Easiest if you already have a USB-serial adapter on your ESP board (like a D1 mini). In that case it’s enough to plug it into the machine the ESPHome dashboard is running on. :running_man:

It should also be possible to open web.esphome.io with a Chrome-based browser and plug an ESP into that machine to get serial logs :raised_hands:

Thanks,
web.esphome.io really helps; I think I could use this one for other purposes as well.
I only have this issue with ESP8285, and all my D1s, both mini and max, are ESP32.

But I connected another device using my serial adapter, and serial indeed logs a lot more, just not where I need it (no timestamps either).

[V][json:031]: Attempting to allocate 512 bytes for JSON serialization
[V][json:051]: Size after shrink 64 bytes
[V][json:031]: Attempting to allocate 512 bytes for JSON serialization
[V][json:051]: Size after shrink 112 bytes
[V][json:031]: Attempting to allocate 512 bytes for JSON serialization
[V][json:051]: Size after shrink 100 bytes
[VV][scheduler:195]: Running interval '' with interval=10000 last_execution=111865 (now=121865)
[VV][api.service:470]: on_ping_request: PingRequest {}
[VV][api.service:043]: send_ping_response: PingResponse {}
[VV][scheduler:195]: Running interval '' with interval=10000 last_execution=121865 (now=131866)
[VV][scheduler:195]: Running interval '' with interval=10000 last_execution=131865 (now=141865)
[VV][api.service:470]: on_ping_request: PingRequest {}
[VV][api.service:043]: send_ping_response: PingResponse {}
[VV][scheduler:195]: Running interval '' with interval=10000 last_execution=141865 (now=151865)
[D][ota:143]: Starting OTA Update from 192.168.180.13...
[V][ota:174]: OTA features is 0x01
[V][ota:194]: Auth: Nonce is 5490b4a9a531164882e1885e8412b3da
[V][ota:214]: Auth: CNonce is 11aa999f37e0ece660dc4e10b901c954
[V][ota:221]: Auth: Result is bc560eef614309ec2b9a8dbf35d61e6c
[V][ota:229]: Auth: Response is bc560eef614309ec2b9a8dbf35d61e6c
[V][ota:257]: OTA size is 386759 bytes
sleep disable
[begin] roundedSize:       0x0005F000 (389120)
[begin] updateEndAddress:  0x000FB000 (1028096)
[begin] currentSketchSize: 0x0008A000 (565248)
[begin] _startAddress:     0x0009C000 (638976)
[begin] _currentAddress:   0x0009C000 (638976)
[begin] _size:             0x0005E6C7 (386759)
[V][ota:274]: Update: Binary MD5 is b86f4edc37bb86d56b886ec02d75e91f
[D][ota:312]: OTA in progress: 0.3%
[D][ota:312]: OTA in progress: 8.5%
[D][ota:312]: OTA in progress: 9.8%
[D][ota:312]: OTA in progress: 10.1%
[V][json:031]: Attempting to allocate 512 bytes for JSON serialization
[V][json:051]: Size after shrink 64 bytes

It stays that way for a couple of minutes, or an hour, until the device decides to restart the wifi, or until I go to my AP and kick the device.
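
For the "kick the device" step, OpenWrt's hostapd exposes a `del_client` method over ubus, so it can be done from an SSH session instead of the web UI. A minimal sketch: the interface name and MAC address are placeholders, and the helper names are made up for illustration:

```shell
# Sketch: deauthenticate one wifi client on an OpenWrt AP via ubus.

# Build the JSON argument for hostapd's del_client method.
# reason 5 is a generic "AP cannot handle this station" code; ban_time 0
# means the client may reconnect immediately.
kick_payload() {
    printf '{"addr":"%s","reason":5,"deauth":true,"ban_time":0}' "$1"
}

# usage: kick_client <interface> <mac>
kick_client() {
    ubus call "hostapd.$1" del_client "$(kick_payload "$2")"
}
```

For example `kick_client wlan0 aa:bb:cc:dd:ee:ff`, after looking up the real interface name with `ubus list 'hostapd.*'`.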

Hi all,
I was having the OTA timeouts on my ESP8266 devices where the upload would take about 2 mins before timing out at some point. I have OpenWRT on a TP-Link Archer C6 v2 (EU), with OpenWrt 22.03.3.

I tested the procedure listed here:

My ssh commands were:

opkg update
opkg remove ath10k-firmware-qca9984-ct kmod-ath10k-ct
opkg install wget ath10k-firmware-qca9984 kmod-ath10k
cd /lib/firmware/ath10k/QCA9984/hw1.0/
mv board-2.bin board-2.bin.bk && mv firmware-5.bin firmware-5.bin.bk
wget -O firmware-5.bin https://github.com/kvalo/ath10k-firmware/raw/224d3c4b74553c1e01458db19451c37f8666f92c/QCA9888/hw2.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00157 --no-check-certificate
wget https://github.com/kvalo/ath10k-firmware/raw/224d3c4b74553c1e01458db19451c37f8666f92c/QCA9888/hw2.0/board-2.bin --no-check-certificate
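
Before rebooting, it may be worth a quick sanity check that the two downloaded files really are firmware images and not a saved web page, which is what wget stores when it is pointed at a GitHub page URL instead of the raw file. This sketch assumes, as far as I know, that upstream ath10k `firmware-N.bin` and `board-2.bin` files begin with the ASCII magic `QCA-ATH10K`; the helper name is made up for illustration:

```shell
# Sketch: check that a file starts with the ath10k magic string.
has_ath10k_magic() {
    [ "$(head -c 10 "$1" 2>/dev/null)" = "QCA-ATH10K" ]
}

# Check the two files in the current directory, if they exist.
for f in firmware-5.bin board-2.bin; do
    [ -f "$f" ] || continue
    if has_ath10k_magic "$f"; then
        echo "$f: looks like ath10k firmware"
    else
        echo "$f: NOT ath10k firmware (maybe an HTML page?)"
    fi
done
```

If a file fails the check, restoring the `.bk` copies before rebooting avoids ending up with no loadable firmware.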

So that’s the firmware version “10.4-3.9.0.2-00157” and the corresponding board-2 version. Then reboot.
The first OTA update attempt after update went through in about 20 seconds and there were no timeouts!

Very happy!


Thanks,

for some reason it doesn’t help with my ESP8266 OTA; that one is still not OK, but maybe it’s my debug device.
However, my connection to LibreTiny BK72xx devices seems to have improved at first sight.
I noticed that your script uses 2 different firmwares, qca9984 and qca9888. Shouldn’t these all be the same?
I did:

opkg update
opkg remove ath10k-firmware-qca9888-ct kmod-ath10k-ct
opkg install wget ath10k-firmware-qca9888 kmod-ath10k
cd /lib/firmware/ath10k/QCA9888/hw2.0/
mv board-2.bin board-2.bin.bk && mv firmware-5.bin firmware-5.bin.bk
wget -O firmware-5.bin https://github.com/kvalo/ath10k-firmware/raw/master/QCA9888/hw2.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00157 --no-check-certificate
wget https://github.com/kvalo/ath10k-firmware/raw/master/QCA9888/hw2.0/board-2.bin --no-check-certificate
reboot

but I have a TP-Link Deco M4R.
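
After the reboot, the kernel log is one place to confirm which firmware the radio actually loaded, which also helps with the qca9984 vs qca9888 question. A minimal sketch, assuming the ath10k driver logs a line containing `firmware ver <version> api <n>`; the function name is made up for illustration:

```shell
# Sketch: extract the loaded ath10k firmware version from kernel log lines.
ath10k_fw_version() {
    sed -n 's/.*ath10k.*firmware ver \([^ ]*\) api.*/\1/p'
}

# On the router this prints the version(s) found in the current kernel log.
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | ath10k_fw_version
fi
```

If this still prints a `-ct` style version, or nothing at all, the replaced files were probably not picked up; the full `dmesg | grep ath10k` output also names the chip and hardware revision the driver detected.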