ESPHome + OpenWRT(ATH10k radio) = ESP8266 ota timed out

I went over there and searched for ESP8266. Lots of posts suggest that this is a known problem over there, too.
Someone who knows 802.11 and radio drivers really well could probably determine which one is the culprit. Then I realized that ESP8266s are becoming passé, so I predict that neither group is going to feel it’s worthwhile to solve. I hope that’s not a valid prediction.


Even with the launch of the ESP32 a few years ago, the demand for ESP8266 has not abated, with so far, the sales volume in 2021 exceeding 30 million units
CNX Software - Embedded Systems News

I’m sure the ESP82xx chips will still be widely deployed over the next few years, if not the next decade.


I have a few more things to check. I have an Archer C2 with OpenWrt in my garage. An ESP32 and a Sonoff Mini are connected to it, and neither is suffering packet loss. The OpenWrt version is the same, but the C7 has some newer packages. I’m going to put a D1 Mini in the garage and run some tests.
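
For anyone who wants to put a number on the packet loss when comparing APs, here is a minimal sketch (not something taken from the posts in this thread). It assumes the ping summary contains the usual "N% packet loss" line; the function name `loss_percent` and the address in the usage comment are placeholders.

```shell
#!/bin/sh
# Read a ping summary on stdin and print only the packet-loss percentage.
# Assumes the summary contains a line with "N% packet loss".
loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical usage; replace the address with one of your ESP boards:
#   ping -c 50 192.168.0.209 | loss_percent
```

Running the same count against a board on each AP should give two figures that can be compared directly.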

It seems like proximity may quell the syndrome. You may also want to try putting the ESP further away from the AP to help guard against a false success.

I’ll check the signal. But I have a Sonoff Mini in the same room as the main router (the C7), and it also suffers packet loss.

Same room:

Garage:

The LuCI version is different, and I will check whether anything differs in the wireless configuration.

I think I can say at the very least the C7 firmware is causing this issue.

d1 connected to C2:



d1 connected to C7:




Is your AP a “C2 AC750 v1” or a “C2 V3”?
The V3 has the same ‘target’ as the C7V5 (and my A7V5), but the V1 is different in OpenWrt’s composition.

TP-Link Archer C2 v1

Seems to be a problem with ath10k firmware/drivers.
It seems to me that the issue can be resolved using non-Candela Technologies firmware/drivers.

Read the second post:


Makes sense. This seems like a finding to ‘hang one’s hat on.’
Of course, there’s no easy way to know whose driver TP-Link used in their stock firmware, which (for me) seems not to have this problem at all.

My reluctance towards the stock firmware is mostly due to device tracking.
With OpenWrt I am able to track both Android and iPhone (wife) reliably with an acceptable delay.
I believe that with the stock firmware I don’t get the same result.

Do you use tp-link as a device tracker?

I did for a while, but desiring to be independent of as many variables as possible, just changed to the mobile-app and device-ping methods of tracking, and they seem to be working out fine for me.
Mainly I was just using tracking to tell me when we were both away, or not, which was a factor in some automations. If we drop off the LAN, pings fail, and that’s good enough.
Which reminds me of a different bug I ran into: the ping platform of device_tracker loads the logs with exception dumps when the device’s name stops resolving (because it’s gone and DHCP has expired).
I submitted it as a bug report, because nothing should fail that ungracefully for such a common use-case as that. But I digress.

Congratulations on so closely identifying the probable cause of all these woes. Now we will know what to look for when they release an update to OpenWRT.

With the iPhone everything is more complicated.
If my wife clears recent apps on the iPhone, the mobile app doesn’t update the location (or I don’t know how to make it do so), and she never opens the mobile app.
And even so, I see a very long delay when using the mobile app for tracking. Maybe I’m missing something.

And the main reason I started doing automations was because she never disables the alarm system :triumph:

I’ll do some more research about it and see if I can find the courage to mess with the drivers

haha! True about the iPhone and the app tracking when rarely used. It’s like we live in the same house!

For a while, I was just relying on the ping tracker and counting on having good WiFi coverage to get the phones linked up and responding as soon as possible on arrival. In my case it was so HA could turn off some security cams. It was easy to tell when it took too long, as the camera app would start notifying me about motion detected for the first moments of arriving home.
But, even when I was using the OpenWRT integration, sometimes there’d be that 20-30 second delay before HA knew we were home.

safe mode
Check out the link above. It seems to be a way to do OTA updates if you’re having trouble.

I saw this in another post; I will try it. Thanks.

But I think I will change the driver today.
I hope that doesn’t break anything.
:grimacing:

@Spiro
Unfortunately, it didn’t work; nothing changed.

But with the change of driver/firmware the magic happened!!! :grinning_face_with_smiling_eyes:

INFO Successfully compiled program.
INFO Resolving IP address of quartotemp.local
INFO  -> 192.168.0.209
INFO Uploading /data/quartotemp/.pioenvs/quartotemp/firmware.bin (438144 bytes)
INFO Compressed to 301665 bytes
Uploading: [============================================================] 100% Done...

INFO Waiting for result...
INFO OTA successful
INFO Successfully uploaded program.
INFO Starting log output from quartotemp.local using esphome API
INFO Successfully compiled program.
INFO Resolving IP address of salatemp.local
INFO  -> 192.168.0.208
INFO Uploading /data/salatemp/.pioenvs/salatemp/firmware.bin (438112 bytes)
INFO Compressed to 301628 bytes
Uploading: [============================================================] 100% Done...

INFO Waiting for result...
INFO OTA successful
INFO Successfully uploaded program.
INFO Starting log output from salatemp.local using esphome API


It’s still too early to evaluate the driver/firmware change, but at least the OTA issue has been resolved.

For those who have a router with the ATH10k radio that runs OpenWrt and suffers from disconnection problems, I advise you to look for the non-CT driver/firmware and evaluate whether it improves things on your router/model.

For those who want to test the non-CT driver/firmware, I just sent the commands below to my TP-Link Archer C7 V5:

opkg update
opkg remove ath10k-firmware-qca988x-ct kmod-ath10k-ct
opkg update && opkg install ath10k-firmware-qca988x kmod-ath10k
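
Before and after swapping the packages, it may help to confirm which ath10k variant is actually installed. This is a minimal sketch, assuming `opkg list-installed` prints one `name - version` entry per line; the helper name `ath10k_variant` is made up for illustration.

```shell
#!/bin/sh
# Classify the installed ath10k driver from `opkg list-installed` output on stdin.
# Prints "ct" for the Candela Technologies build, "non-ct" for the plain
# kmod-ath10k package, or "none" if neither package name is present.
ath10k_variant() {
    awk '
        $1 ~ /^kmod-ath10k-ct/ { ct = 1 }      # also matches -ct-smallbuffers
        $1 == "kmod-ath10k"    { plain = 1 }
        END {
            if (ct) print "ct"
            else if (plain) print "non-ct"
            else print "none"
        }'
}

# On the router:
#   opkg list-installed | ath10k_variant
```

If it still prints `ct` after the commands above, the remove step probably did not complete.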

I’m going to try this tomorrow. Will let you know…

Not so fast.

It seems to me the problem has only been resolved for a few hours.
After the change I was able to update 3 D1 Mini boards several times, but the problem returned.

Need more research

3 hours on the non-CT ath10k drivers, and all’s well so far. Not one dropout out of about 25 devices.
(but, it was always stable for up to a day after a reboot)

Thanks for the test.
What version of OpenWRT did you use?
Of the 25 devices, how many use esp8266? And how many use ESPHome?

I have some packages installed on C7 and C2.

  • ZeroTier on the C7
  • presence-detector on the C7 and C2

I think I will reset the C7 to its defaults before making these changes, to see whether anything else I may have changed is affecting the ESP8266 boards.