ESPHome + OpenWRT(ATH10k radio) = ESP8266 ota timed out

Makes sense. This seems like a finding to ‘hang one’s hat on.’
Of course, there’s no easy way to know whose driver TP-Link used in their stock firmware, which (for me) doesn’t seem to have this problem at all.

My reluctance towards the stock firmware is mostly due to device tracking.
With OpenWRT I am able to track both Android and iPhone (wife) reliably with an acceptable delay.
I believe that with the stock firmware I don’t get the same result.

Do you use TP-Link as a device tracker?

I did for a while, but desiring to be independent of as many variables as possible, just changed to the mobile-app and device-ping methods of tracking, and they seem to be working out fine for me.
Mainly I was just using tracking to tell me when we were both away, or not, which was a factor in some automations. If we drop off the LAN, pings fail, and that’s good enough.
Which reminds me of a different bug I ran into: the ping platform of device_tracker loads the logs with exception dumps when the device’s name stops resolving (because it’s gone and DHCP has expired).
I submitted it as a bug report, because nothing should fail that ungracefully for such a common use-case as that. But I digress.

Congratulations on so closely identifying the probable cause of all these woes. Now we will know what to look for when they release an update to OpenWRT.

With the iPhone everything is more complicated.
If my wife clears recent apps on the iPhone, the mobile-app doesn’t update the location (or I don’t know how to make it), and she never opens the mobile-app.
Even so, I see a very long delay when using the mobile-app for tracking; maybe I’m missing something.

And the main reason I started doing automations was because she never disables the alarm system :triumph:

I’ll do some more research about it and see if I can find the courage to mess with the drivers

haha! True about the iPhone and the app tracking when rarely used. It’s like we live in the same house!

For a while, I was just relying on the ping tracker and counting on having good WiFi coverage to get the phones linked up and responding as soon as possible on arrival. In my case it was so HA could turn off some security cams. It was easy to tell when it took too long, as the camera app would start notifying me about motion detected for the first moments of arriving home.
But, even when I was using the OpenWRT integration, sometimes there’d be that 20-30 second delay before HA knew we were home.

safe mode
Check out the link above. It seems to be a way to do OTA if you’re having trouble.

I saw this in another post; I will try that. Thanks

But I think I will change the driver today.
I hope that doesn’t break anything
:grimacing:

@Spiro
Unfortunately it didn’t work, no changes

But with the change of driver/firmware the magic happened!!! :grinning_face_with_smiling_eyes:

INFO Successfully compiled program.
INFO Resolving IP address of quartotemp.local
INFO  -> 192.168.0.209
INFO Uploading /data/quartotemp/.pioenvs/quartotemp/firmware.bin (438144 bytes)
INFO Compressed to 301665 bytes
Uploading: [============================================================] 100% Done...

INFO Waiting for result...
INFO OTA successful
INFO Successfully uploaded program.
INFO Starting log output from quartotemp.local using esphome API
INFO Successfully compiled program.
INFO Resolving IP address of salatemp.local
INFO  -> 192.168.0.208
INFO Uploading /data/salatemp/.pioenvs/salatemp/firmware.bin (438112 bytes)
INFO Compressed to 301628 bytes
Uploading: [============================================================] 100% Done...

INFO Waiting for result...
INFO OTA successful
INFO Successfully uploaded program.
INFO Starting log output from salatemp.local using esphome API


It’s still too early to evaluate the driver/firmware change, but at least the OTA issue has been resolved.

For those who have a router that uses the ATH10k radio, runs OpenWRT, and suffers from disconnection problems, I advise you to look for the non-CT driver/firmware and evaluate whether it improves things on your router/model.

For those who want to test the non-CT driver/firmware, I just sent the commands below to my TP-Link Archer C7 V5:

opkg update
opkg remove ath10k-firmware-qca988x-ct kmod-ath10k-ct
opkg update && opkg install ath10k-firmware-qca988x kmod-ath10k
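To confirm which variant is actually installed before and after the swap, here is a minimal sketch. The function name `ath10k_variant` is hypothetical, and it assumes the usual `name - version` lines printed by `opkg list-installed` and the package naming seen in the commands above (a `-ct` suffix for the CT builds).

```shell
#!/bin/sh
# Classify an `opkg list-installed` listing read from stdin.
# Prints one of: ct, non-ct, mixed, none.
# Assumption: CT packages carry "-ct" in their name, as in the commands above.
ath10k_variant() {
    ct=0
    plain=0
    while read -r name _rest; do
        case "$name" in
            # CT driver or CT firmware (pattern must come before the plain one)
            kmod-ath10k-ct*|ath10k-firmware-*-ct*) ct=1 ;;
            # Mainline (non-CT) driver or firmware
            kmod-ath10k|ath10k-firmware-*) plain=1 ;;
        esac
    done
    if [ "$ct" -eq 1 ] && [ "$plain" -eq 1 ]; then
        echo mixed
    elif [ "$ct" -eq 1 ]; then
        echo ct
    elif [ "$plain" -eq 1 ]; then
        echo non-ct
    else
        echo none
    fi
}
```

On the router it would be used as `opkg list-installed | ath10k_variant`. A result of `mixed` would suggest the remove step did not take out everything.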

I’m going to try this tomorrow. Will let you know…

Not so fast.

It seems to me the problem was only resolved for a few hours.
After the change I was able to update 3 D1 Mini boards several times, but the problem returned.

Need more research

3 hours on the non-CT ath10k drivers, and all’s well so far. Not one dropout out of about 25 devices.
(but, it was always stable for up to a day after a reboot)

Thanks for the test.
What version of OpenWRT did you use?
Of the 25 devices, how many use ESP8266? And how many use ESPHome?

I have some packages installed on C7 and C2.

  • ZeroTier on C7
  • presence-detector on C7 and C2

I think I will reset the C7 and leave it at the defaults before making these changes, to see if anything else I may have changed is affecting the ESP8266 boards.

All the devices are ESP8266, and all use ESPHome. They are from a variety of origins. Some are Adafruit Huzzah, some are impostor Wemos D1 Minis, some came embedded in a variety of Sonoff switch-plugs, and a few came in Etekcity/VeSync plugs from Amazon.

OpenWRT 21.02.1

I will test it, maybe tomorrow.

Did you get to test the default driver or did you go straight to non-CT?

Since dropouts were bound to happen eventually with the default drivers, I replaced them with non-CT before putting the AP into service.

Changed to non-CT on one of my APs just to see how it goes. Updated one ESPHome device with no problem, but I was only having occasional problems anyway.

I’m already seeing random dropouts on the devices that have more distance to the AP (weaker signals), so sadly it doesn’t look like the driver change solved it.

I updated to version 21.02 with non-CT drivers/firmware last night.
It did not resolve the problem, and it seems to be worse.

Before upgrading, I took the D1 mini that was running ESPHome and installed Tasmota on it. Although the Sonoff Mini works fine with Tasmota, the D1 mini did not work well on the main AP.

Archer C2:

  • Sonoff with Tasmota - OK
  • ESP32 with ESPHome - OK
  • D1 mini or NodeMCU board with ESPHome - OK

Archer C7:

  • Sonoff with Tasmota - OK
  • ESP32 with ESPHome - OK
  • D1 mini or NodeMCU board with ESPHome - OTA not working
  • D1 mini or NodeMCU board with Tasmota - I haven’t tested OTA, but there is a lot of packet loss
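
To put a number on "a lot of packet loss" when comparing boards or APs, a small sketch like this can pull the loss percentage out of a ping run. The helper name `packet_loss` is hypothetical, and it assumes the summary line contains `N% packet loss`, as both BusyBox and iputils ping print it.

```shell
#!/bin/sh
# Read ping output on stdin and print only the packet-loss percentage.
# Prints nothing if no "...% packet loss" summary line is found.
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

For example, `ping -c 20 192.168.0.209 | packet_loss`, run once against the same board on each AP, gives two figures that can be compared directly.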

I’ll test version 21.02 without changing the drivers/firmware, but until I solve this problem I won’t use any more ESP8266 boards, since returning the main AP to the original firmware isn’t an option that suits me right now.

@Spiro @glyndon Any changes?

Despite preferring OpenWRT, I’m sticking with factory AP firmware until this is solved.
And I remembered something: At one point I, like you, was using the OpenWRT integration in HA for device tracking.
I abandoned it long ago because it did not reliably report departures for me.
E.g., if I turn WiFi ‘off’ on a phone, OpenWRT would trigger a departure event promptly, as one would hope and expect.
However, if I just walk or drive away, the WiFi ‘session’ (or whatever the proper term is) doesn’t terminate right away. The AP seems to just wait for the device to reappear on the air, and only terminates it (and reports departure) after several minutes to hours. This wasn’t acceptable.
So having OpenWRT installed wasn’t as useful as I’d hoped for device tracking.
Ping, on the other hand, has proven very consistent as a way of telling if a device is on the LAN or not, and it can be tuned to tolerate devices like phones, which may drop offline and only go online for brief periods when their screen is off.
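
The "tolerate brief dropouts" idea can be sketched roughly like this: a device only counts as away after several probes in a row have failed. This is an illustration, not how the Home Assistant ping tracker is implemented; `is_home`, `PROBE`, and the default of 3 tries are hypothetical, and it assumes a `ping` that accepts `-c` (count) and `-W` (timeout in seconds).

```shell
#!/bin/sh
# Probe command; override PROBE to swap in a different check.
# Assumption: ping supports -c (count) and -W (timeout in seconds).
PROBE=${PROBE:-"ping -c 1 -W 2"}

# is_home HOST [TRIES]
# Returns 0 as soon as one probe succeeds,
# and 1 only after TRIES probes in a row have failed.
is_home() {
    host=$1
    tries=${2:-3}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # $PROBE is left unquoted on purpose so its options split into words.
        if $PROBE "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

Something like `is_home 192.168.0.50 5 && echo home || echo away` (the address is a placeholder) would then report away only after five straight misses. Raising the try count trades slower departure detection for fewer false departures from a sleeping phone.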

Which OpenWRT integration are you talking about?
I now use UBUS and another device tracker from rmoesbergen.

For me it’s quite reliable. I use UBUS (to avoid false negatives with the iPhone) on the C7, and the rmoesbergen tracker on the C7 and C2. UBUS takes a little longer to close the session, and when I leave the house I’m always connected to the C2, so it can’t take long to close the session.
As soon as I get home the devices are detected, and the departure delay, if I’m not mistaken, is 1 to 3 minutes. I have to check.

Before I started using Home Assistant, I wrote firmware for an ESP32 that logged into the Archer C2 running the original firmware (it was my main router at the time) and returned the connected MACs to me. That way I was able to activate or deactivate the alarm system, but it had the same delay in ending the session that you describe. If I’m not mistaken, I only managed to reduce this delay after removing Bind ARP, or something like that.