TL;DR
If your Sonoff ZBDongle-E (EFR32MG21) crashes the entire ZHA integration every time you try to OTA update a device — especially sleepy end devices like the SONOFF TRVZB — the fix is to flash the coordinator with firmware that supports software flow control. The stock Sonoff firmware can’t handle the serial throughput required for OTA transfers.
Firmware that worked for me: zbdonglee_zigbee_ncp_8.0.3.0_sw_flow_115200.gbl
The Problem
I have 5 SONOFF TRVZB thermostatic radiator valves that needed a firmware update (0x00001300 → 0x00001404). Every time I triggered an OTA update from the ZHA UI, the transfer would start, reach about 0.5-1%, and then the entire ZHA integration would crash and restart. All zigbee devices would briefly go unavailable.
The error in the logs:
bellows.ash.NcpFailure: NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT
Followed by:
zigpy.application: Watchdog failure
And in the UI:
Failed to perform the action update/install.
Update was not successful: <Status.TIMEOUT: 148>
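If you want to check whether your own crashes match this pattern, a few lines of Python can scan a log for the two signatures above. This is only a minimal sketch: the function name and the sample lines are illustrative, and real log lines carry timestamps and other prefixes, which is why it uses a plain substring match.

```python
# Substrings taken from the log excerpts quoted above.
SIGNATURES = {
    "ncp_ack_timeout": "ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT",
    "watchdog": "Watchdog failure",
}


def count_signatures(lines):
    """Count how many lines contain each known failure signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "bellows.ash.NcpFailure: NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT",
        "zigpy.application: Watchdog failure",
        "some unrelated line",
    ]
    print(count_signatures(sample))
```

To run it against a real log, pass an open file object instead of the sample list. The location of the log file depends on your installation.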
My Setup
- Home Assistant 2026.4.1 (Docker container)
- Coordinator: Sonoff ZBDongle-E (EFR32MG21, CP2102N USB-UART)
- Original firmware: EmberZNet 7.4.4.0, EZSP v13
- Serial config: 115200 baud, flow_control: null
- Network: 25 devices (16 routers, 8 end devices), channel 25
- ZHA OTA provider: z2m (Zigbee2MQTT provider, which includes Sonoff firmware images)
Root Cause Analysis
After a lot of digging, I found that the crash was not a software bug in ZHA, bellows, or zigpy. It was a flow-control problem on the serial link between the host and the dongle.
Here’s what happens during an OTA update:
- The OTA starts and image blocks begin transferring
- The serial link between Home Assistant and the dongle (115200 baud, no flow control) gets saturated with OTA traffic plus normal device reports from the rest of the network
- The dongle’s NCP firmware itself can’t get ACK responses from the host fast enough
- The NCP sends an ASH ERROR frame (ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT)
- Bellows receives this via error_frame_received() and enters a failed state
- Zigpy’s watchdog detects the failure and restarts the entire ZHA integration
- The OTA transfer is killed
The key insight: the ERROR frame comes from the dongle’s firmware, not from bellows. The dongle has its own internal ACK timeout counter, and we can’t change it from the host side.
The Sonoff ZBDongle-E board does NOT wire CTS/RTS pins for hardware flow control, and the stock Sonoff firmware does NOT support software (XON/XOFF) flow control. So there’s no way for the dongle and host to throttle each other when the serial buffer fills up.
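To make the idea of throttling concrete, here is a toy model of a link with a fixed-size receive buffer, with and without a pause/resume signal of the kind software flow control provides. It is a sketch of the general mechanism only: it does not model the ASH protocol or the dongle, and every number in it is made up for illustration.

```python
def simulate(ticks, send_rate, drain_rate, capacity, flow_control):
    """Return (delivered, dropped) byte counts for a toy serial link.

    All rates and sizes are illustrative, not measured from the dongle.
    """
    high = capacity * 3 // 4   # receiver asks the sender to pause (XOFF)
    low = capacity // 4        # receiver asks the sender to resume (XON)
    buffered = delivered = dropped = 0
    paused = False

    for _ in range(ticks):
        # Sender: transmit unless flow control is on and we were paused.
        if not (flow_control and paused):
            accepted = min(send_rate, capacity - buffered)
            buffered += accepted
            dropped += send_rate - accepted   # overflow is simply lost

        # Receiver: process as much as it can this tick.
        drained = min(drain_rate, buffered)
        buffered -= drained
        delivered += drained

        # Receiver: update the pause/resume state for the next tick.
        if buffered >= high:
            paused = True
        elif buffered <= low:
            paused = False

    return delivered, dropped


if __name__ == "__main__":
    for enabled in (False, True):
        delivered, dropped = simulate(1000, 12, 10, 64, enabled)
        print(f"flow_control={enabled}: delivered={delivered} dropped={dropped}")
```

With the sender slightly faster than the receiver, the run without flow control ends up dropping bytes once the buffer is full, while the run with the pause/resume signal drops none because the sender backs off before the buffer overflows.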
What I Tried (and didn’t work)
Before finding the real fix, I tried several things:
- Setting flow_control: software in the ZHA config. ZHA failed to start entirely. The stock firmware doesn’t support software (XON/XOFF) flow control, so forcing it breaks the serial connection.
- Disabling automations that send frequent Zigbee commands. I had 5 automations sending temperature values to the TRVZBs on every sensor state change. Disabling them reduced bus traffic, but the NCP still crashed from OTA traffic alone.
- Patching bellows ACK_TIMEOUTS (5 → 50) in bellows/ash.py. This doesn’t help because the error comes from the dongle firmware, not from bellows giving up.
- Increasing T_RX_ACK_MAX (3.2s → 6.4s). Same reason: the dongle side is the one failing.
- Increasing zigpy MAX_TIME_WITHOUT_PROGRESS (30s → 300s) in zigpy/ota/manager.py. This helped the OTA survive longer between block requests, but the NCP still crashed when blocks actually transferred.
None of these worked because the problem is in the dongle’s firmware, not in the host software.
The Fix
Flash the ZBDongle-E with firmware that supports software flow control.
I used: zbdonglee_zigbee_ncp_8.0.3.0_sw_flow_115200.gbl
This gave me EmberZNet 8.0.3.0 build 581, EZSP v16.
Where to get the firmware
- Darkxst’s firmware builds (recommended for third-party dongles): Releases · darkxst/silabs-firmware-builder · GitHub
- Nabu Casa firmware builder: Releases · NabuCasa/silabs-firmware-builder · GitHub (look for files with sonoff-zbdongle-e in the name)
- Web flasher: https://skyconnect.home-assistant.io/firmware-update/
Important: The Nabu Casa SkyConnect/Yellow firmware uses hardware flow control. Do NOT flash that on the Sonoff ZBDongle-E — the board doesn’t have CTS/RTS wired. Make sure you get the Sonoff-specific variant with software flow control.
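Since picking the wrong variant is the easiest mistake to make here, a small filename check before flashing can help. The sketch below assumes the naming pattern of the one file quoted in this post (board, zigbee_ncp, version, flow control, baud rate). The hw_flow token and the second filename in the example are my guesses for illustration, and other releases may name their files differently.

```python
import re

# Pattern inferred from the single filename quoted in this post.
# "hw_flow" is an assumed token for hardware flow control builds.
_PATTERN = re.compile(
    r"^(?P<board>[a-z0-9]+)_zigbee_ncp_(?P<version>\d+(?:\.\d+){3})"
    r"_(?P<flow>sw_flow|hw_flow)_(?P<baud>\d+)\.gbl$"
)


def parse_firmware_name(filename):
    """Split a firmware filename into its parts, or return None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["baud"] = int(info["baud"])
    return info


def is_safe_for_zbdongle_e(filename):
    """True only for a zbdonglee build with software flow control."""
    info = parse_firmware_name(filename)
    return bool(info) and info["board"] == "zbdonglee" and info["flow"] == "sw_flow"


if __name__ == "__main__":
    for name in (
        "zbdonglee_zigbee_ncp_8.0.3.0_sw_flow_115200.gbl",
        "skyconnect_zigbee_ncp_8.0.3.0_hw_flow_115200.gbl",  # hypothetical name
    ):
        print(name, is_safe_for_zbdongle_e(name))
```

A filename that does not match the pattern is treated as unsafe, which is the cautious default when you are not sure what a build contains.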
Flashing
I used the web flasher from a Chrome browser. The process is straightforward — connect the dongle, select the firmware file, flash. Your zigbee network settings (PAN ID, keys, device list) are preserved in the dongle’s flash memory.
Results
After flashing:
- Zero NCP crashes: the ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT error is completely gone
- OTA updates work: a 332KB firmware image transferred from 0% to 100% in about 38 minutes with no ZHA restarts
- Blocks transfer at roughly 50 bytes every 300ms (~1.3% progress per 30 seconds)
- Normal zigbee operation is also noticeably more stable
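As a sanity check, the numbers above hang together. The sketch below assumes "332KB" means 332 × 1024 bytes and treats the block size and interval as the rough figures they are. It gives an ideal transfer time of about 34 minutes, a little under the 38 minutes I observed, and the observed time works out to about 1.3% per 30 seconds.

```python
IMAGE_BYTES = 332 * 1024      # "332KB" image, assuming KB means 1024 bytes
BLOCK_BYTES = 50              # approximate block size from the results above
BLOCK_INTERVAL_S = 0.3        # approximate time per block from the results above
OBSERVED_MINUTES = 38         # observed total transfer time


def estimated_minutes(image_bytes, block_bytes, interval_s):
    """Transfer time in minutes if every block takes interval_s seconds."""
    blocks = -(-image_bytes // block_bytes)   # ceiling division
    return blocks * interval_s / 60


def percent_per_30s(total_minutes):
    """Average progress per 30 seconds for a transfer of the given length."""
    return 30 / (total_minutes * 60) * 100


if __name__ == "__main__":
    ideal = estimated_minutes(IMAGE_BYTES, BLOCK_BYTES, BLOCK_INTERVAL_S)
    print(f"ideal transfer time: {ideal:.1f} min")
    print(f"observed progress: {percent_per_30s(OBSERVED_MINUTES):.1f}% per 30 s")
```

The gap between the ideal and observed times is what you would expect from retries and from the device sleeping between block requests.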
Notes on updating sleepy devices (TRVZB)
Even with the fixed firmware, OTA updates to sleepy end devices like the TRVZB can be tricky:
- First attempt may fail with ZIGBEE_DELIVERY_FAILED: 3074. This just means the device was asleep. Retry and it will catch the next wake cycle.
- Update one device at a time. Don’t try to OTA multiple sleepy devices simultaneously.
- Stubborn devices: One of my 5 TRVZBs (the one at depth 3 in the mesh, furthest from the coordinator) refused to update after many retries. What finally worked: I removed it from the radiator, placed it next to the coordinator, pulled the batteries to force a clean reboot, and then the OTA went through. Reinstalled it after the update.
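I did the retries by hand in the UI, but the pattern is a plain retry loop with a pause between attempts, one device at a time. The sketch below shows that pattern only. start_update is a hypothetical callable you would have to supply yourself (it is not a ZHA or zigpy API), and the attempt count and wait time are arbitrary.

```python
import time


def retry_ota(start_update, attempts=5, wait_s=60, sleep=time.sleep):
    """Call start_update() until it returns True or attempts run out.

    start_update is a hypothetical callable supplied by the caller; it
    should return True on success and False on a delivery failure.
    Returns the number of the attempt that succeeded, or None.
    """
    for attempt in range(1, attempts + 1):
        if start_update():
            return attempt
        if attempt < attempts:
            sleep(wait_s)   # give the sleepy device time to wake up again
    return None


if __name__ == "__main__":
    # Fake update that fails twice (device asleep) and then succeeds.
    outcomes = iter([False, False, True])
    result = retry_ota(lambda: next(outcomes), wait_s=0)
    print("succeeded on attempt", result)
```

Passing sleep as a parameter keeps the loop easy to test without actually waiting.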
Summary
| | Before | After |
|---|---|---|
| Coordinator firmware | EmberZNet 7.4.4.0 (EZSP v13) | EmberZNet 8.0.3.0 (EZSP v16) |
| Flow control | None | Software (firmware-side) |
| NCP crashes during OTA | Every attempt | Zero |
| OTA result | Crash at 0.5-1% | Completes to 100% |
| ZHA restarts during OTA | Every attempt | None |
If you’re struggling with OTA updates on a Sonoff ZBDongle-E, upgrade your coordinator firmware. It made all the difference.