Zigbee Acting Like It's Capped on Devices

I’ve been dealing with a strange issue for the past couple of months that I first thought was a fluke but now I’m wondering if it’s something else.

My Zigbee network currently is:

  • ZHA on HAOS 2024.6.4
  • 46 total devices, 13 of which are battery-powered sensors
  • Sonoff Dongle Plus 3.0 (Z-Stack 20210708) on a powered hub + extension cable
  • Currently on channel 22 after cycling through several channels; it had previously been working and pairing fine on channel 11
  • Very carefully laid out mains network to make sure there is plenty of coverage, all battery devices are within 5-10 feet of a mains device

A few months ago I was going to add a new Inovelli switch to my Zigbee network but couldn’t get it to add. This was odd because every Zigbee device I have added is detected within seconds and has been a rock solid integration. I figured I needed to just power cycle everything and try again but got caught up in other things.

In the past couple of weeks, some of my battery devices have fallen off the network after a restart, come back, and then gone offline again after a few hours. This prompted me to start cycling through channels in case new interference had been introduced. Moving my network into the 20s seems to have settled those down for now.

But here’s the issue that is making me wonder if I’ve hit some kind of ceiling: I had an Aqara humidity/temp sensor drop off so, as normal, I replaced the battery, since these devices eat batteries. That didn’t bring it back, so I factory reset it and tried to re-pair it, with no success. I then tried that same Inovelli Zigbee switch and it wouldn’t pair either. I decided to remove the Aqara and was then able to immediately detect and add a Third Reality temp/humidity sensor.

Today another temp/humidity sensor dropped off. Again I replaced the batteries and it didn’t rejoin (it’s another Third Reality, and it’s about 3 feet from the dongle).

I’ve done the normal troubleshooting of dumping the diagnostics of the dongle and seeing the channel utilization and can’t seem to overcome what seems to be a one-in-one-out scenario. I know there is a firmware update for my dongle and I’ve ordered another dongle so I can have a backup in case the update bricks mine - and while the firmware does address some issues that could help, like higher power to the transmitter, I don’t feel like it’s going to solve this.
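
For anyone repeating the channel-utilization check, here is a minimal sketch of ranking channels from quietest to busiest. It assumes the diagnostics download gives you a mapping of channel number to utilization percentage; the function name `rank_channels` and the sample numbers are made up for illustration and are not values from my network.

```python
def rank_channels(energy_scan: dict[str, float]) -> list[tuple[int, float]]:
    """Return (channel, utilization %) pairs sorted quietest first."""
    pairs = [(int(ch), float(pct)) for ch, pct in energy_scan.items()]
    # Sort by utilization, breaking ties by channel number.
    return sorted(pairs, key=lambda p: (p[1], p[0]))


# Hypothetical scan values, only to show the shape of the data.
sample_scan = {"11": 62.5, "15": 18.0, "20": 7.4, "22": 9.1, "25": 3.2}

ranked = rank_channels(sample_scan)
for channel, pct in ranked:
    print(f"channel {channel}: {pct:.1f}% utilized")
```

A low reading only says the channel was quiet at scan time, so it is worth repeating the scan at different times of day before moving the network.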

Any suggestions on what this might be or other diagnostics I can do on the Zigbee network?

Hedda’s guide?

I have gone through that and not gotten any success as of yet.

Sometimes the mains powered devices do not route packets from certain other brands reliably. Sometimes one firmware works and a newer one does not.
I think there is a list somewhere on the net of the routers that show this behavior.

I’m currently forcing a panic on Zigbee in hopes that it may repair itself once it comes back online. In the meantime I totally forgot I already had a backup Zigbee dongle that I can test the firmware update on.

I’ll look for that list to see if there are any issues with what I have, thanks for the suggestion.

The Zigbee dongle is probably your coordinator.
Routers are your other mains powered devices.

I am aware, thank you for the reminder. I’ll probably turn my now second spare dongle into a router since I’ve read they make very powerful routers when the firmware is switched on them due to the large antenna.

That and many other related things are covered in my guide.

That is also covered in the same guide.

Perfect! :wink:

Well, I’m far worse off than before, lol. I updated the firmware on both of my adapters and it went through with no errors, but now ZHA cannot connect to the Zigbee controller. If I reconfigure and tell it to either reconfigure the current radio or migrate (to the spare adapter), it cannot open the port.

Looking at others that have had similar issues, there’s nothing specific to HAOS that is viable, as trying to change permissions (as several recommended and said fixed their issue) is not possible.

Did you try /dev/serial/by-id/… ?
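
In case it helps, here is a small sketch that lists the stable by-id names and the tty device each one currently points to. It assumes you have a shell with Python on the host; the helper name `list_serial_by_id` is just for illustration.

```python
from pathlib import Path


def list_serial_by_id(base: str = "/dev/serial/by-id") -> list[tuple[str, str]]:
    """Return (stable path, resolved device) pairs for each by-id symlink."""
    root = Path(base)
    if not root.is_dir():
        # No directory usually means no USB serial adapter is visible.
        return []
    return [(str(link), str(link.resolve())) for link in sorted(root.iterdir())]


for stable, target in list_serial_by_id():
    print(f"{stable} -> {target}")
```

The by-id name stays the same across reboots and re-plugs, while the ttyUSB number can change, which is why it is the safer path to give ZHA.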

I tried both, and they both come up with the same message.

Which exact dongle do you have? And which exact firmware are you on?

If I look at that table it has its limits.

Did you use the router-only firmware or the coordinator firmware?

I have the Sonoff Zigbee 3.0 USB Dongle Plus (ITead model "ZBDongle-P", based on the Texas Instruments CC2652P radio SoC/MCU). The firmware I posted above is what it was on; I then upgraded it to 20240710 following the instructions posted here and elsewhere, which were all mostly the same.

I’ve turned on debug logging and am wondering if it’s stuck in the bootloader. The errors are odd: it seems to recognize it and even sees that there was a firmware change, but it errors out:

2024-08-26 11:50:49.636 DEBUG (MainThread) [homeassistant.components.zha] Failed to set up ZHA
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 694, in _skip_bootloader
    result = await responses.get()
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/__init__.py", line 152, in async_setup_entry
    zha_gateway = await ZHAGateway.async_from_config(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 197, in async_from_config
    await instance.async_initialize()
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 215, in async_initialize
    await app.startup(auto_form=True)
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 233, in startup
    await self.connect()
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/zigbee/application.py", line 103, in connect
    await znp.connect()
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 736, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 693, in _skip_bootloader
    async with async_timeout.timeout(CONNECT_PROBE_TIMEOUT):
  File "/usr/local/lib/python3.12/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.12/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
TimeoutError
2024-08-26 11:51:16.979 DEBUG (bellows.thread_0) [zigpy.serial] Opening a serial connection to '/dev/ttyUSB0' (115200 baudrate)
2024-08-26 11:51:16.986 DEBUG (MainThread) [bellows.ezsp] Resetting EZSP
2024-08-26 11:51:21.993 DEBUG (MainThread) [zigpy.application] Failed to probe with config {'path': '/dev/ttyUSB0', 'baudrate': 115200, 'flow_control': None}
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 631, in probe
    await app.connect()
  File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 149, in connect
    await ezsp.startup_reset()
  File "/usr/local/lib/python3.12/site-packages/bellows/ezsp/__init__.py", line 125, in startup_reset
    await self.reset()
  File "/usr/local/lib/python3.12/site-packages/bellows/ezsp/__init__.py", line 151, in reset
    await self._gw.reset()
TimeoutError
2024-08-26 11:51:22.000 DEBUG (bellows.thread_0) [zigpy.serial] Opening a serial connection to '/dev/ttyUSB0' (57600 baudrate)
2024-08-26 11:51:22.004 DEBUG (MainThread) [bellows.ezsp] Resetting EZSP
2024-08-26 11:51:27.012 DEBUG (MainThread) [zigpy.application] Failed to probe with config {'path': '/dev/ttyUSB0', 'baudrate': 57600, 'flow_control': None}
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 631, in probe
    await app.connect()
  File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 149, in connect
    await ezsp.startup_reset()
  File "/usr/local/lib/python3.12/site-packages/bellows/ezsp/__init__.py", line 125, in startup_reset
    await self.reset()
  File "/usr/local/lib/python3.12/site-packages/bellows/ezsp/__init__.py", line 151, in reset
    await self._gw.reset()
TimeoutError
2024-08-26 11:51:33.822 DEBUG (MainThread) [zigpy.application] Failed to probe with config {'path': '/dev/ttyUSB0', 'baudrate': 115200, 'flow_control': None}
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy_deconz/api.py", line 589, in _command
    return await fut
           ^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 631, in probe
    await app.connect()
  File "/usr/local/lib/python3.12/site-packages/zigpy_deconz/zigbee/application.py", line 97, in connect
    await api.connect()
  File "/usr/local/lib/python3.12/site-packages/zigpy_deconz/api.py", line 466, in connect
    await self.version()
  File "/usr/local/lib/python3.12/site-packages/zigpy_deconz/api.py", line 813, in version
    self._protocol_version = await self.read_parameter(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy_deconz/api.py", line 832, in read_parameter
    rsp = await self.send_command(
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy_deconz/api.py", line 508, in send_command
    return await self._command(cmd, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy_deconz/api.py", line 588, in _command
    async with asyncio_timeout(COMMAND_TIMEOUT):
  File "/usr/local/lib/python3.12/asyncio/timeouts.py", line 115, in __aexit__
    raise TimeoutError from exc_val
TimeoutError
2024-08-26 11:51:36.827 DEBUG (MainThread) [zigpy_zigate.api] Unsuccessful radio probe of '/dev/ttyUSB0' port
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy_zigate/api.py", line 606, in _probe
    await self.set_raw_mode()
  File "/usr/local/lib/python3.12/site-packages/zigpy_zigate/api.py", line 431, in set_raw_mode
    await self.command(CommandId.SET_RAWMODE, data)
  File "/usr/local/lib/python3.12/site-packages/zigpy_zigate/api.py", line 388, in command
    done, pending = await asyncio.wait(tasks, timeout=timeout)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/tasks.py", line 464, in wait
    return await _wait(fs, timeout, return_when, loop)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/tasks.py", line 550, in _wait
    await waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/zigpy_zigate/api.py", line 576, in probe
    await asyncio.wait_for(api._probe(), timeout=PROBE_TIMEOUT)
  File "/usr/local/lib/python3.12/asyncio/tasks.py", line 519, in wait_for
    async with timeouts.timeout(timeout):
  File "/usr/local/lib/python3.12/asyncio/timeouts.py", line 115, in __aexit__
    raise TimeoutError from exc_val
TimeoutError


2024-08-26 11:52:01.151 DEBUG (MainThread) [homeassistant.components.zha.repairs.wrong_silabs_firmware] Failed to probe application type
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/repairs/wrong_silabs_firmware.py", line 87, in probe_silabs_firmware_type
    await flasher.probe_app_type()
  File "/usr/local/lib/python3.12/site-packages/universal_silabs_flasher/flasher.py", line 235, in probe_app_type
    raise RuntimeError("Failed to probe running application type")
RuntimeError: Failed to probe running application type
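
To make a log like this easier to read, here is a rough sketch that reports which radio libraries had a traceback ending in TimeoutError. The function name and the short sample are illustrative only; the library names are the ones that appear in the paths above.

```python
import re

# Radio libraries whose paths show up in the tracebacks above.
RADIO_LIBS = ("zigpy_znp", "bellows", "zigpy_deconz", "zigpy_zigate")


def libs_that_timed_out(log_text: str) -> list[str]:
    """Return radio libraries whose traceback ended in a bare TimeoutError."""
    seen: list[str] = []
    current: set[str] = set()  # libraries named in the traceback being read
    for line in log_text.splitlines():
        match = re.search(r"site-packages/(\w+)/", line)
        if match and match.group(1) in RADIO_LIBS:
            current.add(match.group(1))
        elif line.strip() == "TimeoutError":
            for lib in RADIO_LIBS:
                if lib in current and lib not in seen:
                    seen.append(lib)
            current = set()
    return seen


sample = '''\
  File "/usr/local/lib/python3.12/site-packages/zigpy_znp/api.py", line 736, in connect
TimeoutError
  File "/usr/local/lib/python3.12/site-packages/bellows/ezsp/__init__.py", line 151, in reset
TimeoutError
'''
print(libs_that_timed_out(sample))
```

If every library in the list times out, as it appears to in my log, that would be consistent with the dongle not answering on the serial port at all rather than ZHA guessing the wrong radio type, though that is my reading and not a confirmed diagnosis.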

50 direct children, but 200 Zigbee 3.0 devices if using routers. OP hardly reaches those limits.
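
As a quick sanity check, here is the arithmetic as a small sketch, using the two limits quoted above and the 46 total / 13 battery devices from the first post. The limits are taken from this thread as stated, not verified against the firmware documentation, and the direct-child figure assumes the worst case where every battery device joins the coordinator directly.

```python
# Limits as quoted in this thread (not independently verified).
MAX_DIRECT_CHILDREN = 50
MAX_TOTAL_DEVICES = 200


def headroom(total_devices: int, battery_devices: int) -> dict[str, int]:
    """Compare a network's size with the quoted coordinator limits."""
    if not 0 <= battery_devices <= total_devices:
        raise ValueError("battery_devices must be between 0 and total_devices")
    return {
        "mains_devices": total_devices - battery_devices,
        "total_headroom": MAX_TOTAL_DEVICES - total_devices,
        # Worst case: every battery device is a direct child of the coordinator.
        "direct_child_headroom": MAX_DIRECT_CHILDREN - battery_devices,
    }


print(headroom(46, 13))
# {'mains_devices': 33, 'total_headroom': 154, 'direct_child_headroom': 37}
```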

Is 20240710 released already (as in out of beta)?

2 days ago apparently:

Fortunately my un-needed (now needed) 3rd dongle should arrive today, it will be interesting to see if that can connect without any issues.

BTW, this is why I split my network across protocols, I can live without Zigbee for a while - it’ll suck, but 2/3rds of my automation is still fine.

So, something is definitely going on with ZHA/HA and the 20240710 firmware. I have since read a couple of posts on that release where people claimed to get it working, but apparently with some issues as well. Regardless, my two dongles flashed with that firmware were not recognized, but a brand new dongle with the 20210708 firmware worked immediately.

I’ve learned a couple things from this somewhat frustrating experience:

  • The ZHA auto backups are fantastic, takes all the thought out of the mix for getting back up again
  • The ZHA option to “reconfigure radio” is somewhat confusing as you would assume you need to migrate to a new radio with a new dongle but you do not - you migrate to a new radio if the old dongle is still there and operational, otherwise you reconfigure
  • Updating firmware is risky business (yes, I knew this already) but just because it’s a major company doesn’t exclude it from potentially ruining your day

Now, will/does it fix the problem that I originally posted about? That remains to be seen - my gut says no, but there’s part of me that also says “well, it is a factory device and the only similarity is that the NVRAM was restored, so maybe.”

Of course, the original concept of forcing Zigbee into panic mode may still pan out, hopefully the next 24 hours will tell. In the meantime I’m back to where I started at least and have Zigbee back to where it was before.
