ZigBee (ZHA) becoming unstable every 2-3 weeks

Hi,

Firstly I’d like to say thank you to the Nabu Casa team and the Home Assistant community for making such excellent software. Home Assistant has changed my smart home life and I absolutely love it.

I’ve been slowly building up my ZigBee network over the last 6 months and have been trying to run my server for longer and longer periods without intervention. It has happened a couple of times now: after a couple of weeks I wake up to find none of my ZigBee devices working, and I have to restart the ZHA integration or Home Assistant to get them working again.

I’ve seen two different errors. Once ZIGBEE_DELIVERY_FAILED appeared in Home Assistant, and another time I got “ZHAException: Failed to send request: device did not respond”. I only have the logs for the latter.

I am wondering if perhaps I have too many routers, as I have quite a small house and about 9 devices acting as Zigbee routers. Before, when I didn’t have many routers, devices didn’t drop out, but the visualisation showed a lot of grey lines, so I thought I’d add more devices to help.

Perhaps I just need to trigger a network heal? I’ve never manually done this - I’ve just added more and more devices over time. The network doesn’t look very optimized. For example, my kitchen light is connecting to the lamp in my bedroom, when it really should connect to the washing machine or the repeater plug, which are much closer.

This is what my network looks like. I have the Coordinator at the very front of the house where my internet comes in and my server is, and I have to essentially try and work my way to the back of the house with routers.

Logs

Logger: zigpy.zcl
Source: runner.py:154
First occurred: 11 March 2025 at 05:00:05 (2 occurrences)
Last logged: 11 March 2025 at 05:58:10

[0x15A9:1:0x0020] Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/bellows/zigbee/application.py", line 934, in send_packet
    send_status, _ = await req.result
                     ^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zigpy/device.py", line 378, in request
    await send_request()
  File "/usr/local/lib/python3.13/site-packages/zigpy/application.py", line 835, in request
    await self.send_packet(
    ...<14 lines>...
    )
  File "/usr/local/lib/python3.13/site-packages/bellows/zigbee/application.py", line 933, in send_packet
    async with asyncio_timeout(APS_ACK_TIMEOUT):
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/asyncio/timeouts.py", line 116, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zha/zigbee/cluster_handlers/general.py", line 630, in check_in_response
    await self.checkin_response(True, self.CHECKIN_FAST_POLL_TIMEOUT, tsn=tsn)
  File "/usr/local/lib/python3.13/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 85, in wrapper
    with wrap_zigpy_exceptions():
         ~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/contextlib.py", line 162, in __exit__
    self.gen.throw(value)
    ~~~~~~~~~~~~~~^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/zha/zigbee/cluster_handlers/__init__.py", line 70, in wrap_zigpy_exceptions
    raise ZHAException("Failed to send request: device did not respond") from exc
zha.exceptions.ZHAException: Failed to send request: device did not respond

Hi @Sammyjo20

Have you tried restarting your HA instance, say, every 7 days, to see if the problem goes away?

Whilst, in principle, HA should be able to run indefinitely, there are many things that may cause problems to build up over time.

Personally, I have an automation that restarts HA at 2 am every day.


You can’t have too many routers. :grin:

Actually, small houses are often more problematic for Zigbee because there are more obstructing walls - I’d say your approach was exactly right. I have a small Victorian house (tiny rooms, thick walls) and it took about 30 routers to make the network stable.


Yes, that’s exactly what I thought about doing - I was looking for the right command to restart it. Would you be able to share your automation with me so I can have a look? Or do you reboot HA entirely?

Haha good to know!

Yeah the walls here are thick brick walls so it doesn’t travel far.

Maybe I need to look elsewhere, I’m using IKEA plugs as the routers (and one bulb)

Do you think it’s worth me getting the devices to rebuild the network?

You can get little USB routers - manufacturers sometimes market them as “extenders” - not very expensive and don’t take up much space. Two or three of those might make a difference.

Beware of these, at least the Tuya ones. There are reports here that they were black-holing messages. Basically, they were doing the opposite of what they were supposed to do.
Not sure if this was fixed in later revisions, but the only dedicated USB repeaters which have been reported as reliable are either the IKEA ones or a Zigbee coordinator flashed with router firmware.


Hi @Sammyjo20

To restart HA I use

action: homeassistant.restart
data: {}

However, my actual automation is more complex. On Sundays it restarts the host. It will also install certain updates.
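If a full restart is more than you need, reloading only the ZHA integration may be enough. A minimal sketch, assuming the built-in homeassistant.reload_config_entry action and a made-up entity ID (light.kitchen_light) that you would swap for any entity provided by ZHA:

```yaml
# Sketch only: reloads the integration (config entry) that owns the targeted entity.
# light.kitchen_light is a hypothetical entity ID, not one taken from this thread.
action: homeassistant.reload_config_entry
target:
  entity_id: light.kitchen_light
```

This only reloads the integration, so it is worth checking that it actually clears the problem before relying on it instead of a restart.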


Thank you for the replies everyone. So far I’ve turned off the Home Assistant container for at least an hour to let all the Zigbee devices go into panic mode. I’ve since turned it back on and everything still seems to be working - hopefully it’s healing and optimizing itself.

I’ve also set up an automation which restarts HA at 3am every Monday. That should also fix any issues where buffers (or something) get filled up, and may stop the “max messages” errors I was getting.
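For anyone wanting to do the same, a weekly restart like this could look roughly as follows. It is a sketch rather than my exact automation: the alias is made up, and it assumes a recent Home Assistant release that uses the triggers/conditions/actions keys.

```yaml
# Sketch of a weekly restart automation (entry for automations.yaml).
# The alias is a hypothetical name; adjust the time and weekday to taste.
- alias: "Weekly Home Assistant restart"
  triggers:
    # Fires every day at 03:00...
    - trigger: time
      at: "03:00:00"
  conditions:
    # ...but only continues on Mondays.
    - condition: time
      weekday:
        - mon
  actions:
    - action: homeassistant.restart
```

The time trigger fires daily, so the weekday condition is what limits the restart to Mondays.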

Next time it happens I’ll also update this post with any further logs.
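To capture more detail when it next fails, debug logging can be turned on for the libraries that appear in the traceback. A minimal sketch for configuration.yaml, assuming the logger integration and the logger names seen in the log above (zigpy, bellows, zha):

```yaml
# Sketch only: raises log verbosity for the Zigbee stack named in the traceback.
# Debug logs are verbose, so switch this off again once the logs are captured.
logger:
  default: info
  logs:
    homeassistant.components.zha: debug
    zha: debug
    zigpy: debug
    bellows: debug
```

Changes to configuration.yaml need a restart of Home Assistant to take effect.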

Network heal is only for Z-Wave. Zigbee doesn’t have such a concept unless you physically unpair and re-pair the devices. However, for your kitchen light example, you could try to unpair it, then go to the device page of your washing machine and select “Add devices via this device” in the 3 dots menu next to Reconfigure.

I’m guessing this means there’s a wifi access point right on top of it, right?
If so, the best way to deal with this is to plug your SkyConnect into a USB2 extension cable to avoid interference from both the Wi-Fi and the server itself.
More details about this are in the guide “Zigbee network optimization: a how-to guide for avoiding radio frequency interference + adding Zigbee Router devices (repeaters/extenders) to get a stable Zigbee network mesh with best possible range and coverage by fully utilizing Zigbee mesh networking”.
