Issue with ZHA constantly initializing and failing

I’m having the same issue :((( Has anyone found a fix?
SONOFF Zigbee 3.0 USB Dongle Plus V2

migrate to a new radio: error
re-configure the current radio: error creating network

2024-01-10 14:45:34.990 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 14:46:19.772 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback ThreadsafeProxy.__getattr__.<locals>.func_wrapper.<locals>.check_result_wrapper() at /usr/local/lib/python3.11/site-packages/bellows/thread.py:110
IndexError: index out of range
2024-01-10 14:46:20.442 ERROR (MainThread) [homeassistant.components.homeassistant_alerts] Timeout fetching homeassistant_alerts data
2024-01-10 14:46:50.690 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback ThreadsafeProxy.__getattr__.<locals>.func_wrapper.<locals>.check_result_wrapper() at /usr/local/lib/python3.11/site-packages/bellows/thread.py:110
IndexError: index out of range
2024-01-10 14:47:48.119 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 14:47:52.392 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: SerialException('device reports readiness to read but returned no data (device disconnected or multiple access on port?)')
2024-01-10 14:49:50.181 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 14:49:52.628 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: SerialException('device reports readiness to read but returned no data (device disconnected or multiple access on port?)')
2024-01-10 14:51:49.991 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 14:55:19.520 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 14:56:53.990 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback SerialTransport._read_ready()
IndexError: index out of range
2024-01-10 15:02:05.178 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: SerialException('device reports readiness to read but returned no data (device disconnected or multiple access on port?)')
2024-01-10 15:04:03.147 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback ThreadsafeProxy.__getattr__.<locals>.func_wrapper.<locals>.check_result_wrapper() at /usr/local/lib/python3.11/site-packages/bellows/thread.py:110
IndexError: index out of range
2024-01-10 15:06:00.501 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 15:08:01.628 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 15:09:29.106 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 15:14:33.234 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 15:19:32.089 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 15:21:42.024 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback ThreadsafeProxy.__getattr__.<locals>.func_wrapper.<locals>.check_result_wrapper() at /usr/local/lib/python3.11/site-packages/bellows/thread.py:110
IndexError: index out of range
2024-01-10 15:22:54.798 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 15:22:59.425 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EzspError('EZSP is not running')
2024-01-10 15:23:09.427 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EzspError('EZSP is not running')
2024-01-10 15:23:19.428 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EzspError('EZSP is not running')
2024-01-10 15:23:29.431 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EzspError('EZSP is not running')
2024-01-10 15:23:39.434 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EzspError('EZSP is not running')
    raise EzspError("EZSP is not running")
bellows.exception.EzspError: EZSP is not running
2024-01-10 15:25:06.076 ERROR (Recorder) [homeassistant] Error doing job: Task exception was never retrieved
asyncio.exceptions.CancelledError
    raise TimeoutError from exc_val
TimeoutError
2024-01-11 10:21:29.227 ERROR (MainThread) [aiohttp.server] Error handling request
2024-01-11 10:21:45.678 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received getValue: [<EzspStatus.ERROR_INVALID_ID: 55>, b'']
2024-01-11 10:21:54.228 ERROR (MainThread) [aiohttp.server] Error handling request
asyncio.exceptions.CancelledError
    raise TimeoutError from exc_val
TimeoutError
TimeoutError
asyncio.exceptions.CancelledError
    raise TimeoutError from exc_val
TimeoutError
asyncio.exceptions.CancelledError
    raise exceptions.TimeoutError() from exc
TimeoutError
    raise RuntimeError("Failed to probe running application type")
RuntimeError: Failed to probe running application type
2024-01-11 10:29:10.993 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received getValue: [<EzspStatus.ERROR_INVALID_ID: 55>, b'']
2024-01-11 10:29:12.582 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received getValue: [<EzspStatus.ERROR_INVALID_ID: 55>, b'']
2024-01-11 10:29:44.844 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received getValue: [<EzspStatus.ERROR_INVALID_ID: 55>, b'']
2024-01-11 10:29:50.977 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
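A log this long is easier to read once it is tallied by logger. The following is only a minimal sketch, assuming the usual Home Assistant log line layout seen above (timestamp, level, thread, logger, message); the `summarize` helper and the `LOG_LINE` pattern are hypothetical names, not part of Home Assistant or bellows. The sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> <LEVEL> (<thread>) [<logger>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def summarize(lines):
    """Count log entries per (level, logger); skip traceback continuation lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # e.g. "IndexError: index out of range"
        counts[(match["level"], match["logger"])] += 1
    return counts


sample = """\
2024-01-10 14:45:34.990 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-10 14:47:52.392 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: SerialException('device reports readiness to read but returned no data (device disconnected or multiple access on port?)')
IndexError: index out of range
2024-01-10 15:22:59.425 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EzspError('EZSP is not running')
2024-01-10 15:06:00.501 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
""".splitlines()

if __name__ == "__main__":
    for (level, logger), n in summarize(sample).most_common():
        print(f"{n:3d}  {level:<7}  {logger}")
```

To run it against a real log, pass the lines of the log file to `summarize` instead of `sample`. If `bellows.uart` and `bellows.ezsp` dominate the counts, that points at the serial link to the dongle rather than at individual Zigbee devices, though it does not say why the link drops.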

Yes! Downgrade to:

I am on 2023.12.04 and ZHA. While I have had my share of ZB issues over the past few months, this build seems to be fine at the moment. I have often considered switching from ZHA to Z2M, but I have a lot of devices and resetting them all to work with Z2M just isn’t going to happen unless things really get disastrous. It would take a few days to do even if everything goes smoothly, which I am certain it won’t. Besides, I just spent about a week moving all my temp/humidity sensors from MQTT to ESPHome. Ugh! I am not a fan of MQTT. While it’s useful, it can be a royal PITA sometimes, just like ZHA.

With that said, Z2M does seem to have better support and faster fixes/updates than ZHA.

My system can’t start Zigbee ZHA either. I downgraded, but the update seemed to re-apply automatically, so it broke again. I’ve now disabled the Update function on the Zigbee controller.

It seems that disabling updates didn’t work. An update installed yesterday, and again it’s failing. I found this error in the log:

Logger: homeassistant.helpers.dispatcher
Source: helpers/dispatcher.py:59
First occurred: January 14, 2024 at 9:12:50 AM (184 occurrences)
Last logged: January 15, 2024 at 1:16:32 PM

Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0xffff6513bd10>>

I’ve been reverting to this as a fix: addon_core_silabs_multiprotocol_2.3.2

Sorry, just another update. To be clear, it seems that my ZHA failure is due to the Silicon Labs Multiprotocol add-on. There is a known bug, which I found chatter about over here, just FYI. I’ve now gone in and skipped the current update. Fingers crossed!

I tried both FW options (Zigbee and multi) but still had the issue. Hopefully it’ll be fixed soon.

Hi all, new to HA as well.

I faced the same issue as the OP and kept getting a ton of errors in the logs after the update in mid-December 2023. ZHA would initialize, but there was a tremendous lag between action and response; e.g., if I turned on a switch, it would respond after about 2 minutes. I even reinstalled the entire OS to rule out an issue with any add-ons, but the problem started again. I reverted to the last backup I had, 2023.12.1, and it is no longer flooding the logs with errors and everything functions correctly.

Hope there is a fix soon.

Also, I wonder how to test new updates, as going back to the working version (in case the new update does not resolve the issue) is a cumbersome process: I have to re-add all the switches.

Update: new version 2.4.4 still broken for me. :frowning:


Yes, the Multiprotocol update in January broke something. My Zigbee devices randomly drop out. A restart is required to fix it.

2024-02-24 14:37:46.534 ERROR (MainThread) [bellows.uart] Lost serial connection: ConnectionResetError('Remote server closed connection')

2024-02-24 14:37:46.536 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart

2024-02-24 14:37:46.550 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f72dd9250>>

I am having this issue too. I just expected it to resolve itself, but a few months on I’m less sure. Have you found a fix?

I’m having the same issue with a Sonoff Bridge running Tasmota FW. Restarting HA makes it work for ~24 hours, but it ultimately fails.

How did you solve the problem? I have a similar issue.

Same issues here :frowning:
Any new information?

I’m having this exact issue too now. What kind of corner cases are we?

Same issues for the past few months. It seems to crash every 3 or 4 days. It used to be more frequent, but I increased the memory on my VM (Proxmox). There are no memory errors while the problem is happening (as it is now), so I doubt memory is the issue. I never really had a good look at the logs, as I usually have to get things working ASAP (what’s the expression, the wife factor). I’ve been avoiding MQTT (why use it if you don’t need to?), but I might have to go down this path.

It seems I now need to reboot my VM to get ZHA to work; it used to be enough to restart within HA.

For me the problem went away once I physically disconnected 2 devices for which I could see an increasing number of open sockets in the netstat -an output. You may want to start by checking with netstat first.

Apologies, I don’t understand what to look for in the output of netstat:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.11:33445 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8099 0.0.0.0:* LISTEN
tcp 0 0 172.30.33.5:8099 172.30.32.2:56766 ESTABLISHED
udp 0 0 127.0.0.11:43476 0.0.0.0:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ACC ] STREAM LISTENING 11897 s
unix 3 STREAM CONNECTED 1727971 /tmp/tmux-0/default
unix 3 STREAM CONNECTED 1727418
unix 2 [ ACC ] STREAM LISTENING 1666277 /tmp/tmux-0/default

What does this tell me?
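A single netstat snapshot says little on its own; the earlier suggestion was to watch for a number of open sockets that keeps increasing. One rough way to do that is to count TCP sockets per remote host and state, then repeat a few minutes later and compare. This is a minimal sketch, assuming the Linux net-tools column layout shown above; `count_tcp_sockets` is a hypothetical helper name, and the sample rows are the TCP/UDP rows from the output above.

```python
from collections import Counter


def count_tcp_sockets(netstat_output):
    """Count TCP sockets by (foreign host, state) from `netstat -an` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed TCP row: proto, recv-q, send-q, local addr, foreign addr, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips headers, UDP rows, and UNIX socket rows
        foreign_host = fields[4].rsplit(":", 1)[0]  # drop the port
        counts[(foreign_host, fields[5])] += 1
    return counts


sample = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.11:33445 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8099 0.0.0.0:* LISTEN
tcp 0 0 172.30.33.5:8099 172.30.32.2:56766 ESTABLISHED
udp 0 0 127.0.0.11:43476 0.0.0.0:*
"""

if __name__ == "__main__":
    for (host, state), n in sorted(count_tcp_sockets(sample).items()):
        print(f"{n:3d}  {state:<12}  {host}")
```

In this snapshot there is only one established connection, so nothing looks like a socket leak. A remote host whose count grows from one run to the next would be the one to investigate.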

In my last failure, I noticed an error message when I hovered over the ! (I could not find any messages in the log) that said “No usable address”. Searching on this I found

This seems to be my problem. I tried this solution and it is working so far, but it has only been a few hours, so that doesn’t mean much yet. I still might switch to Z2M; I just picked up a PoE Zigbee SLZB-06 to start the changeover.

Apologies, for me the problem was actually another integration (WiZ) that had affected ZHA. It looks like you’re facing a different issue.