ZHA component sometimes falls off and devices become unavailable for no apparent reason

I have a problem with the ZHA component, which behaves unpredictably. For no apparent reason it drops out and becomes unavailable, and I have to reinstall it almost every week. I am using:
NUC i7 with:
Core version: core-2021.1.4
Supervisor version: supervisor-2021.01.5
Operating System: Home Assistant OS 5.10

Below is what I found in the logs. Is there any way to stop this thing from happening?

Logger: homeassistant.components.zha.core.gateway
Source: components/zha/core/gateway.py:157
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 2:15:40 (1 occurrences)
Last logged: 2:15:40

Couldn't start EZSP = Silicon Labs EmberZNet protocol: Elelabs, HUSBZB-1, Telegesis coordinator
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 157, in async_initialize
    self.application_controller = await app_controller_cls.new(
  File "/usr/local/lib/python3.8/site-packages/zigpy/application.py", line 69, in new
    await app.startup(auto_form)
  File "/usr/local/lib/python3.8/site-packages/bellows/zigbee/application.py", line 108, in startup
    self._ezsp = await bellows.ezsp.EZSP.initialize(self.config)
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/__init__.py", line 81, in initialize
    await ezsp._protocol.initialize(zigpy_config)
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/protocol.py", line 67, in initialize
    await self._cfg(self.types.EzspConfigId[config], value)
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/protocol.py", line 35, in _cfg
    (status,) = await self.setConfigurationValue(config_id, value)
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 501, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
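
For anyone reading the traceback: the last frames suggest the failure is a timeout, meaning a configuration command was sent to the radio and no reply arrived before `asyncio.wait_for` gave up. A minimal sketch of that pattern in plain asyncio follows. It is not the bellows code, and `slow_radio_reply` and `send_with_timeout` are hypothetical names standing in for a coordinator that answers too slowly.

```python
import asyncio


async def slow_radio_reply(delay: float) -> str:
    # Hypothetical stand-in for a radio that answers after `delay` seconds.
    await asyncio.sleep(delay)
    return "ok"


async def send_with_timeout(delay: float, timeout: float) -> str:
    # wait_for cancels the pending call and raises TimeoutError
    # when no result arrives within `timeout` seconds.
    try:
        return await asyncio.wait_for(slow_radio_reply(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout"


if __name__ == "__main__":
    print(asyncio.run(send_with_timeout(delay=0.01, timeout=1.0)))
    print(asyncio.run(send_with_timeout(delay=1.0, timeout=0.01)))
```

In the log above the exception is not caught, so ZHA fails to start instead of retrying.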

I’m having similar issues. Have you tried unplugging your Zigbee USB adapter and plugging it back in? That works for me, although lately I have to do it after every reboot.

Hi all, I have also been having problems since I migrated to 2021.2.1.
My configuration is:
Version: core-2021.2.3
Docker 20.10.0
OS Ubuntu 20.04.1 LTS

The log shows repeated errors like this:

2021-02-23 23:06:01 WARNING (bellows.thread_0) [bellows.uart] Reset future is None
2021-02-23 23:06:02 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-02-23 23:06:03 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback ThreadsafeProxy.__getattr__.<locals>.func_wrapper.<locals>.check_result_wrapper() at /usr/local/lib/python3.8/site-packages/bellows/thread.py:97
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/site-packages/bellows/thread.py", line 98, in check_result_wrapper
    result = call()
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/__init__.py", line 254, in frame_received
    self._protocol(data)
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/protocol.py", line 102, in __call__
    frame_name = self.COMMANDS_BY_ID[frame_id][0]
KeyError: 520
2021-02-23 23:06:03 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback ThreadsafeProxy.__getattr__.<locals>.func_wrapper.<locals>.check_result_wrapper() at /usr/local/lib/python3.8/site-packages/bellows/thread.py:97
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/site-packages/bellows/thread.py", line 98, in check_result_wrapper
    result = call()
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/__init__.py", line 254, in frame_received
    self._protocol(data)
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/protocol.py", line 102, in __call__
    frame_name = self.COMMANDS_BY_ID[frame_id][0]
KeyError: 520
2021-02-23 23:06:12 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EZSP is not running
2021-02-23 23:06:20 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: read failed: socket disconnected
2021-02-23 23:06:20 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2021-02-23 23:06:22 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EZSP is not running
2021-02-23 23:06:23 WARNING (bellows.thread_0) [bellows.uart] Reset future is None
2021-02-23 23:06:25 WARNING (bellows.thread_0) [bellows.uart] Reset future is None
2021-02-23 23:06:26 INFO (SyncWorker_0) [wiserHeatingAPI.wiserHub] Updating Wiser Hub Data
2021-02-23 23:06:32 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: read failed: socket disconnected
2021-02-23 23:06:32 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2021-02-23 23:06:32 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EZSP is not running
2021-02-23 23:06:33 WARNING (bellows.thread_0) [bellows.uart] Reset future is None
2021-02-23 23:06:42 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: EZSP is not running
2021-02-23 23:06:44 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: read failed: socket disconnected
2021-02-23 23:06:44 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2021-02-23 23:06:45 WARNING (bellows.thread_0) [bellows.uart] Reset future is None
2021-02-23 23:06:45 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback ThreadsafeProxy.__getattr__.<locals>.func_wrapper.<locals>.check_result_wrapper() at /usr/local/lib/python3.8/site-packages/bellows/thread.py:97
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/site-packages/bellows/thread.py", line 98, in check_result_wrapper
    result = call()
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/__init__.py", line 254, in frame_received
    self._protocol(data)
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/protocol.py", line 102, in __call__
    frame_name = self.COMMANDS_BY_ID[frame_id][0]
KeyError: 520
2021-02-23 23:06:49 ERROR (MainThread) [custom_components.wiser.const] Unable to update from Wiser hub due to unknown error
2021-02-23 23:07:07 INFO (MainThread) [bellows.zigbee.application] EZSP Radio manufacturer: 
2021-02-23 23:07:07 INFO (MainThread) [bellows.zigbee.application] EZSP Radio board name: 
2021-02-23 23:07:07 INFO (MainThread) [bellows.zigbee.application] EmberZNet version: 6.7.6.0 build 327
2021-02-23 23:07:13 INFO (MainThread) [bellows.zigbee.application] Node type: EmberNodeType.COORDINATOR, Network parameters: EmberNetworkParameters(extendedPanId=cc:cc:cc:cc:dc:12:cf:a4, panId=0x0fa4, radioTxPower=20, radioChannel=11, joinMethod=<EmberJoinMethod.USE_MAC_ASSOCIATION: 0>, nwkManagerId=0x0000, nwkUpdateId=0, channels=<Channels.ALL_CHANNELS: 134215680>)
2021-02-23 23:07:17 INFO (MainThread) [zigpy.application] Device 0x0000 (60:a4:23:ff:fe:1d:47:91) joined the network

After a few hours the errors become more frequent and the devices become unresponsive or disconnect. Restarting HA core restores things to normal.
I now have to restart once or twice a day.
Can anybody help?
Thank you
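
To see how often the radio drops, one option is to count the bellows warnings and errors per hour in the Home Assistant log. Below is a rough sketch that assumes the log line format shown above. The function name `count_bellows_problems` and the regular expression are illustrative only and are not part of Home Assistant or bellows.

```python
import re
from collections import Counter

# Matches lines such as:
# 2021-02-23 23:06:20 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. ...
LINE_RE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} "
    r"(?P<level>[A-Z]+) \([^)]*\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def count_bellows_problems(lines):
    """Count WARNING/ERROR lines from bellows loggers, keyed by (hour, message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # traceback lines and other non-log text
        if match["level"] not in ("WARNING", "ERROR"):
            continue
        if not match["logger"].startswith("bellows"):
            continue
        hour = f'{match["day"]} {match["hour"]}:00'
        counts[(hour, match["msg"].strip())] += 1
    return counts


if __name__ == "__main__":
    # The path is an assumption; adjust it to wherever your log file lives.
    with open("home-assistant.log", encoding="utf-8") as log_file:
        for (hour, msg), n in sorted(count_bellows_problems(log_file).items()):
            print(hour, n, msg)
```

If the counts climb steadily over a few hours, that matches the pattern described above, and the hourly totals are useful to attach to a bug report.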

This is a known issue with WiFi-based Zigbee bridges/gateways (Zigbee-to-serial proxy servers/services):

https://github.com/zigpy/bellows#warning-about-zigbee-to-wifi-bridges

The same warning is also hidden here:

https://www.home-assistant.io/integrations/zha/

This is why WiFi based Zigbee-to-Serial proxy-servers/services are not recommended for production.

The options are to buy a USB dongle/stick or an Ethernet (wired) bridge/gateway, or to get a much more stable WiFi connection.
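
If you want to check whether the network link to a bridge is the part that drops, one approach is to poll the bridge's TCP port and note when it stops answering. This is a minimal sketch using only the Python standard library. The function name `tcp_port_reachable` is made up for illustration, and the host and port in the usage comment are placeholders, not values from this thread.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example with placeholder values:
# print(tcp_port_reachable("192.0.2.10", 8888))
```

Some serial bridges may accept only one client at a time, so it is probably safest to run this while ZHA is stopped.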

Feedback is wanted in home-assistant.io/issues/17170 on whether or not the Sonoff ZBBridge should be listed in the ZHA integration documentation at the top of the list of “known working radio modules”: