Z2M addon hangs randomly and needs a restart a few times a day

Hi all,

I have had 2 Z2M instances working with 2 Zigbee Tasmota Sonoff Ethernet-based coordinators for a while.
This is the coordinator:

https://a.aliexpress.com/_ooK11yi

After several config issues on install, which the good people here helped me solve, it was running stable for a few months.
These are the past issues I had:

  1. Tasmota Sonoff ethernet based gateway configuration
  2. Issues with Z2M and Zigbee ethernet coordinator

In the last few weeks, one of the Z2M instances (I call it Z2M2) started to hang randomly.
My other Z2M instance (I call it Z2M1) is stable, and it is exactly the same device with the same f/w etc.

This Z2M2 instance has a total of 19 devices, mostly Tuya: temp sensors, motion/presence sensors, a few smart plugs, contact/light sensors.
Z2M1 has fewer devices (5), also mostly Tuya.

I am updated to the latest HA, and both Z2M addon instances were recently updated as well (1.40.1-1), though I'm not sure whether that is related to when the problem started.

I didn't touch them physically, nor did I add any new Zigbee sensors recently.

My coordinators have this f/w, which is the latest AFAIK.

The addon for the Z2M2 instance just stops randomly. This now happens 3-6 times a day.
Although the watchdog auto-restart is enabled for it, it does not restart (also weird).

I have an automation which detects it and starts it again, so my HA keeps working, but it's annoying and doesn't seem like the right mode of operation.

This is the log of the Z2M2 addon when it stops:

[2024-09-18 00:01:23] error: 	zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-09-18 00:01:23] error: 	zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-09-18 00:01:23] error: 	zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-09-18 00:01:23] error: 	zh:ember: Adapter fatal error: HOST_FATAL_ERROR
[2024-09-18 00:01:23] debug: 	zh:controller: Adapter disconnected
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Total frames: RX=293, TX=286
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   DATA frames : RX=275, TX=103
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   DATA bytes  : RX=6322, TX=2090
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Retry frames: RX=16, TX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   ACK frames  : RX=0, TX=182
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Retry dupes     : RX=16
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-09-18 00:01:23] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=0
[2024-09-18 00:01:24] error: 	zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-09-18 00:01:24] error: 	zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-09-18 00:01:24] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-09-18 00:01:24] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-09-18 00:01:24] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-09-18 00:01:24] error: 	z2m: Adapter disconnected, stopping
[2024-09-18 00:01:24] debug: 	z2m: Saving state to file /config/zigbee2mqtt2/state.json
[2024-09-18 00:01:24] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2mqtt2/bridge/state', payload '{"state":"offline"}'
[2024-09-18 00:01:24] info: 	z2m: Disconnecting from MQTT server
[2024-09-18 00:01:24] info: 	z2m: Stopping zigbee-herdsman...
[2024-09-18 00:01:24] debug: 	zh:controller:database: Writing database to '/config/zigbee2mqtt2/database.db'
[2024-09-18 00:01:24] info: 	z2m: Stopped zigbee-herdsman
[2024-09-18 00:01:24] info: 	z2m: Stopped Zigbee2MQTT

After the above log the addon just stops, and needs to be restarted manually or by my automation.
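
To make the ASH counters in that dump easier to compare between crashes, here is a rough Python sketch that pulls them out of a saved log. It assumes the counter lines look exactly like the ones above (`name : RX=n, TX=n`). The names `parse_ash_counters` and `retry_ratio` are my own illustration, not anything Z2M provides.

```python
import re

# Matches counter lines such as:
#   "... zh:ember:uart:ash:   Retry frames: RX=16, TX=0"
#   "... zh:ember:uart:ash:   Retry dupes     : RX=16"
COUNTER_RE = re.compile(
    r"zh:ember:uart:ash:\s+(?P<name>[^:]+?)\s*:\s*"
    r"RX=(?P<rx>\d+)(?:,\s*TX=(?P<tx>\d+))?\s*$"
)

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[2024-09-18 00:01:23] info: zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-09-18 00:01:23] info: zh:ember:uart:ash:   Total frames: RX=293, TX=286
[2024-09-18 00:01:23] info: zh:ember:uart:ash:   DATA frames : RX=275, TX=103
[2024-09-18 00:01:23] info: zh:ember:uart:ash:   Retry frames: RX=16, TX=0
[2024-09-18 00:01:23] info: zh:ember:uart:ash:   CRC errors      : RX=0
[2024-09-18 00:01:23] info: zh:ember:uart:ash:   Retry dupes     : RX=16
"""


def parse_ash_counters(log_text):
    """Return {counter name: {"rx": int, "tx": int or None}} from log text."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            tx = match.group("tx")
            counters[match.group("name").strip()] = {
                "rx": int(match.group("rx")),
                "tx": int(tx) if tx is not None else None,
            }
    return counters


def retry_ratio(counters):
    """Share of received DATA frames that were retries (0.0 if no data)."""
    data = counters.get("DATA frames", {}).get("rx") or 0
    retries = counters.get("Retry frames", {}).get("rx") or 0
    return retries / data if data else 0.0


if __name__ == "__main__":
    parsed = parse_ash_counters(SAMPLE)
    for name, values in parsed.items():
        print(name, values)
    print(f"retry ratio: {retry_ratio(parsed):.1%}")
```

In this dump the CRC and comm error counters are all 0 while the retry counters are not, which is the kind of thing worth comparing against a dump from the stable Z2M1 instance.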

I'm not sure what the root cause (RC) is.

The LAN is stable AFAIK.

Any idea how to debug this?
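
One thing that could be tried on the network side is a small Python sketch that repeatedly opens a TCP connection to the coordinator and logs failures with timestamps, so they can be lined up against the times Z2M stops. The IP `192.168.1.50` is a placeholder, and port 80 assumes the Tasmota web UI is reachable there. I would point it at the web UI rather than the serial-over-TCP port Z2M uses, since I'm not sure how the bridge handles a second client.

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0):
    """Return the TCP connect time in seconds, or None if the connect failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None


def monitor(host, port, interval=5.0, count=10):
    """Probe `count` times, print one line per probe, return the results."""
    results = []
    for _ in range(count):
        elapsed = probe_tcp(host, port)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if elapsed is None:
            print(f"[{stamp}] {host}:{port} FAILED")
        else:
            print(f"[{stamp}] {host}:{port} ok in {elapsed * 1000:.1f} ms")
        results.append(elapsed)
        time.sleep(interval)
    return results


if __name__ == "__main__":
    # Placeholder address: replace with the coordinator's real IP.
    monitor("192.168.1.50", 80, interval=5.0, count=10)
```

A TCP connect only shows that the device answers on the LAN. It says nothing about the serial link between the Ethernet module and the Zigbee chip, so a clean run here does not rule out the adapter itself.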

So, to update:
I'm still not sure of the RC of the problem and why Z2M was hanging,
but I replaced the network switch connected to the Z2M coordinator with a new one (5-port D-Link 1000/100/10; the previous one was 100/10),
and since then the problem hasn't occurred for the last 2 weeks.
I'll keep monitoring it.
