I’m using the Zigbee2MQTT addon with a Sonoff Zigbee Bridge Pro (Tasmota flashed) for Zigbee devices. The bridge is a bit janky, as is the “serial-over-TCP” hack used to connect it over Wi-Fi, so once in a while it craps out and either loses connection to all Zigbee devices or hangs completely until the Zigbee chip is reset. Zigbee2MQTT has a watchdog function, but it just keeps waiting forever until I manually reset the bridge.
Interestingly, what actually hangs is the Zigbee chip and not the Tasmota firmware running on the ESP32, so I can still remotely connect to the web interface and manually restart it:
As a side note, the “Restart” button only restarts the ESP32 firmware but “by design” it doesn’t reset the Zigbee chip, so a restart that way won’t do anything (unless I physically power it off and on again). However, the reset pin of the Zigbee chip is routed to an I/O pin of the ESP32, so it can be manually triggered to reset - that’s what the “Toggle” button does (why this is not part of the “restart” routine is beyond me, but that’s a separate issue).
Anyhow, it’s a simple web interface and the reset can be performed through some GET URL commands (toggle off, wait some seconds, toggle on). What I need help with is the trigger condition, i.e. detecting when Zigbee2MQTT is stuck (can’t connect to the bridge) or can’t ping any of the devices. Additionally, when the automation triggers, I also want to force-restart Zigbee2MQTT some seconds after issuing the Zigbee chip reset rather than waiting for the watchdog timer (which can be up to 5min), and then query whether it has successfully connected to the Zigbee bridge (otherwise repeat the process after some timeout). Is there an interface to query the Zigbee2MQTT addon state in order to achieve that?
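The off / wait / on sequence described above could be wrapped in Home Assistant roughly like this. This is only a sketch under a few assumptions: the IP address is a placeholder, the "Toggle" button is assumed to map to Tasmota relay 1 (Power1), and the rest_command and script names are made up for illustration. The /cm?cmnd= path is Tasmota's standard web request format.

```yaml
# configuration.yaml
rest_command:
  # Hypothetical names; 192.168.1.50 is a placeholder for the bridge's IP.
  # Assumes the Zigbee reset pin is exposed as Tasmota relay 1 (Power1).
  zigbee_chip_reset_off:
    url: "http://192.168.1.50/cm?cmnd=Power1%20Off"
  zigbee_chip_reset_on:
    url: "http://192.168.1.50/cm?cmnd=Power1%20On"

script:
  reset_zigbee_chip:
    sequence:
      # Toggle off, wait some seconds, toggle on.
      - action: rest_command.zigbee_chip_reset_off
      - delay: "00:00:05"
      - action: rest_command.zigbee_chip_reset_on
```

If the Tasmota web interface is password protected, the URLs would also need the user and password query parameters.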
Z2M has an entity named “running” which is disabled by default in HA. Inside HA (not Z2M) go to Settings>Devices & services>devices tab & search for Zigbee2Mqtt.
Click it, scroll down to the sensors section & click “+6 disabled entities”.
Enable the “running” entity & save. You’ll be able to use it as a trigger in your automation, assuming your Z2M instance actually stops running when the issue hits you.
Thanks, I’ll test it. When the Zigbee bridge chip dies, it usually causes some communication error in Z2MQTT (don’t remember the specific error code, but it’s an actual error in the logs) and that forces the watchdog to restart it. I’m not sure how the “running” entity will react to it, presumably it will always be “running” while waiting for timeouts except for a very brief moment when the watchdog restarts the addon. I also don’t know how reliably HA registers a very short (maybe <1s) change in the running state, but I guess I could turn off the built-in addon watchdog and only use the custom automation.
The other (much rarer) error state is when Z2MQTT keeps running but periodically receives a bunch of warnings that Zigbee devices couldn’t be pinged, so the “running” entity won’t help with that. I guess I could query the online state of some Zigbee devices that are definitely always online except when the bridge dies - or is there a more elegant way?
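Checking a few always-on devices could look something like the sketch below. It assumes the availability feature is enabled in Zigbee2MQTT so that offline devices show up as "unavailable" in HA, and the entity IDs are placeholders for mains-powered devices that should never drop off.

```yaml
automation:
  - alias: "Zigbee devices all unavailable"
    triggers:
      - trigger: state
        # Placeholder entity IDs; pick devices that are always powered.
        entity_id:
          - switch.hallway_plug
          - light.kitchen_ceiling
          - switch.office_plug
        to: "unavailable"
        for: "00:05:00"
    conditions:
      # Only continue if every listed device is unavailable, so that one
      # unplugged device does not trigger a bridge reset on its own.
      - condition: state
        entity_id:
          - switch.hallway_plug
          - light.kitchen_ceiling
          - switch.office_plug
        state: "unavailable"
    actions:
      - action: persistent_notification.create
        data:
          message: "All monitored Zigbee devices are unavailable."
```

The notification is just a stand-in for whatever recovery steps should run.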
You don’t necessarily have to switch off the watchdog. By default, the watchdog retries after delays of 1, 5, 15, 30 & 60 minutes, so I’m not sure why you think it’s triggering after 1 second, unless you changed that via customised watchdog delays.
If you want a custom automation, you can call an addon start/restart by using the below in your action:
action: hassio.addon_restart
data:
  addon: 45df7312_zigbee2mqtt
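Put together with the “running” entity as a trigger, a retry loop might look roughly like this. The entity ID binary_sensor.zigbee2mqtt_running, the timings, and the limit of 3 attempts are assumptions, not something confirmed in this thread. Note that “running” only says the addon is up, not that it actually connected to the bridge.

```yaml
automation:
  - alias: "Recover Zigbee2MQTT when it stops running"
    triggers:
      - trigger: state
        entity_id: binary_sensor.zigbee2mqtt_running  # assumed entity ID
        to: "off"
        for: "00:00:30"
    actions:
      - repeat:
          sequence:
            # The Zigbee chip reset (toggle off, wait, toggle on) would go
            # here, before the addon is restarted.
            - action: hassio.addon_restart
              data:
                addon: 45df7312_zigbee2mqtt
            # Give the addon time to come up before checking again.
            - delay: "00:01:00"
          until:
            - condition: or
              conditions:
                - condition: state
                  entity_id: binary_sensor.zigbee2mqtt_running
                  state: "on"
                # Give up after 3 attempts (arbitrary limit).
                - condition: template
                  value_template: "{{ repeat.index >= 3 }}"
```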
Good to know. I was somehow under the impression that the watchdog acts “immediately” (within seconds) at first and then ramps up the time to minutes. But if it waits for “1 minute of not running” at least, then it shouldn’t ever interfere with the custom automation to restart the Zigbee bridge.