Z-Wave JS to MQTT: All nodes stuck in Status: Unknown and Interview for ProtocolInfo never finishes

MasterAeon · July 13, 2021, 10:50am

For the third time in a few months, after a power loss, all of my “Z-Wave JS to MQTT” nodes are unresponsive and end up in Status: Unknown. The interview is stuck at ProtocolInfo with the spinning circle. This means that none of my Z-Wave nodes are working.

I've tried rebooting HA, including a cold start. Nothing helps.

The last time it happened, a month ago, I did a partial snapshot restore of “Z-Wave JS to MQTT”, which brought everything back to life. This time it doesn't help. Can anyone please help or point me towards a solution?

This is from the system log after a partial restore:

21-07-13 12:35:08 INFO (MainThread) [supervisor.snapshots] Found 5 snapshot files
21-07-13 12:37:15 INFO (MainThread) [supervisor.snapshots] Partial-Restore 89eebaa6 start
21-07-13 12:38:15 INFO (MainThread) [supervisor.jobs] 'SnapshotManager.do_restore_partial' blocked from execution, system is not running - CoreState.FREEZE
21-07-13 12:39:54 INFO (MainThread) [supervisor.snapshots] Restoring 89eebaa6 Docker Config
21-07-13 12:39:54 INFO (MainThread) [supervisor.snapshots] Restoring 89eebaa6 Repositories
21-07-13 12:39:55 INFO (MainThread) [supervisor.store] Loading add-ons from store: 65 all - 0 new - 0 remove
21-07-13 12:39:55 INFO (MainThread) [supervisor.snapshots] Restoring 89eebaa6 old add-ons
21-07-13 12:39:59 INFO (MainThread) [supervisor.addons.addon] Restore config for addon a0d7b954_zwavejs2mqtt
21-07-13 12:39:59 INFO (SyncWorker_0) [supervisor.docker.interface] Stopping addon_a0d7b954_zwavejs2mqtt application
21-07-13 12:40:29 INFO (SyncWorker_0) [supervisor.docker.interface] Cleaning addon_a0d7b954_zwavejs2mqtt application
21-07-13 12:40:29 INFO (MainThread) [supervisor.addons.addon] Restoring data for addon a0d7b954_zwavejs2mqtt
21-07-13 12:40:36 INFO (SyncWorker_6) [supervisor.docker.addon] Starting Docker add-on ghcr.io/hassio-addons/zwavejs2mqtt/aarch64 with version 0.22.0
21-07-13 12:40:36 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
21-07-13 12:40:36 WARNING (MainThread) [supervisor.ingress] Fails Ingress panel for a0d7b954_zwavejs2mqtt with 500
21-07-13 12:40:37 INFO (MainThread) [supervisor.snapshots] Partial-Restore 89eebaa6 done
21-07-13 12:41:23 ERROR (MainThread) [supervisor.api.ingress] Ingress error: 400, message='Invalid response status', url=URL('http://172.30.33.1:8099/socket.io/?EIO=4&transport=websocket&sid=exL-ILsyC41EcWj1AAAA')
21-07-13 12:47:35 ERROR (MainThread) [supervisor.api.ingress] Ingress error: 400, message='Invalid response status', url=URL('http://172.30.33.1:8099/socket.io/?EIO=4&transport=websocket&sid=CWm5t5IdGeyX0IPTAAAC')

I don't know whether the CoreState.FREEZE or the Ingress warnings/errors are related.

I get plenty of these in the Z-Wave JS log:

2021-07-13 12:55:54.110 INFO ZWAVE: Node 76: value updated: 50-0-value-66049 0 => 0
2021-07-13 12:55:55.844 INFO ZWAVE: Node 76: value updated: 50-0-value-66049 0 => 0
2021-07-13 12:55:56.725 INFO ZWAVE: Node 76: value updated: 50-0-value-66049 0 => 0
2021-07-13 12:55:56.912 INFO ZWAVE: Node 76: value updated: 50-0-value-66049 0 => 0
2021-07-13 12:55:56.925 INFO ZWAVE: Node 76: value updated: 50-0-value-66049 0 => 0
2021-07-13 12:56:03.646 INFO APP: GET /health/zwave 200 3.966 ms - 1875
2021-07-13 12:56:13.098 INFO ZWAVE: Node 71: value updated: 50-0-value-66049 0 => 0
2021-07-13 12:56:13.138 INFO ZWAVE: Node 71: value updated: 50-0-value-66049 0 => 0
2021-07-13 12:56:13.163 INFO ZWAVE: Node 71: value updated: 50-0-value-66049 0 => 0
2021-07-13 12:56:14.073 INFO ZWAVE: Node 71: value updated: 50-0-value-66049 0 => 0

However, nodes 71 and 76 are also Unknown and don't work, just like all the other nodes.

petro · July 13, 2021, 11:37am

I’ve seen issues with battery sensors holding up the entire line. Wake the Water Sensor 6 and see what happens.

Prodigyplace · July 13, 2021, 12:56pm

I would have expected it to be multi-threaded.

petro · July 13, 2021, 12:58pm

I believe it is; I’ve just seen specific battery sensors holding up the whole line. It could be exceptions. The problem is that people who’ve had this issue don’t have the troubleshooting skills to help the devs track it down.

Prodigyplace · July 13, 2021, 12:59pm

I am currently still in “testing mode” because of busyness in real life.

mwolter · July 13, 2021, 1:00pm

I’ve seen this caused by something related to the transmit queue, with messages flooding the Z-Stick. The only way to clear it is to unplug the Z-Wave gateway while the machine is powered on and then reconnect it.

mwolter · July 13, 2021, 1:19pm

Please see the linked GitHub issue. I relayed this issue to them months ago, along with what I did to fix it. It appears to be the same issue as the OP's.

MasterAeon · July 13, 2021, 3:27pm

I'm currently not home, so I'm not able to wake up the leak sensor right now, but I will do it tomorrow.

Mwolter, I'm not that skilled in HA, so could I ask you to take a few minutes to give me a step-by-step guide on how to fix it?

petro · July 13, 2021, 3:38pm

Click on the link he provided and expand the comment he linked to. It has step-by-step instructions; you just have to read it.

MasterAeon · July 14, 2021, 10:30am

The following worked and solved the problem, just as mwolter said (once I read the “hidden” info, as the ever-present petro pointed out):

As a last-ditch effort, unplugged the Z-Stick while online and plugged it in again, then rebooted. Nodes then started responding.

Thank you very much, guys! This made me feel the sun shine again.

nmsousa76 · August 18, 2021, 11:42pm

I had a similar issue in my network for weeks (both in Z-Wave JS and zwavejs2mqtt). A reboot would sometimes solve the issue, but not always. All devices became unresponsive, and in most cases I was able to fix it for a few days by excluding one of the nodes, re-including it, then re-interviewing one node at a time, starting with the plain Z-Wave nodes (closest to the controller first, then working outwards) and then doing all the Z-Wave Plus ones.

After 3 weeks of troubleshooting, I found that one of my devices was generating tons of energy updates. The minute I unplugged the device, the network started responding as expected. I plugged it back in, and sure enough the network became unresponsive again. I tried excluding it, factory resetting it and re-adding it, and the same issue resurfaced. In my case this happened to be a Zooz ZEN15. I am leaving it off my network until I get the latest firmware from Zooz.

I recommend looking at the logs to determine which devices are generating tons of events, and either unplugging them (if possible) or excluding them from your network. I know this is not an ideal scenario, but at least the network stays stable until you can upgrade the firmware.
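If you want to do that log check without scrolling by hand, here is a minimal Python sketch. It assumes you have saved the zwavejs2mqtt log to a text file and that the lines look like the excerpt earlier in this thread (`ZWAVE: Node 76: value updated: 50-0-value-66049 0 => 0`). It counts "value updated" lines per node and prints the noisiest nodes first. The script and the `count_value_updates` helper are hypothetical and not part of zwavejs2mqtt. Other versions may word their log lines differently, so adjust the regex if nothing matches.

```python
import re
import sys
from collections import Counter

# Assumed log format, taken from the excerpt in this thread, e.g.:
# "2021-07-13 12:55:54.110 INFO ZWAVE: Node 76: value updated: 50-0-value-66049 0 => 0"
NODE_UPDATE = re.compile(r"ZWAVE: Node (\d+): value updated")


def count_value_updates(lines):
    """Return a Counter mapping node id -> number of 'value updated' log lines."""
    counts = Counter()
    for line in lines:
        match = NODE_UPDATE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def main():
    # Hypothetical usage: python count_updates.py zwavejs2mqtt.log
    if len(sys.argv) < 2:
        print("usage: python count_updates.py <saved log file>")
        return
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts = count_value_updates(f)
    # Show the ten chattiest nodes; a node far ahead of the rest is a suspect.
    for node, n in counts.most_common(10):
        print(f"Node {node}: {n} value updates")


if __name__ == "__main__":
    main()
```

A node that shows up far more often than the others over a short time window (like nodes 71 and 76 in the excerpt above, both repeatedly reporting the same value) is a good candidate to unplug or exclude first.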