Z-Wave nodes frequently going dead in automations

I recently added 15 ZEN76 800LR switches to my Z-Wave network, for a total of 36 nodes.
When I just use one switch at a time, things seem to work relatively OK.
But when an automation needs to turn more than one switch on/off, things start going wild.
In a 3-switch automation, sometimes there are delays of a few seconds between, say, the first & second switch, or second and third switch. Sometimes one of the switches just doesn’t turn on/off even after waiting a full minute.

Here is an extreme example in a video where I try to turn 13 switches on all at once. It takes a few minutes, and 3 nodes go dead within that process.

I’m somewhat at a loss as to what to do in this case. Do I have too many devices? Or the opposite, do I need more repeaters? I have tried healing the network, to no avail. Any help would be appreciated. All my devices except one are Zooz, so I’ll definitely be asking them too. But I figured the HA community might be of help also.

Here is the network graph for Z-Wave JS UI:

Here is the full device list:



(sorry this last page has just 2 extra nodes)

Hi,
First, check the controller’s firmware and try healing your network 2 or 3 times.
Next, you have a node without neighbors (Home Theater Motion Sensor). A node without neighbors can cause issues within your Z-Wave network.
Last but not least, try setting a fixed priority route for your nodes, especially those that are connected at only 9.6 kbit/s (red lines).

Regards Olli

And try adding a short delay, like half a second.

There was a similar issue (turning off lots of lights at the same time) in this forum about a year ago, with the same symptoms you reported. The conclusion was that Z-Wave networks fall down when too many commands are executed simultaneously. Executing a command is a series of messages between the controller and the node that takes >70ms to complete. Because there is no flow control in zwavejs or the HA integration, an automation that turns off a group or issues a series of turn_off commands makes HA/zwavejs issue the commands faster than they can complete on a Z-Wave network (80-100ms each). The messages then start colliding and needing retries, which creates even more messages that collide and retry, and eventually, after 3 attempts, zwavejs declares the node dead.

What can certainly exacerbate the situation is an already busy Z-Wave network with too much data flowing through it, or some slow nodes.

In addition to updating firmware, make sure you are on the latest zwavejs, as there were some serious stability issues in it starting in August.

  1. Turn on zwavejs logging and capture a log of the failure.
  2. Try sequencing the off commands with a short delay (100ms) to see if you can turn the lights off reliably in sequence. Keep adjusting that 100ms up until you find a delay that works reliably.
  3. Determine if you have chatty nodes that are sending data too frequently. See this post: "Determining what zwave values are creating the most traffic".
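
Step 2 could be sketched roughly as the script below. This is only a minimal sketch: the alias and the three entity IDs are placeholders (not from anyone's actual setup), and the 100ms value is just the starting point to tune upward.

```yaml
# Sketch only: entity IDs below are placeholders, replace them with your own.
alias: Lights off, one switch at a time
sequence:
  - repeat:
      # Walk the list so each switch gets its own command
      for_each:
        - switch.example_light_1
        - switch.example_light_2
        - switch.example_light_3
      sequence:
        - service: switch.turn_off
          target:
            entity_id: "{{ repeat.item }}"
        # Pause before the next command; raise this until it is reliable
        - delay:
            milliseconds: 100
mode: single
```

The point of the loop is that the delay sits between every pair of commands, so there is only one number to adjust while testing.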

I recall seeing an issue like this in the past, and the solution was to use the "Z-Wave: Set a value on multiple devices via multicast" service. Instead of sending one message to one device at a time, you can send one message to multiple devices at the same time.

@cornellrwilliams
Thanks, that looks interesting. I wish HA was smart enough to automatically use it when flipping a group of Z-Wave switches all at once. I’ll try to use this in my automation.
Edit: tried it and I couldn’t figure out what to put in most of the fields:

Edit: I figured it out. Unfortunately, multicast does not do any better than unicast. For my 3 light switches in my home theater, it takes anywhere from 3 to 15 seconds to turn them on. Sometimes it also hangs completely for a minute, possibly because the nodes went dead in this case.

Here is one of the two automations I use (this one for lights off, another for on).

alias: Home theater all lights off multi-cast
description: ""
trigger:
  - platform: event
    event_type: keyboard_remote_command_received
    event_data:
      device_name: flirc.tv flirc Keyboard
      key_code: 8
condition: []
action:
  - service: zwave_js.multicast_set_value
    data:
      property: targetValue
      value: false
      command_class: "37"
    target:
      entity_id:
        - switch.ht_main_light
        - switch.s2_on_off_switch_14
        - switch.s2_on_off_switch_4
mode: single
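
For comparison, here is a sketch of what the action of the matching lights-on automation might look like, assuming it mirrors the one above. Only `value` changes; the trigger is left out because its key code is not shown here.

```yaml
# Sketch only: assumes the "on" automation mirrors the "off" one above.
action:
  - service: zwave_js.multicast_set_value
    data:
      property: targetValue
      value: true  # true = on; the "off" automation sends false
      command_class: "37"  # Binary Switch command class
    target:
      entity_id:
        - switch.ht_main_light
        - switch.s2_on_off_switch_14
        - switch.s2_on_off_switch_4
```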

Meanwhile, Zooz gave me updated firmware for the ZEN76 800LR. I spent a day flashing it to 15 switches (some switches took 2 hours to flash!). Things are improved a bit, but I still have dead nodes sometimes. I also added 6 ZAC38 repeaters.

@PeteRage,

Thanks for responding. Would you happen to know where the Z-Wave JS UI logs are supposed to be located in the file system on HAOS? I can’t find any z-ui*.log files anywhere using the SSH terminal and the “find” command.

Inside the Z-Wave JS UI control panel, click the folder icon labeled "Store". There you can access all of the files generated by Z-Wave JS UI, like log files, NVM backups, and so on.

Thanks, found it! I downloaded it and tried @PeteRage's analyzer script against it, but it returned nothing. I’m guessing the script needs to be updated to accommodate the current log format.

I had added a 50ms delay at first, then 100ms, but it didn’t help.

I got quite an improvement since the last video.
I attribute this to more repeaters (added 6x ZAC38), improved firmware on the ZEN76 800LR switches, and static routes.

It’s not fast, but none of the nodes went dead, and I was able to turn on all the switches and turn them back off within one minute.

The Hot tub light switch is the slowest. It’s also the farthest from the controller by a long shot - all the way on the top floor in the rear of the property, whereas the controller is at ground level in the front. I’ll try to play with the static routing in Z-Wave JS UI some more to see if I can improve it further.
