Automate ZwaveJS Ping Dead Nodes?

Has anyone looked carefully to identify the situation/time/event/circumstances at which nodes go dead?

In my case I’ve been able to pinpoint precisely the situation responsible for nodes going dead: it only happens when I issue a burst of Z-Wave messages, specifically when turning off groups of devices, in my case all lights and switches (24 devices or so).

I have the group OFF as a shortcut that I issue nightly from a scene controller. In the past I was able to see that nodes had gone dead around the time the group OFF was issued. After setting up this automation, I can see that the dead nodes entity is populated at precisely the same time.

Last night 24 dead nodes were reported (what!?), and in testing today, 1 each time I triggered the group OFF. The ping automation worked each time to bring them back online.
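
For reference, a minimal sketch of the kind of dead-node template sensor these automations watch; the sensor name and the “ping” search pattern are illustrative, so adjust them to match your Z-Wave JS ping button entities:

template:
  - sensor:
      - name: "Dead ZWave Devices"
        unique_id: dead_zwave_devices
        unit_of_measurement: entities
        # Count Z-Wave JS ping buttons whose node is dead (entity unavailable)
        state: >
          {{ expand(integration_entities('zwave_js'))
             | selectattr('entity_id', 'search', 'ping')
             | selectattr('state', 'in', ['unavailable', 'unknown'])
             | map(attribute='entity_id') | list | count }}
        attributes:
          # Keep the matching entity ids so an automation can iterate and ping them
          entity_id: >
            {{ expand(integration_entities('zwave_js'))
               | selectattr('entity_id', 'search', 'ping')
               | selectattr('state', 'in', ['unavailable', 'unknown'])
               | map(attribute='entity_id') | list }}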


I had three Z-Wave devices showing in it that no longer exist at all.

I deleted all of them yesterday… Since then, no more dead nodes among the devices that are still in my network…

Let’s see …


I know I’ve had issues with turning on a group that has all my light switches, but that usually just manifests as a few devices not turning on. I had it happen today but the template didn’t pick up any dead nodes after.

I know the last platform I was on had a metering option specifically for this, which would space out the Z-Wave commands in a scenario like this, the idea being to prevent flooding the network. Maybe something like that is needed here.
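
To illustrate, here’s a rough sketch of manual metering in an HA script; this is my own guess at the idea, not a built-in option, and the group name and 250 ms delay are made up:

script:
  all_lights_off_metered:
    alias: All Lights Off (Metered)
    mode: single
    sequence:
      - repeat:
          # Walk the group one member at a time instead of one bulk command
          for_each: "{{ expand('group.all_lights') | map(attribute='entity_id') | list }}"
          sequence:
            - service: light.turn_off
              target:
                entity_id: "{{ repeat.item }}"
            # Short pause between commands so the mesh isn't flooded
            - delay:
                milliseconds: 250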


This works great so far. Thank you. I did have to remove my combo stick’s node status entity so it wouldn’t continuously get called, as others mentioned above.

My question is: how can I track this (besides logging or sending a notification)? I want to be able to see a history in HA of this happening. Do I need to set a sensor value? I’m new to this automation stuff.
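
I’m wondering if a counter helper would do it, something like this untested sketch (names made up), with the increment added to the ping automation’s actions so each event shows up in HA history:

counter:
  zwave_dead_node_pings:
    name: Z-Wave Dead Node Pings
    icon: mdi:z-wave

and then in the ping automation’s action list:

  - service: counter.increment
    target:
      entity_id: counter.zwave_dead_node_pings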

These are working great for me, thanks for the updates! I had been struggling to get a reliable automation for this. I also added an item to the action sequence that sends a notification, in case anyone is interested (it’s easy enough to make this work with the other notification services; I just use Slack):

    - repeat:
        for_each: "{{ state_attr('sensor.zwave_dead_devices','entity_id') }}"
        sequence:
          - service: notify.slack
            data:
              title: "ZWave Dead Node"
              message: "New ZWave dead node: {{ state_attr(repeat.item, 'friendly_name')|replace(' Ping', '') }}. Pinging now."
              target:
                - '#debug'

Note that the notification action has to come first, before the repeat action to ping the node(s).
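
Put together, the action section ends up shaped roughly like this (a sketch; the sensor empties once nodes revive, which is why the notify pass has to run first):

action:
  # Notify about each dead node first, while the sensor still lists them...
  - repeat:
      for_each: "{{ state_attr('sensor.zwave_dead_devices','entity_id') }}"
      sequence:
        - service: notify.slack
          data:
            title: "ZWave Dead Node"
            message: "New ZWave dead node: {{ state_attr(repeat.item, 'friendly_name')|replace(' Ping', '') }}. Pinging now."
            target:
              - '#debug'
  # ...then press each node's Ping button to try to revive it
  - repeat:
      for_each: "{{ state_attr('sensor.zwave_dead_devices','entity_id') }}"
      sequence:
        - service: button.press
          target:
            entity_id: "{{ repeat.item }}"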


This automation is working without much issue or fanfare here, but it doesn’t stop HA or other automations from shitting the bed.

I have an automation that I’ve been using for 2 years to turn off all lights/switches and media players in the house. The media player parts always work, no matter what order they’re in. The lights and switches often just hang with “still running” - or they don’t get activated at all. Each of them is a simple “turn off” for a single group: one light group and one switch group.

Z-Wave used to be SOLID when I ran Indigo, but HA has only been getting worse over the past two years with my Z-Wave devices. I’m tempted to swap them all out for Zigbee which has been problem-free over here, even with battery-powered devices.

Just thought I would add my experience, as it is similar to, though not quite the same as, what others have discussed here.

I have a rather simple Z-Wave network with pretty basic automations. It’s been running fine for years. About a year or so ago, I did the migration from OpenZWave to Z-Wave JS. Everything went smoothly and continued working fine.

Then I added a few new devices to my network. That is when my troubles began. After a reboot or update install, all of the devices that were added under Z-Wave JS show as dead. I have to do a ping on each device to bring it back. Everything will be totally fine until the next reboot, power outage, or update install. This only happens with the devices that have been added since the migration to Z-Wave JS. All of my older devices added under OpenZWave don’t have a single issue.

I don’t seem to have any issues with devices randomly showing as dead as others here have reported. For me, dead nodes only happen on reboots. I have an Aeotec Gen5 stick. I had an old lamp module lying around from back when I used a Vera device for Z-Wave. I added that module to see what would happen and to prove to myself this isn’t just an issue with newer Z-Wave devices. It behaves the same way and requires a ping after a reboot. An identical module that has been set up in Home Assistant from the start never goes dead.

I am convinced my issue has something to do with the way Z-Wave JS adds the device. I just have no idea what it is.

Not sure if this is of any help to anyone, but I’m hoping it may help in resolving the issue.

I have the same issue as everyone on this thread. I am in the process of moving to the Zooz ZST39 800 series USB stick.

What I have not seen (though I may have missed it) is anyone communicating these issues to the creators of zwave-js. Does anyone know their stance?
Not that it would be a fix, but adding the ping workaround to the zwave-js source code should not be difficult.

Start here 🚧 META-Issue: Problems with 700 series (healing, delays, neighbors, ...) 🚧 · Issue #3906 · zwave-js/node-zwave-js · GitHub

The TL;DR is that this is a SiLabs issue; I think that was the consensus.


Are you having issues with your current stick or the 800?
I moved from the 700 to the 800 thinking it would be better, but it’s the same. At times, worse.


I have not moved yet. I am waiting for a spare 700 so I can do a true backup. I will post when I have finished.

@jscolp That thread is a bit old. The current firmware is 7.19, and most of the issues were with 7.17, with a few people on 7.18. This thread has many examples of using the Zooz stick on other hubs, and it works.
Does anyone know for certain if the issue is in the USB stick or in Z-Wave JS?

It would be great to determine where the problem really lies and contact the appropriate org to fix it.

Adding my anecdotal experience to this thread. I initially switched from a HUSBZB-1 to an Aeotec Z-Stick 7. My issues with (sometimes) slow-to-respond and dead devices started shortly after the migration. I upgraded that stick to 7.19 with no improvement. I then migrated over to a Zooz ZST39 LR, also running 7.19, with no improvement over the Z-Stick 7. I still continue to get dead nodes that need to be pinged to be brought back to life.

I’m interested to hear that most people have experienced this issue when issuing commands to multiple devices. I’m thinking of trying to switch all my multi-device automations over to use multicast to see if that alleviates the issue. Has anyone else tried this as a solution?


I switched to exclusively using multicast for anything targeting more than one device. It’s just a better experience, on top of the network efficiency gains. I also did some minor testing and verified that sending multicast commands to an HA group works exactly as expected.

Here’s a sample service call for dimmers:

service: zwave_js.multicast_set_value
data:
  command_class: "38" # Multilevel Switch
  endpoint: "0"
  value: 255 # 0xFF = restore the last non-zero level
  property: targetValue
target:
  entity_id:
    - group.all_inside_zwave_light

And for switches:

service: zwave_js.multicast_set_value
data:
  command_class: "37" # Binary Switch
  endpoint: "0"
  value: "on"
  property: targetValue
target:
  entity_id:
    - group.all_inside_zwave_switch

I call BS on that. No other platform seems to see this issue. And in HA it’s happening with all versions of firmware on 700 sticks, 800 sticks, and some people report the same kind of node drops on 500 sticks.

Meanwhile, go look at the forums of other platforms and it’s crickets as far as this type of issue goes.

So if there’s something in the firmware of these devices that causes them to not respond (and appear dead), it’s something that only manifests with the way Z-Wave JS operates. And given that the development stack from SiLabs is over a decade old and proven on other platforms, you can rest assured a change to fix HA isn’t going to suddenly appear from that side.


I think I mentioned it earlier in this thread, but I agree. For some major problems I was having where everything was dead, a restart of Z-Wave JS fixed it temporarily, which seems to point to software. Sadly, I don’t have anything that would help the devs.

Ok. I guess the 10-month-long thread on the zwave-js GitHub repo was wrong. You’re probably correct, thanks for that.

He wasn’t personally attacking you - just saying that his experience/research points to software vs. hardware.

If anyone is interested…
I have an automation that triggers after 2 seconds to execute notify.pushbullet and notify.persistent_notification to inform me, and the automatic ping automation runs after the sensor has changed state for a minute. Notice my grammatically correct message!

alias: Z-Wave - Notify that node(s) are dead
description: ""
trigger:
  - platform: state
    entity_id:
      - sensor.dead_z_wave_devices
    to: null
    for:
      hours: 0
      minutes: 0
      seconds: 2
condition:
  - condition: numeric_state
    entity_id: sensor.dead_z_wave_devices
    above: 0
action:
  - service: notify.pushbullet
    data:
      title: Dead Z-Wave Node
      message: >-
        Z-Wave node{%- if states("sensor.dead_z_wave_devices")|int(0) > 1 %}s{%-
        endif %} {{ state_attr('sensor.dead_z_wave_devices','entity_id') }} {%-
        if states("sensor.dead_z_wave_devices")|int(0) == 1 %} is{% else %}
        are{%- endif %} dead or unavailable.
  - service: notify.persistent_notification
    data:
      title: Z-Wave Node Status
      message: >-
        Z-Wave node{%- if states("sensor.dead_z_wave_devices")|int(0) > 1 %}s{%-
        endif %} {{ state_attr('sensor.dead_z_wave_devices','entity_id') }} {%-
        if states("sensor.dead_z_wave_devices")|int(0) == 1 %} is{% else %}
        are{%- endif %} dead or unavailable.
mode: single

Thank you Officer, nothing to see here, move along, right?

Was there an issue with SiLabs firmware? Yes. And it’s been fixed. So now what? Finding, documenting and proving the cause of software issues used to be my career - which lasted much longer than 10 months.

Riddle me this… How does a dead node respond to a ping? It doesn’t; a truly dead node can’t respond at all. These nodes aren’t dead.