Automate ZwaveJS Ping Dead Nodes?

Glad that it can help you! It took me some time to make, so this increases my return on investment :grinning_face_with_smiling_eyes:

Same here. I have several but the sensor shows 0.

If I run the template in dev tools it does show me all the ping buttons.
Leaving that open, going to zwavejs in another tab, pressing Ping there, I can see the result update. So it would appear the count is not updating?

Update: no, not that since entity_id is empty:

entity_id: []
unit_of_measurement: entities
friendly_name: Dead ZWave Devices

Strange that the template updates itself in dev tools but the sensor doesn’t.

As a test I added a trigger to the sensor for every 5 minutes. This successfully updated the entity_id list, but the State value never changed from 0. Nor did the entity_id list change when I manually pinged the nodes.
After another 5 minutes, though, it did update the entity_id list and then the State value changed to 4. By that time I'd already pinged them all, so the entity_id list was empty.

Odd.

Soooo, does anyone know the cause of the devices going dead? I implemented this because every once in a while, especially after some sort of reboot, a large number of my devices go dead. Not all, but a majority. Sometimes the ping works, sometimes it doesn't, and I have to walk around the house pressing buttons on the devices for them to come back on. It's really strange and there is no discernible pattern. Most reboots are just fine; just once in a while everything comes crashing down. I do have a 700 series stick and have it patched to 7.17.2. Just trying to see if anyone has an idea on what's going on…


I have the same problem. I patched to 7.17.2 and am currently trying to implement the ping script to mitigate it, as it keeps occurring, mainly for the more distant Z-Wave devices (Aeotec ZW).

The problem is Z-wave 700 is hot garbage compared to Z-wave 500. Many will defend it but there’s really no excuse for the hoops we have to jump through just to get Z-wave 700 “working”.

The difficulty with 700 series stability lies in the challenge of:

  1. Knowing the difference between the Z-Wave JS integration, the Z-Wave JS addon, and the Z-Wave JS UI addon.
  2. Capturing logs at the time of failure.
  3. Enabling and collecting the necessary logs.
  4. Having someone who actually knows what to look for within the logs determine the cause of the failure.

Step 1 is self-explanatory. Spend any time in the #zwave Discord channel and you'll know exactly why Step 1 exists.

Step 2 is a point of failure because, by default, logging is not enabled at the necessary level regardless of the addon being used. In most cases, the initial report of dead nodes dies after someone asks, "What do the logs say?"

Step 3 is another point of failure. Unless you have been running Z-Wave JS UI from the get-go, which many aren't, by the time you hit step 2 you've already been told to "install JS UI" because its logging is better/easier/faster.

Step 4, IMHO, is the single most challenging point of failure because there are so few people who actually fit that description. Additionally, the people who do fit it are making an educated guess based on your description; much of their troubleshooting relies on the strength of your communication abilities.

And if you fail to communicate your problem effectively, chances are you’ll end up being given one or more of the following summations:

  • RF Interference
  • Poor Network Construction
  • Lack of a USB extension cable/USB Hub
  • USB Passthrough is unstable (on everything except VMWare)

If you’re still having trouble, or if you’ve been met with any of the above, do yourself a favor and downgrade. If any of the above summations follow you to the 500 series stick, at least you’ll have eliminated Z-wave 700 as the cause. I’m willing to bet that won’t be the case.

But I digress…

To show that I'm not just a salty Z-Wave user, I'll contribute something useful to this thread. Here is an automation I created to notify me when a Z-Wave node goes dead. It includes a template that "should" provide the entity_id of the dead node, which should speed up collecting logs for sharing. I haven't been able to test this automation because I haven't had a dead node since I downgraded to Z-Wave 500.

Please note - This automation assumes that you’ve set up one of the many notification services that exist for Home Assistant. I use Pushover.

alias: Zwave Node is dead!
description: ""
trigger:
  - platform: numeric_state
    entity_id: sensor.dead_zwave_devices
    above: "0"
    for:
      hours: 0
      minutes: 0
      seconds: 1
condition: []
action:
  - service: notify.pushover
    data:
      message: >-
        Zwave Node {{ state_attr('sensor.dead_zwave_devices','entity_id') }}
        just went dead or unavailable.
      title: Dead Zwave Node
mode: single

EDIT 1/23: For the first time since I downgraded back to Z-wave 500, my dead node notify automation kicked off. Here’s an example of what it looks like when a node goes dead.

As you can see, the notify automation now includes the node name. Hopefully this proves helpful to others.


Excellent job with this, worked perfectly. Well done to all involved :+1:t4:

Did you use a palette? I get "Unknown: calculator".

Yes I used this one: node-red-contrib-calc (node) - Node-RED
All that node does is increase msg.limit by 1, so you could do that with existing nodes as well. I was just being lazy by using the simple calculator node :slight_smile:


Thanks, and forgive me as I don't use Node-RED often, but I imported your flow, deployed it, and got a bunch of "server config not found" and "supervisor token missing" errors. However, I already have my server set up for another flow that works fine.

I edited your nodes to try to change the server and noticed I now have 3 Home Assistant servers to choose from in the dropdown. I'm not sure how I ended up with 3, but I changed it to the first one, which seems like the correct one. However, when I did that, the "wait until available" node changed your {{original.data.entity_id}} to one of my actual entities (the first one alphabetically), which seems incorrect.

EDIT: it looks like the other 2 servers were for the "Home Assistant add-on"; since I use Docker and don't use add-ons, I just deleted those 2.

So the main issue is I'm not sure how to fix the "wait until available" node. It won't give me the option to put in your original code; it just makes me pick one of my entities from a dropdown.

Yes, I see what you mean.
In the image below, the box to enter the entity is a bit too smart: it looks for matching entities and won't accept anything else.
The reason I was able to enter {{original.data.entity_id}} is that I created this specific node a few years ago, when it wasn't so smart and also accepted non-existing entities.
I would also like to understand what the technique is nowadays to get this to work :thinking:

The first time I tried, it wouldn't let me type anything; for some reason it eventually did, and I was able to type your code snippet.

However, I'm still having issues. I'm not getting any kind of error message, but all but the first node show "error". The debug panel is empty, so I have no idea how to go about figuring out the problem.

Ah, that is strange.
You do indeed need to make sure it is connected to your Home Assistant server, but I understand you fixed that.
Very strange that it says 'error' while the debug panel is empty.
Not sure what the problem could be.

What does the trace for the automation script say? That's what pings it, and if it's not working, something is wrong.

Where does sensor.dead_zwave_devices come from?

Just ended up here with what I think is a different issue but who knows. Once in a while (days? weeks?) I’ll have a switch that will seemingly turn itself back to its last state after I command it on or off. So like I’ll have an automation turn on the garage light and it turns off 2 seconds later. Or turn off the front porch light and it turns on 2 seconds later.

I’ve not figured out why, but maybe they are going intermittently unavailable and coming back? Worth looking into.

I do have one farther-away Z-Wave light switch for a basement light that goes unavailable maybe once a month until I ping it. Otherwise, everything else has been SO much nicer with the 700 series Z-Wave stick: way faster response times using S2 encryption vs S0.

Scroll up in this thread. There are several code examples showing how to create the dead zwave sensor.

How did you change the script to fire every x seconds while there is a dead device? Could anyone help me edit this YAML to trigger a ping to a dead device every x seconds after it is spotted?

template:
  - sensor:
      - name: "Dead ZWave Devices"
        unique_id: dead_zwave_devices
        unit_of_measurement: entities
        state: >
            {{
              states
              | selectattr("entity_id", "search", "node_status")
              | selectattr('state', 'in', 'dead, unavailable, unknown')
              | map(attribute="object_id")
              | map('regex_replace', find='(.*)_node_status', replace='button.\\1_ping', ignorecase=False)
              | list
            }}
        attributes:
          entity_id: >
            {{
              expand(integration_entities('Z-Wave JS'))
              | selectattr("entity_id", "search", "node_status")
              | selectattr('state', 'in', 'dead, unavailable, unknown')
              | map(attribute="object_id")
              | map('regex_replace', find='(.*)_node_status', replace='button.\\1_ping', ignorecase=False)
              | list
            }}

automation:
  - id: ping_dead_zwave_devices
    alias: Ping Dead ZWave Devices
    description: ''
    trigger:
      - platform: state
        entity_id:
          - sensor.dead_zwave_devices
    condition:
      - condition: template
        value_template: >
            {{ states('sensor.dead_zwave_devices') != '[]' }}
    action:
      - service: button.press
        target:
          entity_id: >
            {{ states('sensor.dead_zwave_devices') }}
    mode: restart
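As a side note, the `regex_replace` step in the template above (mapping each `<name>_node_status` object_id to its `button.<name>_ping` entity) can be checked outside Home Assistant. Here is a minimal sketch in plain Python, with invented device names, using `re.sub` in place of the Jinja filter:

```python
import re

# Object_ids that the selectattr/map chain in the template would yield
# (these device names are invented for illustration).
dead_object_ids = [
    "kitchen_dimmer_node_status",
    "garage_light_node_status",
]

# Same transform as the Jinja regex_replace filter:
# "<name>_node_status" -> "button.<name>_ping"
ping_buttons = [
    re.sub(r"(.*)_node_status", r"button.\1_ping", oid)
    for oid in dead_object_ids
]

print(ping_buttons)
# → ['button.kitchen_dimmer_ping', 'button.garage_light_ping']
```

If the ping presses aren't happening, rendering this transform in the Developer Tools template editor with your real object_ids is a quick way to confirm the button entity_ids actually exist.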

I tried adding

- platform: time_pattern
  minutes: "/1"

to the triggers, but it didn't work as expected (it pinged every minute, even when no Z-Wave devices were dead).

template:
  - sensor:
      - name: "Unresponsive ZWave Entities"
        unique_id: unresponsive_zwave_entities
        unit_of_measurement: entities
        state: >
          {% if state_attr('sensor.unresponsive_zwave_entities','entity_id') != none %}
            {{ state_attr('sensor.unresponsive_zwave_entities','entity_id') | count }}
          {% else %}
            {{ 0 }}
          {% endif %}
        attributes:
          entity_id: >
            {% set exclude_filter = ['sensor.700_series_based_controller_node_status'] %}
            {{
              expand(integration_entities('Z-Wave JS'))
              | rejectattr("entity_id", "in", exclude_filter)
              | selectattr("entity_id", "search", "node_status")
              | selectattr('state', 'in', 'dead, unavailable, unknown')
              | map(attribute="object_id")
              | list
            }}

automation:
  - id: ping_unresponsive_zwave_entities
    alias: Ping Unresponsive ZWave Entities
    trigger:
      - platform: time_pattern
        seconds: /15
#      - platform: state
#        entity_id:
#          - sensor.unresponsive_zwave_entities
    condition:
      - condition: template
        value_template: >
          {{ int(states.sensor.unresponsive_zwave_entities.state) > 0 }}
    action:
      - service: button.press
        target:
          entity_id: >
            {{ 
              state_attr('sensor.unresponsive_zwave_entities','entity_id') 
              | map('regex_replace', find='(.*)_node_status', replace='button.\\1_ping', ignorecase=False)
              | list
            }}
      - service: system_log.write
        data:
          message: >
            {{ "Unresponsive Z-Wave Entities: {}".format(", ".join(state_attr('sensor.unresponsive_zwave_entities','entity_id'))) }}
          level: error
          logger: zwave
    mode: queued
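For anyone wanting to sanity-check the filter chain in that template outside Home Assistant, the same selection can be written in plain Python. The entity ids and states below are invented examples; the dict stands in for `expand(integration_entities('Z-Wave JS'))`:

```python
# Rough Python equivalent of the template's filter chain; entity ids and
# states below are invented examples, not real Home Assistant data.
entities = {
    "sensor.700_series_based_controller_node_status": "alive",
    "sensor.kitchen_dimmer_node_status": "dead",
    "sensor.garage_light_node_status": "alive",
    "sensor.porch_switch_node_status": "unavailable",
}
exclude_filter = {"sensor.700_series_based_controller_node_status"}

unresponsive = sorted(
    eid.split(".", 1)[1]          # object_id, like map(attribute="object_id")
    for eid, state in entities.items()
    if eid not in exclude_filter                      # rejectattr(..., "in", ...)
    and "node_status" in eid                          # selectattr(..., "search", ...)
    and state in ("dead", "unavailable", "unknown")   # selectattr('state', 'in', ...)
)
print(unresponsive)
# → ['kitchen_dimmer_node_status', 'porch_switch_node_status']
```

Note how the controller's own node_status sensor is excluded up front, which keeps the automation from trying to ping the stick itself.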

Gee, I thought it would be a great idea to update my Z-wave/Zigbee combo stick to more modern individual sticks for each protocol (and maybe it was a good idea in the long run), but for now I greatly appreciate those above who have shared their experiences with dead nodes and how to use Home Assistant to fix them in the background.

As mentioned above, I too ran into the 255-character template limit a couple of times after Z-Wave JS UI restarts, so I wanted to ditch the template, put it all into the automation, and share it here.
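For context: Home Assistant caps an entity's state value at 255 characters, so a template sensor whose state is a rendered list of ping-button entity_ids can overflow once enough nodes drop at the same time. A quick illustration (the entity names are invented):

```python
# A dozen dead nodes is already enough to blow past the 255-character
# state limit once the list is rendered into the sensor's state string.
buttons = ["button.device_{:02d}_ping".format(i) for i in range(12)]
state = str(buttons)
print(len(state) > 255)  # True
```

Keeping the list in an attribute (as the earlier templates do) or building it directly in the automation, as below, sidesteps the limit, since only states are capped.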

I used the for_each repeat method since I'd never tried it, and it seems to work. I did see that the 'ping' service is going away in the future, but it's here now, so I'm using it. Maybe the 700 issue will be fixed before it's removed. Feel free to make suggestions if you spot any issues. Thanks for this great community.

alias: Ping dead Z-wave nodes
description: Try to revive Z-wave nodes that are shown as dead by the controller
trigger:
  - alias: When there are Z-wave nodes shown as dead for 10 seconds
    platform: template
    value_template: >-
      {{ expand(integration_entities('Z-Wave JS')) | selectattr("entity_id",
      "search", "node_status") | selectattr('state', 'in', 'dead, unavailable,
      unknown') | map(attribute='entity_id') | list | length() > 0 }}
    for:
      hours: 0
      minutes: 0
      seconds: 10
  - alias: When it's the top of the hour trigger this automation
    platform: time_pattern
    hours: /1
condition:
  - alias: Check that there are Z-wave nodes listed as dead
    condition: template
    value_template: >-
      {{ expand(integration_entities('Z-Wave JS')) | selectattr("entity_id",
      "search", "node_status") | selectattr('state', 'in', 'dead, unavailable,
      unknown') | map(attribute='entity_id') | list | length() > 0 }}
action:
  - alias: >-
      Repeat the actions of notifying Telegram and trying to ping each dead node
      in order to revive it
    repeat:
      for_each: >-
        {{ expand(integration_entities('Z-Wave JS')) | selectattr("entity_id",
        "search", "node_status") | selectattr('state', 'in', 'dead, unavailable,
        unknown') | map(attribute='entity_id') | list }}
      sequence:
        - alias: Send Telegram message via script
          service: script.turn_on
          entity_id: script.notify_handler
          data:
            variables:
              text: >-
                --💀 Z-wave node dead. Trying to revive: {{ repeat.item |
                replace('_','-') }}
              detail: notice
        - alias: Ping the dead Z-wave node
          service: zwave_js.ping
          target:
            entity_id: "{{ repeat.item }}"
        - alias: >-
            Wait 2 seconds in between each pinged node to prevent flooding of
            network
          delay: "00:00:02"
  - alias: Wait 5 minutes in case this automation is called repeatedly
    delay: "00:05:00"
mode: single
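The for_each repeat above boils down to a simple loop: notify, ping, and pause between nodes. Here's a plain-Python sketch of that flow, where `ping` and `notify` are stand-ins for the `zwave_js.ping` service and the Telegram script (entity names are invented):

```python
import time

def revive(dead_entities, ping, notify, pause=2.0):
    """Walk the list of dead node_status entity_ids: notify, ping,
    then wait between pings so the Z-Wave network isn't flooded."""
    for entity_id in dead_entities:
        notify("--💀 Z-wave node dead. Trying to revive: "
               + entity_id.replace("_", "-"))
        ping(entity_id)
        time.sleep(pause)

# Usage sketch with stand-in callables:
pinged = []
revive(["sensor.garage_light_node_status"],
       ping=pinged.append, notify=print, pause=0.0)
print(pinged)
# → ['sensor.garage_light_node_status']
```

The trailing 5-minute delay with `mode: single` in the automation serves the same throttling purpose at the automation level: a re-trigger while the delay is running is simply dropped.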

Can you point to your source on this? If that happens, my house will probably cease to function.