Automate ZwaveJS Ping Dead Nodes?

jscolp · August 14, 2023, 3:00pm

Then find a way to positively contribute. So far you have been complaining about open source projects and “calling bs”. You’ve been pointed to historical conversations regarding your complaining in GitHub to get you going.

This thread is about an automated way to compensate for the issue, and several people have thankfully shared their scripts and automations.

For people subscribed to this thread for updates, you’re not helping and neither am I for entertaining your banter. I’m out of this conversation now.

dbrunt · August 15, 2023, 8:43pm

My career was (and still is as an independent contractor) in the field of Managed IT services so for nearly 40 years now I have been diagnosing and finding fixes/solutions for anything computer related: software, hardware, networking, wireless, firewalls, etc.
The nodes are rather than and that node status is called dead, meaning Z-Wave JS is failing to communicate with it, whether that be a ping or otherwise. If you ping a dead node and it responds then zwjs marks it as okay, but even then it may not be working right. If you get no response from a ping then it is totally dead. I have several iBlinds motors that frequently fail. Sometimes zwjs shows them as dead, I ping them and they resume working, but other times they respond to the ping and zwjs marks them alive, yet they still do not respond to commands and I have to power them off and then recalibrate them to return them to service. That’s not zwjs’s problem, it’s the manufacturer’s, i.e. bad firmware!

If the nodes shouldn’t be marked as dead then what should they be marked as?
If you can find out the real cause of the problem, and a solution, then offer up a solution!
Right now pinging a dead node is all we’ve got and more often than not it solves the unhappy state which developed between zwjs and the node causing zwjs to give up on it.

Daniel76 · August 27, 2023, 9:01am

For 4 days now I’ve had no dead nodes. The only thing I changed was updating Z-Wave JS UI. Before that I had dead nodes multiple times a day. Has anyone made similar observations?

freshcoast · August 27, 2023, 4:28pm

There was a recent change in the driver to avoid marking nodes as dead in the case of a jammed controller. It’s possible that was your problem. Or it was just coincidental: maybe you moved something in your house that was blocking a good route to the controller, and now it works. Frankly that’s impossible to say, as dead nodes can have a variety of causes, and the only way to diagnose them is with driver debug logs.

JDIacobbo · August 27, 2023, 4:33pm

Anecdotally, I am also seeing increased stability since updating. Dead nodes were a daily occurrence but I haven’t had one since.

NathanCu · August 27, 2023, 4:45pm

I’m seeing near elimination of dead nodes now, where I had at least one a day pop up for as long as I have been on HA/ZwjsUI (500 series aeotec) and get resolved through a variant of this ping automation.

I have recently done some routing cleanup using JsUI’s new map (sooo much better) and was attributing it to that, but now that you mention it, there was an update where the ones I was seeing just vanished.

@freshcoast, should the controller naturally resolve itself in the background and everything continue along merrily?

Chef-de-IT · September 29, 2023, 9:00pm

Fantastic thread! Unfortunately I’ve no reason to expect Z-Wave node dropping out issues to go away - not with current chipsets, not in the future. Just my opinion based on interactions with SiliconLabs, Zooz, Lutron, and HA Z-Wave devs. Specifics below.

1. My context: On the larger of the only two HAOS Z-Wave integration deployments of mine (a NYC restaurant, landmark bldg thick walls 3 floors - a few dozen all-S2 700 series - stick, one fan controller, the rest are light dimmers, mostly Zooz incl stick, firmware kept up to date, stick on USB ext cable, over-spec’d host CPU 1%, etc) - a LOT of grief. Nodes going dead, Z-Wave renaming nodes, etc. On the smaller deployment (same client lol - he must have sinned in past life & his punishment is me) identical but fewer Z-Wave devices in a high-rise apt, hardly any problems. NYC electrical codes stipulate metal electrical boxes; walls tend to attenuate RF; there’s a LOT of RF interference, so Z-Wave may work great in a hobbyist deployment in a 2-bedroom apartment or smaller; anything bigger or more serious is asking for trouble.
2. My Zooz and SiliconLabs interaction was on the topic of programmatically changing the color of lights in the Zooz ZEN32 Scene Controller to communicate the status of various things related to those LEDs’ respective buttons. The change itself is trivial (for instance in an HA UI script you can set the value of a given ZEN32 controller’s config parameter 7 to Red to make the Button 1 LED glow red). My question to support (eventually escalated to devs & engineers) was: since this isn’t device state like on/off but part of the non-volatile configuration, what is the endurance of the EEPROM where it is stored? I wanted to know whether changing that LED color value, say, 50 times a day would brick the device in 1.5yrs or closer to 15yrs. (Or maybe they’re clever and upon reaching the endurance limit the value is cast in stone but the device continues to function.) Without revealing too much, “smart guys in some other lab” are putting the “silicon” in SiliconLabs - who are more about protocol, ecosystem husbandry, marketing, and occasionally even talking to schmucks like me. That’s the bad news. The good news is, the answer is “closer to 15yrs”, which, after asking around separately, I ended up telling them rather than hearing from them. And by the way, if your device’s power-on state is configured to be “same as before power outage”, that current state goes into EEPROM also, whenever it’s changed.
3. My interaction with HA devs is limited to a few issues e.g. https://github.com/home-assistant/core/issues/80398 and one lowly pull request, and my impression is, those guys are pulled in a million different directions. Any meaningful change to the code sometimes results in opposing or mutually exclusive suggestions or requests from the stakeholders before the change can go in, so other than fixing a trivial bug, coding the solution to some architectural issue is actually peanuts compared to then having to build a friggin consensus around that being the right way forward. And you’re doing this free - like stepping on a turd on a NYC sidewalk is free. From the above ticket you can glean that there’s a strong priority on legacy device support (cough, architectural baggage cough), so certain fundamental improvements in the newer generations of hardware may not be carried over to HA.
4. Meshes are potential trouble, inherently. Self-healing wireless meshes in a congested spectrum, doubly so. You can tell if a given mesh ecosystem is on top of this trouble if:

There’s a mature, useful “mesh dash” and mesh interrogation, testing and health monitoring tools making the otherwise opaque thing transparent and delivering actionable answers.
There are ways to optionally partially or fully “nail down” a mesh, such as specifying 2-3 parents for each child and letting those parents know which children they might be responsible for.
I don’t see this tooling in Z-Wave or HA to this day at any degree of useful maturity other than a feeble Z-Wave JS UI connection graph.
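
For anyone wanting to redo the EEPROM endurance arithmetic from the ZEN32 question above with their own numbers, here is a rough sketch. It only converts between "writes per day" and "years"; the 50 writes a day and the 1.5yr / 15yr figures come from the post, while the function names are made up for illustration and the model assumes every change costs exactly one write to the same cell (no wear levelling), which is an assumption, not something the vendor confirmed.

```python
DAYS_PER_YEAR = 365


def writes_over(years: float, writes_per_day: float) -> int:
    """Total number of writes accumulated over the given number of years."""
    return round(years * DAYS_PER_YEAR * writes_per_day)


def years_until_worn(endurance_cycles: float, writes_per_day: float) -> float:
    """Years until the write count reaches the given endurance figure.

    Assumes one write per change and no wear levelling (illustrative only).
    """
    return endurance_cycles / (writes_per_day * DAYS_PER_YEAR)


if __name__ == "__main__":
    # 50 LED color changes a day, as in the post.
    print(writes_over(1.5, 50))  # writes needed to wear out in 1.5 years
    print(writes_over(15, 50))   # writes needed to last 15 years
```

In other words, at 50 changes a day a device lasting 1.5 years corresponds to roughly 27 thousand writes, and one lasting 15 years to roughly 274 thousand.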

2 + 3 + 4, to me, amount to not expecting radical improvements from Z-Wave anytime soon. Small installations in rural homes with RF-transparent walls may work without a hitch simply because they side-step the more complicated mesh comms edge cases surfacing in the more challenging environments that trip up the Z-Wave protocol or its implementation or integration. And a hobbyist in a rural sheet-rock home may be all the user base which this ecosystem targets, because taking on anything more would require an entirely different level of cadres, processes, and ultimately investment of money and time. They’ll simply limp along fixing these issues eventually, but not at a rate that outpaces new issues being introduced or the RF environment becoming ever more congested with each passing day.

So

On new deployments I’ve been using Shelly Plus Wall Dimmer and while the device feel is cheap and cheerful, functionally it’s been like a breath of fresh air. The HA integration is rock-solid, there’s NO reliance on a vendor cloud and NO need for an app - you can configure purely via a web browser. (Note: their tiny Shelly Dimmer 2 is good also, but with LED bulbs only: due to its limited power handling, beyond 50-60 watts it may intermittently thermally shut down, plus I’d stripped its dainty connection screws on one of the units). Shelly is an excellent vendor from EU as well, extremely responsive, know their stuff, and their heart is in the right place re privacy, open source, and many other things.

The other mesh I tested and got fantastic results with was Insteon. It was struggling financially a few years back but came back strong. The devices have a premium feel and communicate in two redundant ways (wired PLC through their AC power wires plus wireless). This amounts to it working fine for me with HA straight out of the box. You can actually even program the individual devices to control each other directly to make the scenes completely serverless via a bunch of key presses, but I prefer the flexibility of HA via the USB Insteon modem with no scenes “baked into” devices.

Back to this thread’s topic, it’d be great to put together maybe an HA blueprint of “occasionally, either X min after death or at particular time(s) of day, press the Ping button of any Z-Wave devices that are showing dead” (as much implemented in vanilla HA UI as possible, no reliance on Node-RED or anything non-HAOS / minimal code if unavoidable). Because, I’ve a feeling, people will have good use for it for quite some time.
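
To make the "X min after death" idea concrete, here is a minimal sketch of just the selection logic, written as plain Python rather than as an actual HA blueprint. It assumes the Z-Wave JS integration exposes a `sensor.<device>_node_status` entity whose state is `dead` for a dead node and a matching `button.<device>_ping` entity; treat that naming as an assumption and check it against your own entity list. The function names, the 5 minute default and the entity names in the usage note are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumed suffix of the per-node status sensors created by the integration.
NODE_STATUS_SUFFIX = "_node_status"


def ping_button_for(status_entity_id: str) -> str:
    """Map sensor.<name>_node_status to button.<name>_ping (assumed naming)."""
    domain, object_id = status_entity_id.split(".", 1)
    if domain != "sensor" or not object_id.endswith(NODE_STATUS_SUFFIX):
        raise ValueError(f"not a node status sensor: {status_entity_id}")
    return f"button.{object_id[: -len(NODE_STATUS_SUFFIX)]}_ping"


def buttons_to_press(states, now, min_dead=timedelta(minutes=5)):
    """Return ping buttons for nodes that have been dead for at least min_dead.

    states maps entity_id -> (state, last_changed); entities that are not
    node status sensors are ignored.
    """
    due = []
    for entity_id, (state, last_changed) in sorted(states.items()):
        if not entity_id.endswith(NODE_STATUS_SUFFIX):
            continue
        if state == "dead" and now - last_changed >= min_dead:
            due.append(ping_button_for(entity_id))
    return due
```

Fed a snapshot where `sensor.kitchen_dimmer_node_status` has been `dead` for ten minutes and `sensor.hall_dimmer_node_status` for only one, it returns just `button.kitchen_dimmer_ping`. A scheduled "at a particular time of day" run is the same call with `min_dead` set to zero.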

Alex

peterdorn · October 9, 2023, 1:19pm

Hey all. Just read through this topic, since I have only been experiencing this problem for a few weeks!
I have an Aeotec Z-Stick Gen 5+ with FW 1.0.
I never had nodes dropping out until I updated Home Assistant (not sure which version).

Could it be that what fixed this problem for many caused the problem for me? Anyone experiencing the same?

ashman5 · October 15, 2023, 1:59pm

After several weeks of no dead devices, the latest updates seem to have them reporting as dead again. Anyone else seeing similar?

ben1492 · November 7, 2023, 1:40am

I thought I lost my zwave gen 5 stick after many years of service a month ago. Turns out it was the Soft Reset setting in Zwave JS UI that was causing the problem and my stick was fine (hint: read the breaking changes before updating). I didn’t realize that until after I “upgraded” to Aeotec Z-stick gen7. After trying the controller shift method with Simplicity Studio’s Zwave PC Controller 5, I manually added back 46 devices. I thought the controller shift was causing the problems I was seeing. Still, even after hard resetting the stick and manually excluding and then re-including the devices, many of the older ones just would not work with the gen7 stick, Leviton VRCS4 scene controllers being one of the big problems. I upgraded to Zooz Zen32 scene controllers and those worked better. I still had a lot of devices that were more than a hop away that just would not stay connected. Also, even with Zen32 switches, when I called scenes using those, the lamps being controlled would just go dead. I spent the last weekend and this weekend trying all sorts of “ping dead node” logic I found in this thread to fix it, but finally threw in the towel. After I couldn’t use the npx convert logic to transfer the gen7 network to a gen5, I bought a z-stick gen5+. I just transferred this evening and it’s just night and day. The gen7 insisted on preferring one hop instead of using mesh, so the network map looked like hub and spoke. The gen5+ has a much more logical map, with many of the further out devices 3 (and sometimes even 4) hops out. But they work seamlessly. The Zen32 scene controllers are also working great.

I’m only a few hours in. Whatever your issue was, I’m guessing it’s likely resolved by now? My problems were definitely related to the 700 series. Whether it’s the 700 series itself or just how it interacts with Zwave JS UI, I don’t know.

Vorhees · November 7, 2023, 7:51am

Do you people think that this issue will ever be fixed, or should the 700 series stick be thrown in the trash and be done with it? It’s simply taking too much time and energy to do all the tricks to get it semi-working, and it usually doesn’t work just when you need things the most.

FriedCheese · November 7, 2023, 3:11pm

I have the Zooz ZST10 stick and have had almost no problems aside from the known issue that kicked up around the soft reset a few months ago.

ben1492 · November 7, 2023, 11:25pm

I’m sure it will get figured out over time and the gen7 will be as rock solid as the gen5 series. But it’s well worth the $50 for the gen5+ stick in the meantime to not have to tinker with it anymore. Transferring was super easy using the npx nvmedit convert tool. My plan is to use the gen7 as a backup in case the gen5+ fails.

Vorhees · November 8, 2023, 8:38am

It’s not the $50 that’s bothering me so much as re-interviewing all devices.

5er · November 13, 2023, 1:10am

Just wanted to add that I am not using a 700 stick, and have been having the same issues. I am using a HUSBZB-1 stick and have been for years. I recently tried to add a Zooz Q sensor and 2 additional Zooz Zen77 dimmers, but none of them would stay alive. My old Zen77 and GE switches never go dead, they were installed more than a year ago and never had any issues. The new dimmers were about 40 feet from my hub and my old switches. I could ping them back alive, but it kept screwing up my automations. When the Q sensor was working, it was one of my least reliable and slowest motion sensors.

I finally just gave up and replaced the 2 new dimmers with the inovelli blue switches. Zigbee has been rock solid for me.

jscolp · November 13, 2023, 2:55am

This has a few issues that might not be related. Note that zwave is relatively low bandwidth so that zooz q sensor might have been too chatty. Also 40ft is pretty far for the network and you might have just had a distance problem, even with the mesh.

In this case the 700/800 series might be a decent upgrade for you, and make sure to pair with the DSK or SmartStart. If you have any gen1 zwave devices you may also want to consider their placement or retiring/upgrading those devices.

ashman5 · November 23, 2023, 3:25pm

Here’s a thread for a Blueprint to ping unresponsive nodes on zwavejs2mqtt

Does anyone have the ability to convert this to ZwaveJS?

petro · November 24, 2023, 1:19am

It also works for zwave js

wimjanse · March 20, 2024, 2:54pm

Could you describe briefly how to transfer all info from a 700 to a 500 stick? I just moved 66 zwave devices from my old Fibaro HC2 controller to an Aeotec 700, and z-wave is just a disaster: dead nodes and very slow response.

I understood from your message that I can transfer the 700 info to a (hopefully more stable) 500 stick. Do you have to transfer nodes.json and nvm (and convert nvm from 700 to 500)?

Transferring from my HC2 to the 700 took me a full day, and I want to avoid repeating that if I start using a 500 stick.

Sireone · March 29, 2024, 2:40am

Is there a way to expand the sensor to actually show the dead devices? Maybe using something like auto-entities?