Z-Wave nodes frequently going dead in automations

There was a similar issue (turning off lots of lights at the same time) on this forum about a year ago, with the same symptoms you reported. The conclusion was that Z-Wave networks fall over when too many commands are executed simultaneously. Executing a command is a series of messages between the controller and the node that takes more than 70 ms to complete. Because there is no flow control in zwavejs or the HA integration, an automation that turns off a group, or issues a series of turn_off commands, will send those commands faster than they can complete on the Z-Wave network (80-100 ms each). The messages then start colliding and needing retries, which creates even more messages that collide and retry, and eventually, after 3 failed attempts, zwavejs declares the node dead.
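To put rough numbers on that, the toy Python model below (a simplification, not how zwavejs actually schedules traffic) counts how many commands would be in progress at once when a batch is fired at a fixed interval. It assumes each command occupies the network for a fixed 80 ms, the low end of the 80-100 ms range above, and it ignores queueing, collisions and retries. `peak_in_flight` is a hypothetical helper name.

```python
def peak_in_flight(num_commands: int, gap_ms: float, duration_ms: float) -> int:
    """Largest number of commands in progress at the same moment.

    Toy model (assumption): command i starts at i * gap_ms and occupies the
    network for exactly duration_ms; no queueing, collisions or retries.
    """
    starts = [i * gap_ms for i in range(num_commands)]
    peak = 0
    # Overlap can only grow when a new command starts, so checking each
    # start time is enough to find the maximum.
    for t in starts:
        active = sum(1 for s in starts if s <= t < s + duration_ms)
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    # Ten lights, each command assumed to take 80 ms.
    for gap in (0, 10, 50, 100):
        print(f"gap {gap:>3} ms -> {peak_in_flight(10, gap, 80)} in flight")
    # gap   0 ms -> 10 in flight
    # gap  10 ms -> 8 in flight
    # gap  50 ms -> 2 in flight
    # gap 100 ms -> 1 in flight
```

Firing everything at once puts all ten commands on the air together; only when the gap reaches the per-command time does the overlap drop to one.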

What can certainly make this worse is a Z-Wave network that is already busy with too much data flowing through it, or a few slow nodes.

In addition to updating firmware, make sure you are on the latest zwavejs, as there were some serious stability issues in it starting in August.

  1. Turn on zwavejs logging and capture a log of the failure.
  2. Try sequencing the off commands with a short delay (100 ms) between them to see whether the lights can be turned off reliably one after another. If not, keep increasing the delay from 100 ms until you find a value that works reliably.
  3. Determine whether you have chatty nodes that are sending data too frequently. See this post: Determining what zwave values are creating the most traffic
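The step 2 experiment can also be scripted. The Python sketch below is one hypothetical way to do it: `turn_off_sequentially` sends one turn_off at a time with a pause in between, and `find_reliable_delay` raises the pause from 100 ms in 50 ms steps until a trial passes several times in a row. The sender calls the Home Assistant REST API endpoint `POST /api/services/light/turn_off` with a long-lived access token; the base URL, token, step size, repeat count and the `trial` check (for example, confirming that no node was marked dead afterwards) are all assumptions you would fill in yourself.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Optional


def make_ha_sender(base_url: str, token: str) -> Callable[[str], None]:
    """Return a function that calls light.turn_off for one entity via the
    Home Assistant REST API. base_url and token are placeholders."""
    def send(entity_id: str) -> None:
        req = urllib.request.Request(
            f"{base_url}/api/services/light/turn_off",
            data=json.dumps({"entity_id": entity_id}).encode(),
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()
    return send


def turn_off_sequentially(entity_ids: Iterable[str], delay_ms: int,
                          send: Callable[[str], None],
                          sleep: Callable[[float], None] = time.sleep) -> None:
    """Send one turn_off per entity, pausing delay_ms between commands
    instead of firing them all at once."""
    for i, entity_id in enumerate(entity_ids):
        if i > 0:
            sleep(delay_ms / 1000)
        send(entity_id)


def find_reliable_delay(trial: Callable[[int], bool], start_ms: int = 100,
                        step_ms: int = 50, max_ms: int = 1000,
                        repeats: int = 3) -> Optional[int]:
    """Raise the delay from start_ms in step_ms increments and return the
    first value for which trial(delay) succeeds `repeats` times in a row,
    or None if nothing up to max_ms works."""
    delay = start_ms
    while delay <= max_ms:
        if all(trial(delay) for _ in range(repeats)):
            return delay
        delay += step_ms
    return None
```

Inside an actual HA automation, the equivalent is a sequence of individual turn_off actions separated by `delay` steps, rather than a single call against a group.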
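For step 3, once you have a log, ranking values by how often they report points at the chatty nodes. This Python sketch assumes the log has already been reduced to `(node_id, value_name)` pairs, one per value update; the zwavejs log format and the parsing are not shown, and `top_talkers` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_talkers(events: Iterable[Tuple[int, str]], capture_minutes: float,
                limit: int = 5) -> List[Tuple[Tuple[int, str], int, float]]:
    """Rank (node_id, value_name) pairs by how many updates they produced.

    Returns ((node_id, value_name), count, updates_per_minute) tuples,
    busiest first, for the `limit` busiest values.
    """
    counts = Counter(events)
    return [(key, n, n / capture_minutes) for key, n in counts.most_common(limit)]
```

A value reporting several times a minute from a single node is a good first candidate for reducing its reporting interval or threshold.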