Why is Z-wave inclusion so @#$* hard?

@mike15 I didn’t and it works fine with just the Nabu Casa URL.

Once you have the ability to scan a QR working, this post has instructions for connecting 800 LR devices using Smart Start.

Update: Z-wave inclusion is still maddeningly unreliable.

I ordered a USB extension cord to separate the dongle from the HA computer. In the meantime, I reoriented the computer to point the dongle towards the device, and I had a blessed ~1hr that things worked beautifully! I was even able to include via “SmartStart”. But then, despite nothing changing, things went back to the way I’ve described above.

A few days later I received the extension cord and relocated the dongle. That honestly hasn’t made any difference. If I leave a device (sitting 6’, clear line-of-sight) in SmartStart overnight, then by morning, if I’m lucky, I have 4 or 5 new artificial dead nodes appearing in my Z-Wave JS UI list. I had one successful inclusion from one device.

I’ve gone back to manual inclusion (vs. Smart Start) and about 1 out of 5 times it will include… but without security (which is maddening because I selected “Force Security”), so I have to exclude and then start all over.

This is seriously unreliable technology – or pretty blameworthy user error if I’m doing something wrong :slight_smile: . Would again appreciate any advice!

Yeah this is super frustrating. I’m this close to deciding that “Z-wave” is a sham “technology.” I’ve been trying for literally hours to include another lightswitch that an electrician will be installing tomorrow (too far from the controller to communicate without a hop, therefore needs to be included beforehand). I’m literally holding a top-of-the-line 800LR switch 18 inches from a top-of-the-line 800LR dongle made by the same company, and forcing security and the trashy thing keeps including fake node after fake node (without S2 security despite “forcing”), running up my node count and accomplishing nothing.

Please please please tell me I’m making some boneheaded mistake here… because otherwise, z-wave is complete garbage.

Sorry for how frustrating that must be! That is definitely not the normal experience. Something about your environment must be non-typical.

Could you possibly be running both Z-Wave JS and Z-Wave JS UI at the same time? From Settings → Add-ons, make sure Z-Wave JS is either not present or is stopped.

Thanks. In Settings → Add-ons, all I see is Z-wave JS UI (plus a few other unrelated add-ons).

Ok. One thing ruled out then.

The next possibility to explore is whether you have a bad device that is flooding the Z-Wave network. To check this, look at the log (Settings → System → Logs → Z-Wave JS UI [blue, upper right]). Are there any error messages, or devices (Nodes) that seem to be filling the log / constantly reporting? If so, try physically disabling that device (unplugging it, removing its battery, or temporarily throwing the circuit breaker it’s on) and see whether the log quiets down and your communication improves…

(Thanks again for the help here!)

I went to Home Assistant settings (note: not Z-wave JS UI settings), then system, then logs and switched to Z-wave JS UI. The resulting log file only seems to show logs from the past 3-4 minutes. Is there a way to get longer history? (Otherwise, I’m just refreshing every couple of minutes to get a quick scan of what seems to be happening…)

That said, I do periodically get the message “dropping message with invalid payload”… though it’s not all the time… but sometimes 10-20x /minute.

And about once every 30 sec there is a message of the form “APP: GET /health/zwave 301 1.680 ms - 191” (with slightly differing numbers of ms… sometimes as high as ~2.3)

Other than that most of the activity looks as-expected. I have a handful of Z-wave sensors that update values every 30-60 seconds (e.g., power plugs reporting wattage, or sensors reporting temperature/motion). What’s a reasonable number of “value updated” entries to see per minute in a healthy, unsaturated network?
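
For anyone who wants to count these entries rather than eyeball them between refreshes, here is a minimal sketch. It assumes the log has been copied into a text file and that each line contains a timestamp such as `2024-05-01 14:03:27.123`; that format and the sample lines are assumptions for illustration, not the documented Z-Wave JS UI log format.

```python
import re
from collections import Counter

# Assumption: each log line contains a timestamp like
# "2024-05-01 14:03:27.123" or "2024-05-01T14:03:27.123Z".
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2})")


def count_per_minute(lines, needle):
    """Count lines containing `needle` (case-insensitive), grouped by minute."""
    counts = Counter()
    needle = needle.lower()
    for line in lines:
        if needle not in line.lower():
            continue
        match = TIMESTAMP.search(line)
        if match:
            # Key is "YYYY-MM-DD HH:MM", i.e. minute resolution.
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "2024-05-01 14:03:05.100 DRIVER Dropping message with invalid payload",
    "2024-05-01 14:03:41.250 DRIVER Dropping message with invalid payload",
    "2024-05-01 14:04:02.000 INFO APP: GET /health/zwave 301 1.680 ms - 191",
    "2024-05-01T14:04:30.500Z DRIVER Dropping message with invalid payload",
]

for minute, n in sorted(count_per_minute(sample, "invalid payload").items()):
    print(minute, n)
```

Swapping the search string for “value updated” would give a per-minute count of sensor reports in the same way.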

Thanks!

The “dropping message with invalid payload” might be a clue. If that is always from the same Node, you could consider removing the device with that Node and seeing if things improve.

My “APP: GET /health/zwave” messages are typically 0.3 or 0.4 ms, so your 1.6 or higher sure seems to indicate that things are not communicating well. But whether 1.6 represents an actual problem is beyond my technical skill level. Hopefully someone who knows these details better can chime in.

Something else to look at is the Network Graph on the Z-Wave JS UI tab. That might help identify any particular devices that are communicating poorly. Could you post a screen shot of the graph?

The “Dropping message with invalid payload” does not reference a Node number. It just says the time, then “DRIVER Dropping message with invalid payload”.

Separately I did notice a sequence “the controller is jammed” / “Controller status: Controller is unable to transmit” / “The controller is no longer jammed” / “Controller status: Controller is Ready”. (over the span of about 2 seconds). This seems to happen ~ every 5ish minutes.
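
To put numbers on how long and how often these jams happen, the “jammed” and “no longer jammed” messages can be paired up. The following is a minimal sketch: it assumes the log lines have already been parsed into `(timestamp, message)` tuples, and the sample events are made up for illustration.

```python
from datetime import datetime, timedelta


def jam_episodes(events):
    """Pair each 'is jammed' message with the next 'no longer jammed' message.

    `events` is an iterable of (timestamp, message) tuples in log order.
    Returns a list of (start_time, duration_in_seconds) tuples.
    """
    episodes = []
    start = None
    for ts, message in events:
        text = message.lower()
        # Check the "no longer" variant first, since it also mentions "jammed".
        if "no longer jammed" in text:
            if start is not None:
                episodes.append((start, (ts - start).total_seconds()))
                start = None
        elif "is jammed" in text and start is None:
            start = ts
    return episodes


# Made-up events mirroring the sequence described above.
t0 = datetime(2024, 5, 1, 14, 0, 0)
events = [
    (t0, "The controller is jammed"),
    (t0 + timedelta(seconds=1), "Controller status: Controller is unable to transmit"),
    (t0 + timedelta(seconds=2), "The controller is no longer jammed"),
    (t0 + timedelta(seconds=2), "Controller status: Controller is Ready"),
    (t0 + timedelta(minutes=5), "The controller is jammed"),
    (t0 + timedelta(minutes=5, seconds=1.5), "The controller is no longer jammed"),
]

for start, seconds in jam_episodes(events):
    print(start.isoformat(), f"{seconds:.1f}s")
```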

And another update: about half of my nodes are now dead (plus a bunch of the “fake” nodes that appear when you try to include a device – but I can’t remove them without losing the real-but-not-currently-speaking nodes too), and I periodically see stretches of alternating “Controller status: controller is ready” and “Controller status: controller is unable to transmit” in my logs. And none of the devices seem to actually function (e.g., they have disappeared from dashboards and are uncontrollable in the “Devices” section of HA).

Help!!! (And thanks again to everyone helping me with this!)

You might consider changing the title of your thread as you clearly have a network stability problem and not an inclusion problem as the root cause.

You won’t like this advice, but send the family out for the day and shut down the network. Pull all the batteries of battery devices and kill power to the mains-powered devices, then slowly turn things back on from the controller out by distance, watching the debug log in a dedicated window. You shouldn’t get repeated jammed or unable to transmit messages. You may have one or more devices that are causing problems. You could have some external interference in the 900 MHz range. Without a spectrum analyzer, it will be hard to know for sure.

You could start with the battery powered devices as they aren’t routing devices. Shut them all down and observe the results. The Z-wave-JS UI has a lot of troubleshooting tools built in. In the node map, when you select a node, you’ll get a Diagnose button. Check things from the controller out for the mains powered devices. Diagnosing a battery device will mean keeping it alive, which can be painful depending on device placement.

I wouldn’t go changing things in the Z-wave internals unless I proved I could ace a Z-wave architecture class.

There are 2 things that I think can cause your problem.

  1. What firmware are you running on your controller?
  2. You might have a bad device that is causing interference. I had a switch that was bad and it would flood my network. The reason I was able to diagnose this issue was because I have multiple controllers, and every time the switch would fail, all of my networks would go down at once. I couldn’t add or remove devices or control them. To diagnose whether it was a software issue or a hardware issue, I switched to the Z-Wave PC Controller software. It’s free software provided by the people who manage the Z-Wave protocol. Using that software I had the same issue, so I was able to conclude that it was a hardware issue.

Is the extension cable you purchased for the Z-wave dongle powered? I had many of these same problems until I moved the dongle to a USB hub that had its own 5V power source. The conclusion I came to was that the mainboard wasn’t providing enough power to the dongle and thus it didn’t work properly. A simple extension cable for the dongle won’t fix this.

HTH

Thank you everyone! I really really appreciate all of the advice!

After several days of waiting for an opportunity to do some more significant testing (along the lines of what @mterry63 suggested), the network miraculously seems to have largely healed: all but a few devices have reconnected and a scan of the logs shows only about a dozen instances of the “unable to transmit” error (each lasting ~1 second) in the last 24hrs. I do not see any evidence in the log of any nodes spamming up the network. (@mterry63 I will start a new thread on the “unable to transmit” topic and stick to challenges with inclusion in this thread, thanks for the suggestion.)

However, I’m still having persistent problems including new devices into the network. They seem to (mostly) either time out when trying to include, or time out when responding to the initial S2 authentication. I get lots of dummy nodes appearing for every successful inclusion. (It’s as if it tries to add the node, fails to establish S2 security, creates a node in the Z-wave JS UI list anyway, but it remains dead and often as “unknown manufacturer” / “unknown product”… and then repeats the process.)

@brianmacdonald I will try getting an external power source for the USB dongle (powered hub)
@cornellrwilliams the controller is running FW v1.20

Any other advice?

Additionally, how can I re-include the couple of devices that are still marked as dead (despite being mains-powered devices located physically within a couple feet of devices that are very much alive)? They are hard to move physically (installed light switches and sensors) and given all of the challenges including, I’m hesitant to exclude-then-reinclude…

Thanks again so much for all of the advice!

Ok another update… and the bottom line is: yes Z-wave inclusion is terrible (at least in my system).

First: I now believe that the episode where most of my devices appeared as dead is unrelated. I’ve observed that if I change the logging level (e.g., between “verbose” and “debug”), then many of my devices die (it looks like the ones that communicate with the controller via mesh rather than directly) and I start getting intermittent errors about the controller being unable to transmit. This seems to work itself out in a couple of hours to a couple of days (though every time this has happened, 1 or 2 nodes fail to reconnect and remain dead and need to be replaced). (I didn’t make the connection between changing logging level and losing nodes until it happened a second time.) This is deeply concerning but deserves a separate thread, which I will start shortly.

Now back to the challenges with inclusion: this is still miserably poor, and generally still follows the pattern described in my first post. “Smart Start” devices maybe connect after several days of trying (but often don’t even after that), and usually generate a number of “fake” nodes in the process that linger as dead and can’t be removed without a lot of collateral damage. It seems like what is happening is that the devices time out when trying to include (resulting in unsuccessful inclusion and a long wait) or time out when configuring S2 security (resulting in a “fake node” and then starting over again). Occasionally manual inclusion works, though frequently it includes without security (despite selecting “Force security” (!?)), creating a node that must be excluded… then I have to start over again.

And lately, when I try to exclude these security-less nodes, legitimate unrelated nodes get removed too(!). For example, I am trying to include a smart lightswitch. I ran manual inclusion but it timed out when trying to set up S2 security, resulting in a node included without security. I entered Exclusion mode in Z-wave JS UI, put the switch into exclusion mode… and then got a note that a completely unrelated smart plug had been successfully removed. (I was nowhere near the plug and it was definitely not in exclusion mode.)

I have invested so much in this rubbish-looking technology that I would really like to get it working. But I simply don’t see a path forward.

I see zero indication that I have a misbehaving node that is spamming the network: communication with non-dead nodes seems very reliable and the debug logs do not show anything suggesting that any node is transmitting too much. My only hope left is reconnecting the Z-wave dongle via a powered USB hub (in case it was being under-powered by the Home Assistant Green’s USB port), and that should be arriving later today.

Any help would be very appreciated!

Thanks!

I replied to your linked thread.

Honestly, the experience you document is so foreign to any Z-wave issues I’ve experienced that I’m not sure where to even start, other than to point out that “the controller being unable to transmit” is NOT a normal Z-wave experience.

Also, I don’t understand in any way how “inclusion” could occur after a couple of hours to a couple of days. I simply don’t believe the inclusion process will run for that long. The only thing I can imagine is that communication is so poor that it takes that long for the interview process to complete and therefore update the node list.

I’ve never had excluding a single node remove multiple nodes.

Maybe open an issue on the Z-wave-JS UI GitHub and see if anyone there can help identify your unique gremlin.

Thanks @mterry63 it really seems unlike anything I’d expect from a mature technology so I’ve been assuming I just have made some boneheaded configuration error or something… but so far nothing has turned up.

My comment about inclusion happening after a couple of days was about going the “Smart Start” (i.e., QR code) route. It’s fairly common that if I try to include a new device by going (in Z-Wave JS UI) to inclusion, choosing “Scan QR code” and entering the QR code, and then going to the provisioning entries, checking only “S2 Authenticated” and then turning on “Active”… then over the next several days a number (usually 1-4/day) of artificial fake nodes appear (always without security) and then immediately go dead. (Sometimes they disappear from Z-wave JS UI shortly afterwards, but usually they linger indefinitely.) I usually get a message in Z-Wave JS UI to the effect that “node ### has been included with security none”, sometimes also mentioning either an “unknown error” or a timeout.

Sometimes these “fake nodes” remain listed as “Unknown Manufacturer” / “Unknown product” (and forever show “ProtocolInfo” with a spinning wheel in the interview column of the Z-Wave JS UI table), but other times they show the right product/manufacturer/FW version but never get a node name or location (and forever show “NodeInfo” and a spinning wheel in the Interview column of the Z-Wave JS UI table). Then sometimes after a couple of days the “real” node happily appears. Other times it never does…

(Why my insistence on S2 security? I’ve got a number of existing nodes that were included with S2 security and I’d like to make associations. My understanding is that it’s not possible to make associations between devices using different security levels.)

Sounds more like the interview process is taking an inordinately long time. Some complex devices have a good bit of information to exchange during the interview process, but typically I see this complete in a few seconds.

Have you ever tried to update a node’s firmware? Success or failure of that would go a long way to indicating the network health.

I don’t think S2 security has anything to do with the root problem, it’s just highlighting/aggravating the symptoms. But as I said earlier, your experience is so out of whack with mine, I’m no expert in solving your problem. That’s why I recommended the GitHub route.

Oh interesting. Now that you mention it, I have had a lot of challenges updating node firmware (despite a number of attempts). Hmmm… What does that imply (other than that there are lots of problems :confused: )?

(FWIW the devices I’m trying to add are relatively straightforward, mostly Zooz 800 series switches (Zen71, Zen30, Zen32), Aeotec Multisensor 7’s, or Swidget inserts.)

In Z-Wave JS UI, go to the node map, click on a node, then click Diagnose to perform a health check. After it’s done it will give you a bunch of information about your device communication. The most important is SNR. If you have a bunch of interference, your device will have a negative SNR.

I recommend you try this on multiple devices to get a good idea of what’s going on in your network.
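
As a rough illustration of why interference pushes SNR negative, here is a small worked example. It assumes the SNR figure is simply the received signal strength minus the background noise floor (the usual definition); the function name and the dBm values are hypothetical, not readings from any real network.

```python
def snr_margin_db(rssi_dbm, noise_floor_dbm):
    """Signal-to-noise margin in dB: received signal minus background noise."""
    return rssi_dbm - noise_floor_dbm


# Hypothetical quiet environment: signal sits well above the noise.
quiet = snr_margin_db(rssi_dbm=-70, noise_floor_dbm=-100)

# Hypothetical noisy environment: interference raises the noise floor
# above the device's signal, so the margin goes negative.
noisy = snr_margin_db(rssi_dbm=-85, noise_floor_dbm=-80)

print(f"quiet: {quiet} dB, noisy: {noisy} dB")
```

A node with a negative margin is being drowned out by the noise, which would fit with timeouts during inclusion and interviews.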
