Why is Z-wave inclusion so @#$* hard?

The last few devices I’ve tried to add have been a similar nightmare to OP. Constant retries of inclusion/exclusion. I’ll often get nodes that will finally include, but then take a few days for all of the device entities to show up.

@Onkage I’m (apologetically) glad to see that I’m not alone in this issue… This wasn’t how things started out but as I added more nodes things got really bad.

Here’s an update:

First, @cornellrwilliams I’ve run the “diagnose” command a number of times and have no idea how to interpret what I’m seeing:

Here’s a node 12’ away from my controller (unobstructed line-of-sight):

Here’s one 15’ away through a (wood stud + drywall) wall:

Here are 2 more:


The SNRs are all positive. Many of these nodes are operational, but the score out of 10 does not inspire confidence… Is this good or bad? What are the takeaways?

More broadly, I’m continuing to see “unable to transmit” messages in my logs. And now it’s taking a long time (several minutes) after opening Z-wave JS UI to see any nodes listed (even my controller)… during which time the icon third-from-right at the top of Z-wave JS UI flashes back and forth between red and green (“disconnected” and “connected”).

So I took drastic steps and tried replacing my controller… and even my Home Assistant box while I was at it. So I’m using a brand new dongle (restored from backup), and a brand new Home Assistant host (HA Yellow rather than Green) based on an entirely different SoC (RPi), with different hardware… and I went ahead and moved the controller to a completely different part of the house – one with VERY little possibility of RF interference…
… and I still have the same deeply unreliable z-wave behavior: new nodes do not include (timing out, especially on S2 authentication – across at least 3 different brands of devices), and OTA updates fail as well. And every time I change the logging level, all nodes disconnect and slowly reconnect over the course of a day or two, and about 10-15% never reconnect, even after days. (Note: this is highly correlated with device type – Aeotec multisensor 7s are the worst, Zooz ZAC38s are 2nd worst, Zooz Zen32s seem more ok, Zooz Zen71s are mixed.)

This is maddening and definitely isn’t giving me the warm fuzzies about Z-wave or any companies/people who promote it :frowning: . Even if I have some bizarrely weird situation going on (and I can’t imagine what that might be), the lack of diagnosability inherent in this technology is pretty unforgivable. (E.g., is my network saturated and if so, by which nodes? Is it a throughput issue? Is it all a controller issue? Is there background interference? etc.?)

I would LOVE to be proven totally wrong and have someone point out that like an idiot I, e.g., forgot to enable the “make z-wave network reliable” option in the 4th settings submenu but so far that hasn’t happened…

Bit of a stretch here… but do you have more than one controller stick plugged in and active at the same time? Pictures 1 and 4 above show “USB Dongle (General)”, while 2 and 3 show “USB Dongle (4 - Mike’s Office)”.

Good idea, but sadly not the case. (These diagnoses were done over several days and I tried relocating the controller during that time (in case there was background interference) and updated its location when I did…)

Without a spectrum analyzer this statement is wishful thinking at best and leading you down the wrong path at worst. Interference doesn’t have to originate from your house in the 900 MHz band. Your troubleshooting efforts to this point help make the case for interference or a lack of a clear channel. You could swap out every component and the result would be the same as the source of the problem is potentially external.

High latency and log errors about being unable to transmit align with a congested channel. Z-wave radios can’t transmit at the same time; otherwise they produce a “collision”, which results in scrambled data. A transmission from another z-wave node, or from any transmitter on the same frequency, that happens to occur at the same time will trigger collision detection and a random back-off wait before trying again. This can occur over and over on a congested channel, resulting in a timeout of the overall stack.
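
To make the back-off idea concrete, here is a toy simulation. It is only a sketch of "listen, back off a random amount, retry, eventually give up", not the actual Z-Wave MAC algorithm. The function names, the retry limit, and the slot counts are all made up for illustration.

```python
import random


def attempt_transmit(channel_busy_prob, max_retries=3, max_backoff_slots=8, rng=None):
    """Toy listen-before-talk model. Returns (success, slots_waited).

    channel_busy_prob: chance the channel is occupied each time we try to send.
    max_retries / max_backoff_slots: illustrative assumptions, NOT Z-Wave spec values.
    """
    if rng is None:
        rng = random.Random()
    slots_waited = 0
    for _ in range(max_retries + 1):
        if rng.random() >= channel_busy_prob:
            return True, slots_waited  # channel was clear, frame goes out
        # Channel busy: wait a random number of slots before trying again.
        slots_waited += rng.randint(1, max_backoff_slots)
    return False, slots_waited  # gave up: the "unable to transmit" case


def failure_rate(channel_busy_prob, trials=10_000, seed=0):
    """Fraction of simulated transmissions that run out of retries."""
    rng = random.Random(seed)
    failures = sum(
        not attempt_transmit(channel_busy_prob, rng=rng)[0] for _ in range(trials)
    )
    return failures / trials


if __name__ == "__main__":
    for busy in (0.1, 0.5, 0.8):
        print(f"channel busy {busy:.0%}: {failure_rate(busy):.2%} of sends time out")
```

With these made-up numbers a send fails only when all four attempts find the channel busy, so the failure probability is the busy probability to the fourth power: negligible at 10% busy, but roughly 41% at 80% busy. That steep curve is why a congested channel feels like everything falling over at once.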

The symptoms of congestion fit your descriptions. If the problem is external radio interference and not a jabbering node, there’s a chance you will never solve it.

@mterry63 Thanks. Fair point – and I really am not an expert on RF analysis!

That said, the new location I moved the controller to is a basement surrounded on 3 sides by thick earth and underneath a concrete slab. Definitely not a Faraday cage, but if the culprit really was background interference then I’d naively expect to see a different failure pattern (e.g., problems with nodes farther out in the main house but better behavior with other nodes in the basement, and hopefully fewer “failure to transmit” errors), which doesn’t seem to be the case.

I’d be happy to get a simple 900 MHz spectrum analyzer from Amazon (and it even looks like Z-Wave JS UI shows some information on the background if you click on the controller node in network view), but I don’t really know what I’m looking for – e.g., what is “normal” vs problematic readings… Any guidance?

Thank you again for the suggestions!

I’d repeat my guidance to open an issue on Github or ask for help on discord. Buying a scalpel on Amazon won’t make you a surgeon. You need the tool, the knowledge, and the experience. It’s hard to short-circuit that 3rd requirement. :slight_smile:

Old computer guy here, with some RF experience. There’s another phenomenon in RF called “desensing” which can cause all sorts of reliability problems. Basically some kind of “transmitter”, even working at a frequency far away from the one you’re using, can “desense” a receiver that is physically close by, so that the receiver can’t hear anything at all while that transmitter is active.

RF Energy falls away rapidly with distance, but often people try to be “neat” and, for example, put all of their “equipment” in the same closet or shelf, where they are very close together. A transmitter can effectively block all signals that a receiver can otherwise receive just fine when that transmitter is not transmitting.
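
For a rough feel of how quickly RF energy falls away, here is a small sketch using the standard free-space path loss formula. It is an idealized model (open space, no walls or reflections, and it gets rough when you are very close to the antenna), and the 908.42 MHz figure is the US Z-Wave frequency, which I am assuming here. The function name is just for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
ZWAVE_US_HZ = 908.42e6  # assumption: US Z-Wave frequency; other regions differ


def free_space_path_loss_db(distance_m, freq_hz=ZWAVE_US_HZ):
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


if __name__ == "__main__":
    for d in (0.1, 1.0, 3.0, 10.0):
        print(f"{d:5.1f} m: {free_space_path_loss_db(d):5.1f} dB loss")
    # How much stronger a source looks at 10 cm than at 3 m:
    gain = free_space_path_loss_db(3.0) - free_space_path_loss_db(0.1)
    print(f"10 cm vs 3 m: about {gain:.1f} dB stronger")
```

Every doubling of distance costs about 6 dB, so under this model a noisy gadget sitting 10 cm from the stick arrives roughly 30 dB (about 900 times the power) stronger than the same gadget 3 m away. That is the arithmetic behind not stacking everything on one shelf.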

So many devices now use radio. In addition to the 900MHz Zwave interactions, bluetooth, wifi, “baby monitors”, cameras, appliances, etc., etc. can be occasionally transmitting on some frequency even though you’re not really using them or don’t know that they’re using radio.

Desensing isn’t limited to “radio” transmitters. Even power lines generate RF energy, depending on whether or not current is flowing. A Zwave device on a wall but with a power line hiding just behind the drywall might be affected when that power circuit is in use.

I still remember one nasty “reliability” problem I struggled with for weeks with a computer system I was building. Sometimes it worked, sometimes not. I only got a clue when I finally noticed that it worked, or not, depending on whether or not the room lights were on! That led to discovering that there was a defective fluorescent tube in the lab’s ceiling lights, sending out a strong RF signal at something like 40 kilohertz IIRC - which was enough to cause the computer problem.

In another case, I finally noticed that a problem was occurring when a plane flew overhead at a certain time of day. I never found out for sure, but I suspected some particular flight had a plane with a radar issue that caused the problem whenever it was arriving for its daily flight. Grounding everything as I should have in the first place made that problem go away.

So just FYI - a 900MHz spectrum analyzer might find the problem, but if not you might think about other RF sources. Of course the problem might be at either end of a 2-way ZWave conversation, so you have to look at the various devices along the ZWave route in addition to the controller itself. One debugging technique is to turn off things (power them down) in the area, even though they’re not “900 MHz devices”, and see if the problems disappear.

Good luck!

One more possibility I forgot to mention. USB3 is a known source of problems as well, even though it’s not “radio”. See for example: USB3.0 Radio Frequency Interference

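LANs also now run at such high speeds that the wires can act as antennas and they become RF transmitters. “Gigabit Ethernet” refers to 1 Gbit/s of data rather than a 1 GHz carrier, but its fast-switching signals still make the cable a potential broadband RF emitter.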

Your Ethernet and USB cables can be sources of RF interference, even though they don’t look like radio devices.

Good RF engineering practice motivates keeping equipment and wiring well separated and appropriately shielded and/or grounded. Good housekeeping motivates putting all those ugly boxes and wires in one place and out of sight. Often you can’t have both…

  1. If you click the question mark it explains the scoring system. Also everything is color coded so green is good and red is bad. For example:
    If you have 1 failed ping the highest score you will get is a 3.
    If you have a latency higher than 1000ms the highest score you can get is a 2.
    If you have 2 or fewer neighbors the highest score you can get is a 6.
    If your SNR is greater than or equal to 17dBm you will get an 8 or higher. The higher your SNR is the better your score will be.
    If your minimum power level w/o errors is -6dBm then you will get an 8 or higher. I believe the power level goes from -10dBm to +20dBm? At least that is what Z-Wave JS UI will allow me to set: -10dBm is the lowest and +20dBm is the highest. Pretty much the more power you use the lower your score will be.
    According to my report, normal power is equal to 0dBm, which is the max amount of power you can use in mesh mode.
    Pretty much the takeaway is that your device is using more power to transmit and still having trouble.
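
If it helps to see the caps above as logic, here is a minimal sketch. It encodes only the three "highest score you can get" rules quoted in this post and is not the actual Z-Wave JS scoring code; the function name and parameters are hypothetical. The help popup behind the question mark is the authoritative description.

```python
def max_health_score(failed_pings, latency_ms, neighbors):
    """Upper bound on the 0-10 rating, using only the caps quoted in the post.

    Hypothetical helper for illustration; NOT how Z-Wave JS computes the score.
    """
    cap = 10
    if failed_pings >= 1:
        cap = min(cap, 3)  # 1 failed ping: highest score is a 3
    if latency_ms > 1000:
        cap = min(cap, 2)  # latency above 1000 ms: highest score is a 2
    if neighbors <= 2:
        cap = min(cap, 6)  # 2 or fewer neighbors: highest score is a 6
    return cap


if __name__ == "__main__":
    print(max_health_score(failed_pings=0, latency_ms=40, neighbors=5))
    print(max_health_score(failed_pings=1, latency_ms=40, neighbors=5))
    print(max_health_score(failed_pings=0, latency_ms=1500, neighbors=1))
```

The point of writing it this way is that the lowest cap wins: a single failed ping or a slow round trip drags the score down no matter how good the SNR looks.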

Mike15,

I am having identical issues. It got really bad once I got over 50 nodes on the network. My health check results are near identical to yours.

I can have two switches in the same gang box where one does well (>7/10) on the health check and the other always gets 0/10, with 10/10 return pings failing.

I would think that RF issues would impact all nodes in the same way.

Inclusions are damn near impossible.

I used the Silicon Labs PC Controller software to upgrade the firmware on all the nodes except for my ZST39 stick, which is still running ver. 1.1 per directions from ZOOZ support. I am hearing nightmare scenarios with the latest 1.3 release so I will not be upgrading for now.

The last comms with ZOOZ tech support included the following:

“We are putting a lot of pressure on SiliconLabs to work through any remaining issues.”

Thanks everyone. @EPStutes It’s sort of reassuring to hear I’m not the only one… While on the one hand our symptoms sure sound like background RF interference (failure to complete interviews for some nodes, failure of OTA updates, etc.), I agree that it’s suspicious that the behavior is wildly different for 2 devices in the same gang box (I have this too) and is unchanged when I relocate my controller to a location with a quite different RF profile. The fact that you see something similar makes me think this isn’t background RF (at least not primarily background RF).

@cornellrwilliams Thanks for the guidance on interpreting the Diagnose Node. Any thoughts, though, on what’s “normal”? (I.e., does 10/10 only happen in the same perfect conditions that give us a 1.5 mile range for Z-wave LR or is anything lower than that cause for alarm?)

@mterry63 thanks as well. I do acknowledge that I will never be a decades-experienced RF engineer with a perfect grasp of the possible exceptions and nuances. But I would appreciate any practical advice on what I could do given my limitations? I’m not trying to be a surgeon by buying a scalpel on Amazon… but I was hoping this might be more like buying a thermometer to take my temperature before calling my doctor about “flu-like” symptoms…

I will look at posting on GitHub. (I don’t have an account there and am not sure whether this is a Z-Wave JS UI issue or something more fundamental… so not exactly sure where on GitHub to post… or if this is now a Zooz or Silicon Labs issue.)

As one more troubleshooting step: is there any practical way to transfer NVM to a new controller that is not the same make/model as my current one? (I did try switching to a Zooz ZAC93 (before getting a new ZST39) but the “restore” of my NVM backup (from within Z-Wave JS UI) did not work because the devices were not the same.) I would be happy to buy and try a completely different controller (in case this is a Zooz-working-with-SiliconLabs issue, as @EPStutes’ post suggests) but I don’t want to have to remove/readopt/reconfigure all of my devices to do so…

  1. A 0-3 is bad, 4-6 is better, and 7-10 is the best.
  2. Does your ZAC93 have FW 7.19.3 or newer?

Mike,

I just received a brand new ZST39 which shipped with the 1.3 version of the firmware

I am running it on a laptop (PC) using the latest Z-WAVE JS UI exe version 9.14.4 and have my HA (running on a NUC) linked to it using the websocket. This lets me add another instance of the Zwave JS UI to the HA for testing.

See https://youtu.be/KZc2KLQcg40?si=dO4KSoTjH5iwWsVD for instructions how to do this.

I swapped over 10 non-critical nodes around the house from the stick with the 1.1 firmware on my HA NUC to the stick with the 1.3 firmware on my laptop. To be clear, I excluded them from one network and then included them in the other, building the new network from scratch.

So far, since the morning of 6/28, the 1.3 controller has locked up only once, but that may have been because my laptop went to sleep. Also, any node that makes more than 2 hops still returns real poor health checks just like before, with the 0/10 return pings. Moving around with the laptop definitely helps improve things by reducing the number of hops.

It’s real disappointing that Z-wave cannot cover a house with 120 nodes all within 70 ft of the hub and no more than 15 ft between nodes. I hope this is still caused by bugs in the controller firmware that will eventually be resolved.

For now, my plan is to split the nodes by location using a second NUC HA hub and link the two hubs in the same way I am doing with the laptop. I am also trying out USB to cat5 adapters to allow me to locate the USB stick in a central location in my attic, away from my NUC.

Amazon.com: Monoprice USB Extender over CAT5E or CAT6 Connection up to 150ft : Electronics

I have not decided yet if I will keep the second NUC as a windows machine or change it over to the HAOS.

Well, I tried updating my controller firmware to v1.3 and of course that bricked my ZST39. I guess I shouldn’t be surprised. :frowning:

I downloaded the gbl file from Zooz and ran OTW update from Z-Wave JS UI. I quickly got a message that the update failed because retry limit had been reached and now my controller is dead with the notice “Driver failed to recover from bootloader. Please flash a new firmware to continue”. I tried rebooting and tried to flash again but that failed because the controller is “not ready”. I’m hesitant to do a factory reset of the controller because that warns that all devices will be un-paired (I would hope I could restore NVM… but (a) I don’t know if an NVM restore will recover from that and (b) I don’t know if it will be possible to restore a 1.2 NVM backup to an updated-to-1.3 device). Ugh. I do not recommend z-wave to anyone. I wish I could go back and have chosen wifi-based devices instead, or maybe zigbee, and not put all this time and money into Z-wave… :frowning: This is all super frustrating! I wish my z-wave experience were more like some of the positive experiences some others apparently seem to be having!

I guess my next step is to try to restore my NVM backup onto a spare ZST39. And then maybe, by getting and using the Simplicity Studio PC Controller, I can successfully re-flash this ZST39 (to use as a backup the next time I need one…)

How frustrating…

I have the same sorts of issues with Zwave inclusion, esp wrt locks. My Yale locks’ Zwave modules won’t seem to pair unless the controller is nearby, which is very frustrating. I have a 700 stick too, connected by a 3 ft USB extension cable. And the network routing is weird (like right now I’m routing to a Fibaro flood sensor through a node that is reportedly dead). And I have no less than 3 range extenders and still can’t get reliable connections.

To be fair, it may not have to be like this. I came from Homeseer where the Zwave inclusion and network routing were quite reliable, though that was back in the 500 stick days and older Zwave devices.

This was such a problem for me that I eventually gave up and moved almost all my devices to Zigbee, and at least with zigbee2mqtt, inclusion and network reachability are both easy and stable. It just wasn’t worth the grief, and running HA was more important to me than sticking with Homeseer and a reliable network, hence the move to zigbee.

I have a couple of the locks running with zigbee sticks, which have less support in HA, but work reliably. The modules are expensive so I look for used ones on ebay and snatch them when at a reasonable price. Everything else is cheaper with zigbee and works very reliably, and usually with much faster response. You can also get SLZB-06M (SMLIGHT Zigbee Ethernet PoE LAN USB WiFi Adapter, EFR32) type sticks that can be POE powered and put in the best spot topologically, and connect to zigbee2mqtt (or ZHA if you really want to go that route) remotely over the network. No long USB extension cable needed. Nothing like that exists for Zwave.

I would love to see Zwave be reliable in HA, but at least for me, 2 years after going to HA from Homeseer, it’s still painful for the devices I have left.

I’d recommend you seek support through Zooz. I understand they are quite responsive and are a family run business that wants their products to work for their customers. You will likely get better support than this forum since they sold the product.

I would hope you would say something like “My experience wasn’t great, but perhaps you’ll have better luck.” Keep in mind there is a world of satisfied Z-wave customers out there. Make your statistics teacher proud. :slight_smile:

@mterry63 Thanks. I will reach out to Zooz for advice.

And fair point about your comment on tone. Edited accordingly and thanks for the reminder. (Stupid statistical rigor getting in the way of my grumbling :slight_smile: )

Ok. I spoke with Zooz support. They were very professional but said that there is a known issue with Silicon Labs SDK 7.19 that affects a portion of setups and exhibits symptoms like what I describe. They claimed that the issue is independent of controller brand (and would affect Aeotec or Homeseer controllers the same way)… and that the main path to resolution is… hoping that eventually Silicon Labs fixes it.

They recommended not upgrading to controller firmware 1.3, as it causes other issues (and said that they may be removing it from their website)… they said I may get some benefit from downgrading to firmware 1.1. They suggested I not do this through home assistant, but instead try to get PC Controller from Silicon Labs. (So far I have been unable to do this.)

So it seems that the bottom line is: there is a known issue in the current version of Z-wave SDK that affects some users’ installations (unclear which) and causes roughly the problems I have described… and we do not currently know when Silicon Labs will issue a fix (no timeline/ETA/bug #, etc. was available).

FYI @EPStutes @fresnoboy , since it sounds like you may be in the same boat

Ugh… This problem has been going on for quite awhile - why haven’t they gotten a fix out yet?

This is one of the problems that comes from being reliant on a single chip provider for the whole ecosystem. We can’t swap out the controller with a chip made by someone else. There are many reasons why monopolies are to be avoided.

I’m glad I moved to Zigbee where we have lots of choices.