Why is Z-wave inclusion so @#$* hard?

Good idea, but sadly not the case. (These diagnoses were done over several days and I tried relocating the controller during that time (in case there was background interference) and updated its location when I did…)

Without a spectrum analyzer this statement is wishful thinking at best and leading you down the wrong path at worst. Interference doesn’t have to originate from your house in the 900 MHz band. Your troubleshooting efforts to this point help make the case for interference or a lack of a clear channel. You could swap out every component and the result would be the same as the source of the problem is potentially external.

High latency and log errors about being unable to transmit are consistent with a congested channel. Z-wave radios can’t transmit at the same time; otherwise they produce a “collision”, which results in scrambled data. A transmission from another z-wave node, or from any transmitter on the same frequency, that happens to occur at the same time will trigger collision detection and a random back-off before the next attempt. This can happen over and over on a congested channel, resulting in a timeout of the overall stack.
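
To make the retry-and-back-off behavior concrete, here is a toy simulation in Python. It is only a sketch: the attempt limit, the back-off window, and the names `try_send` and `failure_rate` are made up for illustration and are not real Z-Wave timing values or APIs. It just shows how the share of timed-out messages grows as the channel gets busier.

```python
import random


def try_send(busy_prob, max_attempts=5, max_backoff_ms=50, rng=None):
    """Attempt one transmission on a shared channel.

    busy_prob: chance (0..1) that the channel is occupied at each attempt.
    Returns (delivered, attempts_used, total_wait_ms).
    All limits are illustrative assumptions, not real Z-Wave values.
    """
    if rng is None:
        rng = random.Random()
    waited_ms = 0.0
    for attempt in range(1, max_attempts + 1):
        if rng.random() >= busy_prob:
            return True, attempt, waited_ms  # channel was clear
        # Channel busy or frame collided: wait a random time, then retry.
        waited_ms += rng.uniform(0, max_backoff_ms)
    return False, max_attempts, waited_ms  # gave up: reported as a timeout


def failure_rate(busy_prob, trials=10_000, seed=1):
    """Fraction of messages that time out at a given channel load."""
    rng = random.Random(seed)
    failures = sum(not try_send(busy_prob, rng=rng)[0] for _ in range(trials))
    return failures / trials


if __name__ == "__main__":
    for load in (0.1, 0.5, 0.9):
        print(f"channel busy {load:.0%} of the time -> "
              f"{failure_rate(load):.1%} of messages time out")
```

With these made-up limits a lightly loaded channel almost never times out, while a channel that is busy 90% of the time drops more than half of its messages, even though every individual radio is working correctly.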

The symptoms of congestion fit your descriptions. If the problem is external radio interference and not a jabbering node, there’s a chance you will never solve it.

@mterry63 Thanks. Fair point – and I really am not an expert on RF analysis!

That said, the new location I moved the controller to is a basement surrounded on 3 sides by thick earth and underneath a concrete slab. Definitely not a Faraday cage, but if the culprit really was background interference then I’d naively expect to see a different failure pattern (e.g., problems with nodes farther out in the main house but better behavior with other nodes in the basement, and hopefully fewer “failure to transmit” errors) which doesn’t seem to be the case.

I’d be happy to get a simple 900 MHz spectrum analyzer from Amazon (and it even looks like Z-Wave JS UI shows some information on the background if you click on the controller node in network view), but I don’t really know what I’m looking for – e.g., what is “normal” vs problematic readings… Any guidance?

Thank you again for the suggestions!

I’d repeat my guidance to open an issue on Github or ask for help on discord. Buying a scalpel on Amazon won’t make you a surgeon. You need the tool, the knowledge, and the experience. It’s hard to short-circuit that 3rd requirement. :slight_smile:

Old computer guy here, with some RF experience. There’s another phenomenon in RF called “desensing” which can cause all sorts of reliability problems. Basically some kind of “transmitter”, even working at a frequency far away from the one you’re using, can “desense” a receiver that is physically close by, so that the receiver can’t hear anything at all while that transmitter is active.

RF energy falls away rapidly with distance, but often people try to be “neat” and, for example, put all of their “equipment” in the same closet or shelf, where they are very close together. A transmitter can effectively block all signals that a receiver can otherwise receive just fine when that transmitter is not transmitting.
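
To put rough numbers on “falls away rapidly with distance”, here is a small Python sketch using the standard free-space path loss formula. The 908.4 MHz value is an assumption (the approximate US Z-Wave frequency), and the function name is just for illustration. The formula is a far-field, open-air idealization, so at a few centimeters or through walls it is only a ballpark.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB between two isotropic antennas."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


ZWAVE_US_HZ = 908.4e6  # assumed: approximate US Z-Wave frequency

if __name__ == "__main__":
    for d in (0.1, 1, 10):
        loss = free_space_path_loss_db(d, ZWAVE_US_HZ)
        print(f"{d:>5} m: {loss:5.1f} dB")
```

Every tenfold increase in distance costs another 20 dB, so in this idealized model a transmitter sitting 10 cm from a receiver arrives about 40 dB (10,000 times) stronger than the same transmitter 10 m away. That is why stacking everything on one shelf can hurt.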

So many devices now use radio. In addition to the 900MHz Zwave interactions, bluetooth, wifi, “baby monitors”, cameras, appliances, etc., etc. can be occasionally transmitting on some frequency even though you’re not really using them or don’t know that they’re using radio.

Desensing isn’t limited to “radio” transmitters. Even power lines generate RF energy, depending on whether or not current is flowing. A Zwave device on a wall but with a power line hiding just behind the drywall might be affected when that power circuit is in use.

I still remember one nasty “reliability” problem I struggled with for weeks with a computer system I was building. Sometimes it worked, sometimes not. I finally noticed that whether it worked depended on whether or not the room lights were on! That was the clue that led to discovering that there was a defective fluorescent tube in the lab’s ceiling lights, sending out a strong RF signal at something like 40 kilohertz IIRC - which was enough to cause the computer problem.

In another case, I finally noticed that a problem was occurring when a plane flew overhead at a certain time of day. I never found out for sure, but I suspected some particular flight had a plane with a radar issue that caused the problem whenever it was arriving for its daily flight. Grounding everything as I should have in the first place made that problem go away.

So just FYI - a 900MHz spectrum analyzer might find the problem, but if not you might think about other RF sources. Of course the problem might be at either end of a 2-way ZWave conversation, so you have to look at the various devices along the ZWave route in addition to the controller itself. One debugging technique is to turn off things (power them down) in the area, even though they’re not “900 MHz devices”, and see if the problems disappear.

Good luck!

One more possibility I forgot to mention. USB3 is a known source of problems as well, even though it’s not “radio”. See for example: USB3.0 Radio Frequency Interference

LANs also now run at such high speeds that the wires can act as antennas and they become RF transmitters. A “1GHz Ethernet” is a 1GHz transmitter.

Your Ethernet and USB cables can be sources of RF interference, even though they don’t look like radio devices.

Good RF engineering practice motivates keeping equipment and wiring well separated and appropriately shielded and/or grounded. Good housekeeping motivates putting all those ugly boxes and wires in one place and out of sight. Often you can’t have both…

  1. If you click the question mark it explains the scoring system. Also everything is color coded so green is good and red is bad. For example:
    If you have 1 failed ping the highest score you will get is a 3.
    If you have a latency higher than 1000ms the highest score you can get is a 2.
    If you have 2 or fewer neighbors the highest score you can get is a 6.
    If your SNR is greater than or equal to 17 dBm you will get an 8 or higher. The higher your SNR is, the better your score will be.
    If your minimum power level w/o errors is -6 dBm then you will get an 8 or higher. I believe the power level goes from -10 dBm to +20 dBm? At least that is what Z-Wave JS UI will allow me to set: -10 dBm is the lowest and +20 dBm is the highest. Pretty much the more power you use, the lower your score will be.
    According to my report, normal power is equal to 0 dBm, which is the max amount of power you can use in mesh mode.
    Pretty much the takeaway is that your device is using more power to transmit and still having trouble.
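
The caps in the list above can be written down as a small Python sketch. This is only a paraphrase of the rules as described in this post, not Z-Wave JS’s actual scoring code: the function name is made up, “1 failed ping” is read as “at least one”, and the SNR and power-level rules are left out because the post only describes them loosely.

```python
def max_health_score(failed_pings, latency_ms, neighbors):
    """Upper bound on the 0-10 health rating, using only the caps listed above.

    Hypothetical helper for illustration; not part of Z-Wave JS.
    """
    cap = 10
    if failed_pings >= 1:   # any failed ping caps the score at 3
        cap = min(cap, 3)
    if latency_ms > 1000:   # latency above 1000 ms caps the score at 2
        cap = min(cap, 2)
    if neighbors <= 2:      # 2 or fewer neighbors caps the score at 6
        cap = min(cap, 6)
    return cap


if __name__ == "__main__":
    print(max_health_score(failed_pings=0, latency_ms=40, neighbors=5))
    print(max_health_score(failed_pings=1, latency_ms=1500, neighbors=1))
```

The lowest applicable cap wins, so one bad metric is enough to drag the whole rating down regardless of how good the others look.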

Mike15,

I am having identical issues. It got really bad once I got over 50 nodes on the network. My health check results are near identical to yours.

I can have two switches in the same gang box with one that does well (>7/10) on health check and then the other will always get 0/10 with the 10/10 return ping fails.

I would think that RF issues would impact all nodes in the same way.

Inclusions are damn near impossible.

I used the silicon labs PC controller software to upgrade the firmware on all the nodes except for my ZST39 stick which is still running ver. 1.1 per directions from ZOOZ support. I am hearing nightmare scenarios with the latest 1.3 release so I will not be upgrading for now.

The last comms with ZOOZ tech support included the following:

“We are putting a lot of pressure on SiliconLabs to work through any remaining issues.”


Thanks everyone. @EPStutes It’s sort of reassuring to hear I’m not the only one… While on the one hand our symptoms sure sound like background RF (failure to complete interviews for some nodes, failure of OTA updates, etc.), I agree that it’s suspicious that the behavior is wildly different for 2 devices in the same gang box (I have this too) and is unchanged when I relocate my controller to a location with a quite different RF profile. The fact that you see something similar makes me think this isn’t background RF (at least not primarily background RF).

@cornellrwilliams Thanks for the guidance on interpreting the Diagnose Node. Any thoughts, though, on what’s “normal”? (I.e., does 10/10 only happen in the same perfect conditions that give us a 1.5 mile range for Z-wave LR or is anything lower than that cause for alarm?)

@mterry63 thanks as well. I do acknowledge that I will never be a decades-experienced RF engineer with a perfect grasp of the possible exceptions and nuances. But I would appreciate any practical advice on what I could do given my limitations? I’m not trying to be a surgeon by buying a scalpel on Amazon… but I was hoping this might be more like buying a thermometer to take my temperature before calling my doctor about “flu-like” symptoms…

I will look at posting on GitHub. (I don’t have an account there and am not sure whether this is a Z-Wave JS UI issue or something more fundamental… so not exactly sure where on GitHub to post… or if this is now a Zooz or Silicon Labs issue.)

As one more troubleshooting step: is there any practical way to transfer NVM to a new controller that is not the same make/model as my current one? (I did try switching to a Zooz ZAC93 (before getting a new ZST39) but the “restore” of my NVM backup (from within Z-Wave JS UI) did not work because the devices were not the same.) I would be happy to buy and try a completely different controller (in case this is a Zooz-working-with-SiliconLabs issue, as @EPStutes’s posts suggest) but I don’t want to have to remove/readopt/reconfigure all of my devices to do so…

  1. A 0-3 is bad, 4-6 is better, and 7-10 is the best.
  2. Does your ZAC93 have FW 7.19.3 or newer?

Mike,

I just received a brand new ZST39 which shipped with the 1.3 version of the firmware

I am running it on a laptop (PC) using the latest Z-WAVE JS UI exe version 9.14.4 and have my HA (running on a NUC) linked to it using the websocket. This lets me add another instance of the Zwave JS UI to the HA for testing.

See https://youtu.be/KZc2KLQcg40?si=dO4KSoTjH5iwWsVD for instructions how to do this.

I swapped over 10 non-critical nodes around the house from the stick with the 1.1 firmware on my HA NUC to the stick with the 1.3 firmware on my laptop. To be clear, I excluded them from one network and then included them in the other, building the new network from scratch.

So far, since the morning of 6/28, the 1.3 controller has locked up only once, but that may have been because my laptop went to sleep. Also, any node that makes more than 2 hops still returns really poor health checks, just like before, with the 0/10 return pings. Moving around with the laptop definitely helps improve things by reducing the number of hops.

It’s really disappointing that Z-wave cannot cover a house with 120 nodes all within 70 ft of the hub and no more than 15 ft between nodes. I hope this is still caused by bugs in the controller firmware that will eventually be resolved.

For now, my plan is to split the nodes by location using a second NUC HA hub and link the two hubs in the same way I am doing with the laptop. I am also trying out USB to cat5 adapters to allow me to locate the USB stick in a central location in my attic, away from my NUC.

Amazon.com: Monoprice USB Extender over CAT5E or CAT6 Connection up to 150ft : Electronics

I have not decided yet if I will keep the second NUC as a windows machine or change it over to the HAOS.

Well, I tried updating my controller firmware to v1.3 and of course that bricked my ZST39. I guess I shouldn’t be surprised. :frowning:

I downloaded the gbl file from Zooz and ran OTW update from Z-Wave JS UI. I quickly got a message that the update failed because retry limit had been reached and now my controller is dead with the notice “Driver failed to recover from bootloader. Please flash a new firmware to continue”. I tried rebooting and tried to flash again but that failed because the controller is “not ready”. I’m hesitant to do a factory reset of the controller because that warns that all devices will be un-paired (I would hope I could restore NVM… but (a) I don’t know if an NVM restore will recover from that and (b) I don’t know if it will be possible to restore a 1.2 NVM backup to an updated-to-1.3 device). Ugh. I do not recommend z-wave to anyone. I wish I could go back and have chosen wifi-based devices instead, or maybe zigbee, and not put all this time and money into Z-wave… :frowning: This is all super frustrating! I wish my z-wave experience were more like some of the positive experiences some others apparently seem to be having!

I guess my next step is to try to restore my NVM backup onto a spare ZST39. And then maybe, by getting and using the Simplicity Studio PC Controller, I can successfully re-flash this ZST39 (to use as a backup the next time I need one…)

How frustrating…

I have the same sorts of issues with Zwave inclusion, esp wrt locks. My Yale locks’ Zwave modules won’t seem to pair unless the controller is nearby, which is very frustrating. I have a 700 stick too, connected by a 3 ft USB extension cable. And the network routing is weird (like right now I’m routing to a Fibaro flood sensor through a node that is reportedly dead). And I have no fewer than 3 range extenders and still can’t get reliable connections.

To be fair, it may not have to be like this. I came from Homeseer where the Zwave inclusion and network routing were quite reliable, though that was back in the 500 stick days and older Zwave devices.

This was such a problem for me that I eventually gave up and moved almost all my devices to Zigbee, and at least with zigbee2mqtt, inclusion and network reachability are both easy and stable. It just wasn’t worth the grief, and running HA was more important to me than sticking with Homeseer and a reliable network, hence the move to zigbee.

I have a couple of the locks running with zigbee sticks, which have less support in HA, but work reliably. The modules are expensive so I look for used ones on ebay and snatch them up when they’re at a reasonable price. Everything else is cheaper with zigbee and works very reliably, and usually with much faster response. You can also get sticks like the SMLIGHT SLZB-06M (a Zigbee Ethernet/PoE/USB/WiFi adapter) that can be POE powered and put in the best spot topologically, and connect to zigbee2mqtt (or ZHA if you really want to go that route) remotely over the network. No long USB extension cable needed. Nothing like that exists for Zwave.

I would love to see Zwave be reliable in HA, but at least for me, 2 years after going to HA from Homeseer, it’s still painful for the devices I have left.

I’d recommend you seek support through Zooz. I understand they are quite responsive and are a family run business that wants their products to work for their customers. You will likely get better support than this forum since they sold the product.

I would hope you would say something like “My experience wasn’t great, but perhaps you’ll have better luck.” Keep in mind there is a world of satisfied Z-wave customers out there. Make your statistics teacher proud. :slight_smile:

@mterry63 Thanks. I will reach out to Zooz for advice.

And fair point about your comment on tone. Edited accordingly and thanks for the reminder. (Stupid statistical rigor getting in the way of my grumbling :slight_smile: )


Ok. I spoke with Zooz support. They were very professional but said that there is a known issue with Silicon Labs SDK 7.19 that affects a portion of setups and exhibits symptoms like what I describe. They claimed that the issue is independent of controller brand (and would affect Aeotec or Homeseer controllers the same way)… and that the main path to resolution is… hoping that eventually Silicon Labs fixes it.

They recommended not upgrading to controller firmware 1.3, as it causes other issues (and said that they may be removing it from their website)… they said I may get some benefit from downgrading to firmware 1.1. They suggested I not do this through home assistant, but instead try to get PC Controller from Silicon Labs. (So far I have been unable to do this.)

So it seems that the bottom line is: there is a known issue in the current version of Z-wave SDK that affects some users’ installations (unclear which) and causes roughly the problems I have described… and we do not currently know when Silicon Labs will issue a fix (no timeline/ETA/bug #, etc. was available).

FYI @EPStutes @fresnoboy , since it sounds like you may be in the same boat


Ugh… This problem has been going on for quite a while - why haven’t they gotten a fix out yet?

This is one of the problems that comes from being reliant on a single chip provider for the whole ecosystem. We can’t swap out the controller with a chip made by someone else. There are many reasons why monopolies are to be avoided.

I’m glad I moved to Zigbee where we have lots of choices.

Also, just want to call out that:

  • Updating firmware OTW on the Zooz stick apparently requires using Silicon Labs’ PC Controller. (Updating via Home Assistant risks bricking your controller.)
  • To get PC Controller you need to go through a pretty significant and non-straightforward registration/installation process for Simplicity Studio. The legal terms you must agree to for this are extensive and they require permission to “monitor” your use of the program.
  • At the end of all of this, you will be told that the controller is only available for Windows computers. (I currently don’t have one – at least not one that isn’t corporate-managed and forbids installing things like this.)

So even if Silicon Labs were to issue a fix, it’s not even clear how I would go about applying that fix. There are a TON of hidden catches here that I wish had been clearly called out alongside the many positive claims about Z-Wave’s capabilities…


I am having the same issues with inclusion using a ZAC93 800 series Long Range controller and 2x Kwikset Home Connect 620 locks. The issues were present on controller firmware versions 1.00, 1.10 and 1.20. No idea what the locks’ firmware version is, as I can’t get them to connect to the controller, ever. So the Zooz excuse of it being SDK 7.19.x’s fault is incorrect, as it happens in 7.18 and lower.

Trying to look up detailed device specs on the zwave alliance products page doesn’t work. When I search kwikset it returns nothing (and takes a good 2 mins to load the page or execute a search). Trying to look up literature on how s2 works turned up nothing official, just troubleshooting pages from alarm companies.

This is my first attempt at anything z wave and I have to say, I am completely unimpressed. Controller firmware issues are not something I would expect to see in a controlled system like zwave. As an example, I have never had many issues with wifi/eth, zigbee and Bluetooth IoT devices. I am on day 5 of trying to add 1 lock to the zwave network without success.

I have excluded and factory reset the devices. Factory reset the controller and nuked and reinstalled zwavejs UI container, no bueno.

I can’t really debug the logs because I have no reference to what a proper s2 inclusion looks like and I can’t seem to find any literature on how S2 works. S0 also does not work. Without s0 or s2 encryption, the locks will not expose any lock/unlock entities to HASS.

So, I now have almost $400 worth of z wave locks that I can’t use and are doing nothing but frustrating me.

I understand that maybe my situation isn’t typical, but this is ridiculous. I left smart start running overnight and woke up to 20+ dead nodes with all of them having the same issue (no encryption, unknown product/manufacturer).

All my debug logs error out at the same point in the s2 negotiation, regardless of if it’s smart start or manual inclusion. “S2 security bootstrapping failed due to an elapsed timer”.

I really hope my zwave experience gets better because, as of this moment, I don’t want to work with zwave, as it has been incredibly unreliable in my experience thus far.

@baudneo I feel your pain :frowning:

My situation has generally stabilized with the following characteristics:

  1. Several Z-wave devices (mostly light switches, plus a few temperature/humidity/occupancy sensors) have simply never successfully included (despite being installed an inch or two from another device that did include…) and are basically paperweights at this point.
  2. Z-wave command latency is high and messages sometimes never arrive at all. If I click to turn on a light in a dashboard, it can take 30-60 sec to turn on… and sometimes it doesn’t turn on at all
  3. OTA firmware updates do not work (at least for the devices on my network that have available firmware updates – mostly Aeotec multi-sensors).
  4. When I open Z-wave JS UI, it typically lists “no nodes found” for a couple of minutes before the nodes appear. I frequently see a red warning at the top that “controller is unable to transmit” (though it comes and goes), and the top-right “connected” icon often blinks between red and green …

My only hope at this point is that Zooz issues an update that fixes these bugs… which, in turn apparently depends on Silicon Labs issuing an update… which has no ETA or even any indication that it is being worked on/prioritized. And, even if a fix does come, I’m a little scared about how to update the dongle, given that the last attempt to do so via Z-Wave JS UI bricked the (previous) dongle (Zooz afterwards told me that I should instead use Silicon Labs’ software as Z-Wave JS UI’s is not reliable)… but Silicon Labs’ software apparently only runs on windows machines and I don’t have one in the house…

Definitely not feeling the warm fuzzies with Z-wave…