Things get more “interesting” when you start trying to do more than just one command (a single command by itself is instantaneous).
Just trying to issue an on and then an off 1 sec later doesn’t work - it seems like seconds need to pass before you can issue an off after issuing an on. There is probably some handshake in the protocol here that I’m not respecting, and it has to time out waiting for something before accepting a new command initiation. (I’m guessing here - I’m not exactly sure what’s going on yet.)
Trying to send two “turn light on” commands to different nodes in quick succession also doesn’t seem to work. The first node responds, the second one does not. (Again, this is probably because it’s waiting for more follow-up from the first command before accepting new command initiation. But I need to look more carefully at the protocol.)
I will dig a little deeper on the protocol and see if I can get to the bottom of this.
Ok, I got the basic protocol message sequence sorted. After every switching message you need to receive an ack, then a response, and then send an ack for that response. Once you do that, the controller is ready to go for the next command without delay. Of course, it’s not quite that simple, as there can be other unsolicited messages coming in, or the command can fail to be acknowledged for reasons that are unclear to me so far. But if these complications don’t arise, that’s a sufficient sequence.
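To make that concrete, here is a minimal sketch of the handshake, driving the stick directly over the serial port with pyserial. The frame layout (SOF, length, type, function id, XOR checksum) and the constant values are the ones the open-zwave sources use, not anything from an official spec, so treat the details as my best current understanding rather than gospel:

import serial

SOF, ACK = 0x01, 0x06

def build_frame(data):
    # data = [frame type, function id, params...]
    length = len(data) + 1               # length byte counts data plus checksum
    body = bytes([length]) + bytes(data)
    csum = 0xFF
    for b in body:                       # XOR over the length and data bytes
        csum ^= b
    return bytes([SOF]) + body + bytes([csum])

def send_command(port, data):
    port.write(build_frame(data))

    # 1. the controller ACKs the frame we just sent
    assert port.read(1) == bytes([ACK])

    # 2. the controller sends a response frame: SOF, length, then that many bytes
    assert port.read(1) == bytes([SOF])
    length = port.read(1)[0]
    response = port.read(length)

    # 3. we ACK the response; the controller is now ready for the next command
    port.write(bytes([ACK]))
    return response

# Example: ZW_SendData (0x13) to node 3, SWITCH_BINARY (0x25) SET 0xFF,
# tx options 0x05, callback id 0x01 - the same values open-zwave would use.
with serial.Serial('/dev/ttyACM0', 115200, timeout=2) as port:
    send_command(port, [0x00, 0x13, 0x03, 0x03, 0x25, 0x01, 0xFF, 0x05, 0x01])

(In practice the controller also sends a delivery callback frame a little later, plus other unsolicited traffic, which is where the complications mentioned above come in.)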
So I tried turning on 2 lights in sequence using this protocol sequence. It is fast. You can just barely perceive the order they turn on. But here’s the catch… it is lightning fast like this only when the command sequence goes off without a hitch. When it doesn’t (which is pretty common), I’m not sure what the optimal handling is, or how fast it would be. I see code in openzwave which says things like “wait around for 500 ms to see if we get what we’re expecting before giving up”. I suspect it is this kind of condition which causes the slow zwave performance we typically see.
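The failure handling I see in openzwave basically amounts to reading with a deadline and retransmitting. The ACK/NAK/CAN byte values and the 500 ms figure below come from those sources, but the loop itself is just my sketch of the idea, not what openzwave literally does:

import time

ACK, NAK, CAN = 0x06, 0x15, 0x18

def wait_for_ack(port, deadline_ms=500):
    # Read until we see an ACK; treat CAN or NAK as "retransmit the last
    # frame", and give up once the deadline passes.
    deadline = time.monotonic() + deadline_ms / 1000.0
    while time.monotonic() < deadline:
        byte = port.read(1)
        if not byte:
            continue                 # nothing yet, keep waiting
        if byte[0] == ACK:
            return True
        if byte[0] in (NAK, CAN):
            return False             # caller should resend the frame
    return False                     # timed out; caller resends or gives up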
I’m not giving up yet… I think there’s more to be learned here about how to structure robust communication that will be fast even in the presence of “errors”.
When you first start up OpenZwave/HA, there is a 5 to 10 minute period while it queries the capabilities of your devices, so messages to turn devices on and off will be delayed during that time.
I would say, tail the OpenZwave log while you are switching things on and off and see if messages are being sent (actually going out and turning things on and off) or queued (waiting to be sent). Check your node statistics for averageRequestRTT and averageResponseRTT to see how long your nodes take to respond. Also check sentCnt and sentFailed on problem nodes to see if individual devices have an issue. You could have a device that is out of range that is causing problems.
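If you have python-openzwave on the machine, you can also dump those counters from a short script; something like this (my sketch - it assumes the stick is on /dev/ttyACM0 and that the node objects expose the OpenZwave statistics dict through their stats property):

import time
from openzwave.option import ZWaveOption
from openzwave.network import ZWaveNetwork

options = ZWaveOption('/dev/ttyACM0', user_path='.', cmd_line='')
options.set_console_output(False)
options.lock()

network = ZWaveNetwork(options, autostart=True)
while network.state < network.STATE_AWAKED:   # wait for the network to come up
    time.sleep(1)

for node_id, node in network.nodes.items():
    stats = node.stats
    print(node_id, node.product_name,
          'sent:', stats.get('sentCnt'), 'failed:', stats.get('sentFailed'),
          'reqRTT:', stats.get('averageRequestRTT'),
          'respRTT:', stats.get('averageResponseRTT'))

network.stop()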
Someone mentioned that adding secure nodes will slow down OpenZwave. You can try re-adding devices that don’t need to be secure as regular (non-secure) nodes to see if it helps.
Also, I don’t really recommend this, but I have a script that restarts Zwave every day at a specific time and things seem snappier for me afterwards - it might not help in your case, though.
- alias: Restart Zwave Every Day at 2pm
  trigger:
    platform: time
    at: "14:00:00"
  action:
    - service: zwave.stop_network
    - delay: '00:01:00'
    - service: zwave.start_network
Again, these are not solutions necessarily, but things to try.
I followed up a little more with this today. I wrote a python script which is a little smarter about handling incoming messages, and sequencing things with the controller. My script turns on two lights one after the other as quickly as possible while respecting the handshake sequence, and it is damn fast. Close to simultaneous. It is also rock solid, handling other incoming unsolicited messages without any issues. I ran it for an hour or so and it never “got lost” with the serial message streams. No delays at all.
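The core of the script is just a read loop that ACKs every frame the controller sends and matches it against whatever the command in flight is waiting for; everything else gets treated as unsolicited. Stripped way down, it looks roughly like this (same caveat as before: the constants come from the open-zwave sources, and this is a sketch of the idea, not the actual script):

SOF, ACK = 0x01, 0x06

def read_frame(port):
    # Return the data bytes of the next frame (minus the checksum), or None.
    first = port.read(1)
    if not first or first[0] != SOF:
        return None
    header = port.read(1)
    if not header:
        return None
    data = port.read(header[0])      # the length byte counts data + checksum
    port.write(bytes([ACK]))         # always ACK what the controller sends
    return data[:-1]

def wait_for(port, expected_func_id, tries=20):
    # Keep reading until the frame the current command is waiting for shows
    # up; anything else is unsolicited and just gets logged here.
    for _ in range(tries):
        data = read_frame(port)
        if data is None:
            continue
        if len(data) > 1 and data[1] == expected_func_id:   # data[0] is the frame type
            return data
        print("unsolicited:", data.hex())
    return None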
So I’m definitely thinking there are some software issues here limiting the responsiveness of zwave in HA. I’m not entirely sure if the issues are more in HA or more in openzwave, but I suspect the latter.
(Part of the problem here is that while some of the Zwave docs have been opened, the API for communicating with a z-wave stick has not, making it a reverse engineering chore to deal with. Sigma Designs seems to have no intentions of opening up this piece of the standard.)
I’m not sure what the next step is here. Trying to profile all of this within HA, probably, to see where the slippage is, but that’s a pretty daunting task.
I wish I could help but this is getting out of my depth!
Maybe we could ask @balloob who might be the best candidate to help you dig into this. A lot of people use Z-Wave, and I see a lot of anecdotal griping about responsiveness but I haven’t heard of a concerted effort to address it. Your findings, @anderson110, suggest that significant improvements might be possible.
OpenZwave is a reverse-engineered implementation of that API. I suspect that if your python code was expanded to deal with all the corner cases, exceptions, and ill-behaved devices, it would eventually become as complex as OZW.
Perhaps the way forward is a thin zwave replacement library that only deals with a limited set of mainstream devices, with the condition that if you have a device that doesn’t work, you can reconfigure to use the OZW stack.
I think the next step is to re-implement my python test through the python-openzwave interface. If the performance is significantly degraded, we’ve bracketed the issue. If the performance is comparable to my hand-coded barebones script, then we’ve identified the problem as occurring outside of these two layers (openzwave and python-openzwave). (Probably, we’ll see something a little degraded, making the situation less clear).
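The test I have in mind would look roughly like this (a sketch, assuming the stick is on /dev/ttyACM0 and two binary-switch nodes whose ids I’ve made up; get_switches/set_switch are the python-openzwave calls as I understand them):

import time
from openzwave.option import ZWaveOption
from openzwave.network import ZWaveNetwork

NODE_IDS = (3, 4)                     # hypothetical node ids of the two lights

options = ZWaveOption('/dev/ttyACM0', user_path='.', cmd_line='')
options.set_console_output(False)
options.lock()

network = ZWaveNetwork(options, autostart=True)
while network.state < network.STATE_READY:
    time.sleep(1)                     # let OZW finish querying the network

start = time.monotonic()
for node_id in NODE_IDS:
    node = network.nodes[node_id]
    for value_id in node.get_switches():
        node.set_switch(value_id, True)
elapsed = time.monotonic() - start
print("both on commands issued in %.3f s" % elapsed)   # compare against the barebones script

network.stop()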
If I can find some time I will attempt this. I’m dealing with some more urgent issues with my setup at the moment, however.
I will also say I’ve tossed the idea around in my head of setting up a test rig with a pi and a photosensor to take real data on my network, measuring elapsed time from cmd sent to light being detected by photosensor, which is the real bottom line here. This would require a very long USB cable to keep the stick in a fixed location as the pi moves around - not sure if that could introduce any issues. Ideally this could help quantify and document real-world performance, at least on one system, and measure the effect of any optimizations that might be tried. This is probably not happening in the next few weeks, however. Something to put on the “wish I had time” list.
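For what it’s worth, the measurement side of that rig would be simple. A sketch, assuming a light sensor module with a digital output wired to GPIO 17, and with send_light_on() as a hypothetical placeholder for whichever command path is being measured (raw serial, python-openzwave, or HA):

import time
import RPi.GPIO as GPIO

SENSOR_PIN = 17

def send_light_on():
    # Placeholder: hook up the command path under test here.
    raise NotImplementedError

GPIO.setmode(GPIO.BCM)
GPIO.setup(SENSOR_PIN, GPIO.IN)

start = time.monotonic()
send_light_on()
# Block until the sensor output goes high, or give up after 10 seconds.
channel = GPIO.wait_for_edge(SENSOR_PIN, GPIO.RISING, timeout=10000)
if channel is None:
    print("no light detected within 10 s")
else:
    print("command-to-light latency: %.3f s" % (time.monotonic() - start))

GPIO.cleanup()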
Here is a whack-on-the-side-of-the-head idea: the performance symptoms have nothing to do with z-wave.
Mine is Ubuntu 16.x on an Intel machine, a Vera Plus, mostly GE 14294, a couple of 14291, some Aeotec motion sensors, and some other odds and ends including happybubbles BLE sensors.
When I added Alexa and the beta cloud, I noticed a significant degradation in performance.
When I was doing the mqtt stuff late last year, I became aware of the fact that there is a pile of logging and message activity that is obfuscated.
Now that I have material lag between opening the back door and the outside lights turning on, I am very suspicious that HA isn’t efficient somewhere and have no idea where to start to find the smoking gun.