I’d like to share my experiences with Home Assistant (HA) and Openzwave (OZW). I recently moved and thought “it’s almost 2018, that home automation zwave stuff must be mature enough by now, let’s try it”. But boy, what a struggle it has been so far. Please allow me to rant a little bit about the experience. My goal is to use enough of the right keywords per issue so that others having similar troubles find this post and can maybe use it to their advantage. Some things may not be accurate or 100% correct, so please bear with me. In the end I’d love to hear about anything I missed, or any other comments. I appreciate all the effort that went into the relevant products, so please treat everything as positive feedback, maybe mixed with a little frustration. So, here goes.
First the very unnuanced summary:
- The Openzwave cache file ‘zwcfg_*.xml’ is used to name the zwave entities in Home Assistant. Why is a CACHE file being used for something so critical, I ask myself?
- Many timeouts in my zwave network due to ‘unsupported commands’ (did I mention it’s 2018 now and the switches I use were introduced in 2016?)
- Timeouts because of out-of-range issues (good luck distinguishing between the previous point and this one when you have both issues and are new to zwave!)
- Abysmal performance when switching a few things simultaneously. Literally taking a full minute before the final light comes on (this is with a subset of four lights…)
- Zwave plus security feels like too much of a hit on performance. It’s not usable with many devices operating (nearly) simultaneously.
- When switching off a dimmer, for some reason the communicated value is actively refreshed from the device multiple times. Unnecessary, because the dimmer communicates the correct values! Now that I know how easily the network can become congested, this has to be eliminated. But which component is doing this? Home assistant? Openzwave? The python wrapper of openzwave?
- You want to enable debug mode for the openzwave library, good luck finding out how!
- Why is there ‘workaround’ code in a home assistant zwave component for specific zwave devices? Shouldn’t this be in Openzwave itself? That has all kinds of possibilities/extensions/code to handle these things.
Ad 1:
Initially I had things up and running quite quickly. Until one day (I don’t remember what the exact trigger was, or if there even was one) I couldn’t control my switches/dimmers anymore. The physical switches still worked, so it had to be something related to HA. Finally, after hours of getting nowhere, I found out that my entities had changed names, or some had received the same name. Don’t ask me how this happened, it just did. I found out that HA takes the names of all zwave entities from the OZW cache file called ‘zwcfg_*.xml’. This is the only place these names are administered. In an OZW cache file. (https://github.com/OpenZWave/open-zwave/wiki/Configuration: “Generally, users, and application developers should never modify this file, but sometimes it might be required to delete the file, so the library can refresh the network devices and configurations.”) No wonder I didn’t find this quickly. I actually had to delete this file many times due to all the other Zwave issues I was having. I did not expect any configuration to be done in a cache file. So I ended up creating a script that renames all nodes in the cache file and fixes any duplicate names. That way I can delete the cache file when required, and quickly re-add all the names that my HA config is expecting. (If anyone wants to do something similar and needs some inspiration, here is my script: https://gist.github.com/jrkoiter/037acacc58a96753b2cc79dc7446a370)
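For anyone who wants to try something similar, here is a minimal sketch of the renaming idea in Python. It is not the script from the gist above. It assumes the cache file has a root element in the namespace http://code.google.com/p/open-zwave/ with Node elements carrying id and name attributes, so check your own file first. The sample XML and the names in NAMES are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace of the OZW cache file; verify against the root element of your file.
OZW_NS = "http://code.google.com/p/open-zwave/"


def rename_nodes(root, names):
    """Set the name attribute of every <Node> whose id is a key in names.

    Refuses to run if the mapping itself contains duplicate names.
    Returns the ids of the nodes that were actually changed.
    """
    dupes = [n for n, c in Counter(names.values()).items() if c > 1]
    if dupes:
        raise ValueError(f"duplicate names in mapping: {dupes}")
    changed = []
    for node in root.iter(f"{{{OZW_NS}}}Node"):
        node_id = int(node.get("id"))
        if node_id in names and node.get("name") != names[node_id]:
            node.set("name", names[node_id])
            changed.append(node_id)
    return changed


# Tiny stand-in for a real cache file (illustrative only).
SAMPLE = """<Driver xmlns="http://code.google.com/p/open-zwave/" version="3">
  <Node id="29" name="" location="" />
  <Node id="34" name="" location="" />
</Driver>"""

# Hypothetical mapping: node id -> name the HA config expects.
NAMES = {29: "hallway_dimmer", 34: "kitchen_switch"}

ET.register_namespace("", OZW_NS)  # keep the default namespace when serializing
root = ET.fromstring(SAMPLE)
changed = rename_nodes(root, NAMES)
print(changed)
print(ET.tostring(root, encoding="unicode"))
```

For a real file you would use ET.parse(path), call rename_nodes on tree.getroot(), and write the tree back. It is probably safest to do this while HA is stopped and to keep a backup copy, since OZW may rewrite the cache file itself.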
Ad 2:
The OZW_Log.txt file is your friend. I noticed many timeout messages in the logs. Again many hours were spent, and eventually I stumbled upon the Log Analyzer on openzwave.net. It’s great! (They do advise you to delete the zwcfg file btw…) I was told that timeout messages could be due to devices not responding to some commands: they advertise that they support those commands, but don’t actually respond to them! Oh, that really builds my confidence… Oh well, there will hopefully be a firmware update for these things, as I was told zwave plus supports ‘over the air’ updates. (Spoiler: no updates are possible at all with openzwave, only with proprietary controllers like the fibaro thing, bummer! So whatever firmware version you get, you’re stuck with it.) Anyway, the timeouts I was seeing were connected to my Fibaro dual switch modules, specifically ‘commandclass COMMAND_CLASS_SWITCH_MULTILEVEL’. I eventually learned that OZW has config files for many devices, to fix or work around these kinds of quirks. In this case it’s in the ‘fgs223.xml’ file of the OZW distribution. But wait, it mentions command class 38 (which is COMMAND_CLASS_SWITCH_MULTILEVEL) as not supported, and has some config to work around that. But it’s not working, I guess?! OZW still tries to send this command to that device; I thought it was specifically on a refresh call. But I now knew my way around that OZW cache file a little bit, so I just removed the element regarding that command class for these devices, restarted, and gone were the timeouts! Oh, and same story for command class 39 for these devices (COMMAND_CLASS_SWITCH_ALL). Not supported. But also not listed in that ‘fgs223.xml’ file, weird… Anyway, same fix: delete it myself from that OZW cache file and restart. And write this workaround down somewhere so I can do it again when I need to recreate this cache file for some other reason…
Logging example of this issue:
2018-01-30 08:07:34.815 Info, Node034, Sending (Query) message (Callback ID=0x1c, Expected Reply=0x04) - SwitchAllCmd_Get (Node=34): 0x01, 0x09, 0x00, 0x13, 0x22, 0x02, 0x27, 0x02, 0x25, 0x1c, 0xd9
2018-01-30 08:07:34.820 Detail, Node034, Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8
2018-01-30 08:07:34.820 Detail, Node034, ZW_SEND_DATA delivered to Z-Wave stack
2018-01-30 08:07:34.835 Detail, Node034, Received: 0x01, 0x07, 0x00, 0x13, 0x1c, 0x00, 0x00, 0x02, 0xf5
2018-01-30 08:07:34.835 Detail, Node034, ZW_SEND_DATA Request with callback ID 0x1c received (expected 0x1c)
2018-01-30 08:07:34.835 Info, Node034, Request RTT 19 Average Request RTT 42
2018-01-30 08:07:34.835 Detail, Expected callbackId was received
2018-01-30 08:07:44.816 Error, Node034, ERROR: Dropping command, expected response not received after 1 attempt(s)
2018-01-30 08:07:44.817 Detail, Node034, Removing current message
2018-01-30 08:07:44.817 Detail, Node034, Notification: Notification - TimeOut
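The cache-file edit described under Ad 2 (removing the elements for command classes 38 and 39 from the affected nodes) can be scripted so it is easy to repeat after the cache file is recreated. The following is only a sketch. It assumes each Node element contains a CommandClasses element with CommandClass children that carry an id attribute, and the sample XML is a made-up stand-in for a real cache file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the OZW cache file; verify against your own file.
OZW_NS = "http://code.google.com/p/open-zwave/"


def drop_command_classes(root, node_ids, class_ids):
    """Remove the given command classes from the given nodes.

    Returns a list of (node id, command class id) pairs that were removed.
    """
    removed = []
    for node in root.iter(f"{{{OZW_NS}}}Node"):
        node_id = int(node.get("id"))
        if node_id not in node_ids:
            continue
        for container in node.findall(f"{{{OZW_NS}}}CommandClasses"):
            # Iterate over a copy, because we remove children while looping.
            for cc in list(container):
                if cc.tag != f"{{{OZW_NS}}}CommandClass":
                    continue
                if int(cc.get("id")) in class_ids:
                    container.remove(cc)
                    removed.append((node_id, int(cc.get("id"))))
    return removed


# Illustrative stand-in: node 34 plays the role of a dual switch module.
SAMPLE = """<Driver xmlns="http://code.google.com/p/open-zwave/" version="3">
  <Node id="29" name="">
    <CommandClasses>
      <CommandClass id="38" name="COMMAND_CLASS_SWITCH_MULTILEVEL" />
    </CommandClasses>
  </Node>
  <Node id="34" name="">
    <CommandClasses>
      <CommandClass id="37" name="COMMAND_CLASS_SWITCH_BINARY" />
      <CommandClass id="38" name="COMMAND_CLASS_SWITCH_MULTILEVEL" />
      <CommandClass id="39" name="COMMAND_CLASS_SWITCH_ALL" />
    </CommandClasses>
  </Node>
</Driver>"""

ET.register_namespace("", OZW_NS)  # keep the default namespace when serializing
root = ET.fromstring(SAMPLE)
# 38 = COMMAND_CLASS_SWITCH_MULTILEVEL, 39 = COMMAND_CLASS_SWITCH_ALL
removed = drop_command_classes(root, node_ids={34}, class_ids={38, 39})
print(removed)
```

Node 29 (a real dimmer in this made-up sample) keeps its class 38, because only the node ids passed in are touched. As with any edit to this file, a backup and a stopped HA are probably a good idea.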
Ad 3:
The timeouts that were left had to do with devices being out of range, or with bad reception. They usually work, but not always! But OZW already gave up after 1 try. It didn’t even retry. So I ended up changing MAX_TRIES to 2 in Defs.h of OZW (yes, I had already reached the point of running my own compilation of OZW…). And, of course, I also worked on building a better mesh.
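To help tell the two kinds of timeouts apart, a small script can count the ‘Dropping command’ errors in OZW_Log.txt per node and per command. This is a sketch that assumes the log lines look like the excerpt under Ad 2. The regular expressions are based only on that excerpt, and the interpretation in the note after the code is a rule of thumb of mine, not something OZW reports.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt above; other OZW versions may format lines differently.
SENT = re.compile(r", Node(\d+), Sending .* - (\w+) \(Node=\d+\)")
DROPPED = re.compile(r", Node(\d+), ERROR: Dropping command")


def count_timeouts(lines):
    """Count dropped commands per (node id, last command sent to that node)."""
    last_sent = {}
    timeouts = Counter()
    for line in lines:
        match = SENT.search(line)
        if match:
            last_sent[int(match.group(1))] = match.group(2)
            continue
        match = DROPPED.search(line)
        if match:
            node = int(match.group(1))
            timeouts[(node, last_sent.get(node, "unknown"))] += 1
    return timeouts


# Two lines copied from the log excerpt under Ad 2.
SAMPLE_LOG = [
    "2018-01-30 08:07:34.815 Info, Node034, Sending (Query) message (Callback ID=0x1c, "
    "Expected Reply=0x04) - SwitchAllCmd_Get (Node=34): 0x01, 0x09, 0x00, 0x13, 0x22, "
    "0x02, 0x27, 0x02, 0x25, 0x1c, 0xd9",
    "2018-01-30 08:07:44.816 Error, Node034, ERROR: Dropping command, expected response "
    "not received after 1 attempt(s)",
]

result = count_timeouts(SAMPLE_LOG)
for (node, command), count in result.most_common():
    print(f"node {node}: {command} timed out {count} time(s)")
```

For a real log, pass the open file instead of SAMPLE_LOG. If one node always times out on the same command, an unsupported command (Ad 2) seems the likelier cause. If the timeouts are spread over different commands, range or reception (Ad 3) is the more plausible suspect.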
Ad 4:
With these things worked out I could finally spend some time actually doing what I wanted: automate a few things! But with only four lights switching on/off/dimming as a group, the performance was very bad. Sometimes it was ok (lights reacting quickly after each other), but most of the time some didn’t react at all, or only after a very long delay. Ok, back to the OZW_Log.txt file again. First thing to notice: a lot of chatter to switch just four lights!! I must be able to bring that down and thereby solve the issues, or make it bearable at least…
Ad 5:
First candidate: the security related messages of zwave. It seems they are creating a LOT of overhead. So much that you actually notice it when switching things! Let alone multiple things at the same time. So I made the decision to ditch the security and save a ton of messages going back and forth. If someone wants to drive to my house to wreak havoc on my lights: be my guest. At least this way I can hopefully actually use the network for myself as well!
Ad 6:
Another thing that I noticed in OZW_Log.txt is that many values were ‘refreshed’, even though they had been correctly communicated to the controller. So the dimmer would say ‘I’m switched off now’. OZW would mention the values were ‘not verified’, and right after that the controller would send a message to get an updated value from the dimmer. Why?? It was not at all clear where this refresh request was coming from. HA? OZW? The Python OZW wrapper? I literally had to spend hours to finally find out: it was the python wrapper! The thing I least suspected! I investigated and tried many things in HA, in OZW, and finally ended up in that python wrapper: https://github.com/OpenZWave/python-openzwave/blob/50bfa05c667449a7c9a5218c1d7ce61f0c193046/src-api/openzwave/command.py#L612. It has some code to work around some kind of ‘issue’ with dimmers not reporting their status correctly. And therefore it is sending TWO additional messages to refresh the value! Sure, only two, but they pile up and are killing my zwave network. And my dimmers don’t even have this ‘issue’. Why is there even this kind of code in a wrapper! Why isn’t this code in OZW itself?? Fix: comment out those few lines in command.py, and now all is well!
Logging example of this issue:
Refreshing node 29: COMMAND_CLASS_SWITCH_MULTILEVEL index = 0 instance = 2 (to confirm a reported change)
Ad 7:
HA has a ‘debug’ setting for the zwave platform that you can set in the configuration.yaml file. But it doesn’t cause messages of level ‘debug’ to end up in OZW_Log.txt. Instead it causes a lot of error and warning messages that were not there before?! And OZW itself has options to enable debug in its ‘options.xml’ file. But HA has two of those: one in the python site packages dir, and one in the HA homedir. I ended up adding ‘SaveLogLevel’ with value ‘8’ (debug) to both, but nothing changed. Oh well. This one I gave up on.
Ad 8:
This one I only noticed, but didn’t actually run into any issues with. But why is there device specific workaround code in HA for zwave devices? https://github.com/home-assistant/home-assistant/blob/dev/homeassistant/components/zwave/workaround.py . Can’t (shouldn’t) this be in OZW itself?
So, there you have it. Maybe it is useful to someone at some point. Hope I didn’t offend anybody, like I said before I think all components involved are excellent and I enjoy them very much. I just underestimated the time involved to get them to work reliably together, I guess.