Does HA know if a ZWave command wasn't received by a device?

I have some problematic Zwave devices in my garage. Some devices in the garage always work, but some of them often don’t respond to their instructions. It’s intermittent. Sometimes issuing the command a second time will work, other times not. Then the next day it might be back to working as expected.

I’ve tried healing the nodes, but it doesn’t seem to make a difference.

Say that HA sends an “on” command to a zwave device, and the device doesn’t turn on or doesn’t acknowledge. Does HA attempt to resend the command? Is there a way to make it do that?

No

Also, no

Usually if a device doesn’t respond to a command it gets flagged as “Dead” and should become “unavailable”. You could trigger off the entity becoming unavailable and issue a ping.
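
For example, something like this (a rough sketch - the entity ids are placeholders, and it assumes the ping button entity that Z-Wave JS exposes for each node):

- id: ping garage plug when unavailable
  alias: ping garage plug when unavailable
  trigger:
    - platform: state
      entity_id: switch.garage_plug          # placeholder entity
      to: "unavailable"
      for: "00:00:30"                        # ignore brief blips
  action:
    - service: button.press
      data:
        entity_id: button.garage_plug_ping   # ping button exposed by Z-Wave JS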

That’s not strictly true. A device needs to fail to respond multiple times before it is marked as dead.

A mesh network needs a strong backbone: the more powered nodes there are, the better the mesh will be. The intermittent part is happening because the interference preventing the signal from reaching the end device is itself intermittent.

On the Home Assistant side, when you turn the device on, for example, Home Assistant will initially show the device as on in the UI, and then the switch will slide back to off once the command has failed to actually turn the device on.

Last I checked the driver will try 3 times to send a command, if it doesn’t get an ACK it’s flagged ‘dead’.

Yeah but that’s 3 SEPARATE times. Because on the hardware side, when you send a command and it fails to reach the end device, it is sent again, and again - a maximum of 4 times. This allows the controller to try different routes - which is the whole point of a mesh network. But the time between each try can be as long as 5-8 seconds, because each attempt can use a maximum of 4 hops.

The software drivers mark a device as dead quicker than the actual USB dongle does. Once the dongle has marked a device as dead, it stays dead (unless it receives a message from that device) until the dongle is issued the soft-reset command. In that situation the controller will not even attempt to speak to a device it has marked as dead - it will not ping it or heal it.

If I command a switch “Off” the zwavejs sends the command to the switch, if the switch doesn’t ACK, zwavejs tries 2 more times, then flags the device dead if no ACK comes in.

That’s 1 actual command from HA, with 3 attempts to command it. Once zwavejs tags it dead, it should become unavailable in HA.

How long a delay is there between the attempts?

500ms then 1000ms last time I had issues and was monitoring my logs.

That’s an interesting change from OpenZWave then. The driver doesn’t have a whole lot of control over what the hardware does. The driver tells the hardware to send the command to the device, but has no control over how the hardware behaves at that point. The hardware then responds back to the driver.

What should happen is that the driver tells the hardware to send the command to the device, the hardware tries up to 4 times, and only then does it tell the driver that the device did not respond. At this point the driver should not try again, for a number of reasons - the biggest being that commands on the ZWave network queue up: while the controller is sending or receiving, everything else on the network has to wait until it is clear to talk.

Because each of the controller’s 4 attempts could utilise 4 hops to reach the end device, in the very worst case scenarios (of which I have one that I regularly encounter) it could take 30-40 seconds for the controller to make 4 attempts to reach the end device. Everything else on the ZWave network has to stop while that is happening.

If ZWaveJS then decides to retry immediately in the event that the device does not respond, and it tries 3 times, then you are talking about all other traffic on the ZWave network stopping for almost 5 minutes. That seems like a terrible design decision.

Further, frequently there is nothing wrong with the ZWave network and it’s just some sporadic interference. It would be pointless for the ZWaveJS driver to retry immediately, because if it failed the first time (remember that each attempt is really up to 4 attempts), it is almost certainly going to fail the second time too.

That’s not how it works, nor is it how it worked in OZW to my knowledge. It’s also not synchronous - nothing stops. The stick keeps responding to messages in the meantime, even mid-retry; the retries are scheduled, it’s all async. While it can’t literally send while receiving, the controller never waits for an ACK before receiving the next packet, nor does it wait for an ACK before sending the next in the queue.

Keep in mind that ozw was reverse engineered. Zwavejs has been written to the spec, which was released in the meantime, and I’m confident that AlCalzone has followed the spec and/or designed this in consideration to how the protocol works.

I encourage you to use a zniffer to watch what is actually going on.

Frequently this is a device that’s not acting properly as a repeater. Try air-gapping the nearby devices one by one, healing each time (the whole network, or at least all nearby devices plus the device at issue), and see if it’s better with one of them out of the picture.

I’ve taken the approach in automations of sending a command, verifying it, and retrying if needed. I do this for zwave devices and my NEST thermostat, because periodically commands do get dropped - much more commonly with the NEST than zwave. Here’s an example automation I use to set the temperature on my zwave thermostats (it works for the NEST also); I have similar ones for turning switches on/off. Home Assistant does not have built-in retry reliability, so you must do it at the application layer. And sure, if everything worked perfectly this would be unnecessary, but the world is imperfect, and I’ve given up believing perfection is even possible, though I continue to strive for it!


- id: set bedroom temperature
  alias: set bedroom temperature
  description: ""
  mode: queued
  max: 2
  max_exceeded: silent
  trigger:
    # Fire when the computed target changes, or when control is re-enabled
    - platform: state
      entity_id: sensor.bedroom_target_temperature
    - platform: state
      entity_id: input_boolean.hvac_bedroom_control
      to: "on"
  condition:
    condition: and
    conditions:
      - condition: state
        entity_id: input_boolean.hvac_bedroom_control
        state: "on"
      # Sanity-check that the target is within a reasonable range
      - condition: template
        value_template: '{{ states("sensor.bedroom_target_temperature")|int > 54 }}'
      - condition: template
        value_template: '{{ states("sensor.bedroom_target_temperature")|int < 72 }}'
  variables:
    target_temperature: '{{ states("sensor.bedroom_target_temperature")|int }}'
  action:
    - service: system_log.write
      data:
        level: info
        message: "Setting bedroom_thermostat temperature to {{ target_temperature }} "
        logger: hvac
    # First attempt
    - service: script.climate_set_temperature
      data:
        entity_id: climate.bedroom_thermostat
        temperature: "{{ target_temperature }}"
    # Give the device time to report back, then verify; the condition
    # stops the automation here if the setpoint was applied
    - delay: "00:00:10"
    - condition: template
      value_template: '{{ target_temperature != state_attr("climate.bedroom_thermostat","temperature") | int }}'
    - service: system_log.write
      data:
        level: warning
        message: "Unable to set bedroom_thermostat temperature to {{ target_temperature }} - retrying in 10 seconds"
        logger: hvac
    # Second attempt
    - delay: "00:00:10"
    - service: script.climate_set_temperature
      data:
        entity_id: climate.bedroom_thermostat
        temperature: "{{ target_temperature }}"
    # Verify again; if it still didn't take, log an error
    - delay: "00:00:10"
    - condition: template
      value_template: '{{ target_temperature != state_attr("climate.bedroom_thermostat","temperature") | int }}'
    - service: system_log.write
      data:
        level: error
        message: "Failed to set bedroom_thermostat temperature to {{ target_temperature }} "
        logger: hvac
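
The script.climate_set_temperature it calls isn’t shown above; a minimal version might look like this (a sketch, not the original - it just forwards to climate.set_temperature):

climate_set_temperature:
  # Sketch of the helper script called above (the original wasn't posted)
  mode: parallel
  sequence:
    - service: climate.set_temperature
      data:
        entity_id: "{{ entity_id }}"
        temperature: "{{ temperature }}"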

Yes I’ve had this happen where a device “locked up” and everything was a mess until I reset it (powered off then on at breaker).

I also had issues with devices in my garage not working. They are at the edge of the network range, so they have issues sometimes. Initially, I had “double check” automations that would send me a notification if the status wasn’t right after the first automation (i.e. turn the bulb off, wait one minute, and check if the bulb is off; if it’s still on or unavailable, notify me of the problem or try again).
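
Something along these lines (a sketch with placeholder entity ids):

- id: garage bulb off double check
  alias: garage bulb off double check
  trigger:
    - platform: state
      entity_id: input_boolean.garage_light_request   # placeholder trigger
      to: "off"
  action:
    - service: light.turn_off
      entity_id: light.garage_bulb                    # placeholder entity
    - delay: "00:01:00"
    # Stop here if the bulb actually turned off
    - condition: template
      value_template: '{{ states("light.garage_bulb") != "off" }}'
    - service: light.turn_off                         # one retry
      entity_id: light.garage_bulb
    - delay: "00:01:00"
    - condition: template
      value_template: '{{ states("light.garage_bulb") != "off" }}'
    - service: notify.mobile_app_my_phone             # placeholder notifier
      data:
        message: "Garage bulb still {{ states('light.garage_bulb') }} after two attempts"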

Since then, though, I added an external outlet on my house which acts as a repeater to the detached garage, and things have been much more reliable. This is the one I bought

One other thing I would check: look at your logs for “chatty” nodes. One of my switches in the garage was sending constant power management reports. Zwave is very low bandwidth, and this plus how far away the device is was a recipe for disaster. I didn’t need power management reports for an LED bulb, so I went into the config parameters and disabled them. This can be the device in question or any device in between that the commands are trying to route through.
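
If you want to do that from HA rather than the Z-Wave JS UI, zwave_js.set_config_parameter works; the parameter number and value below are placeholders, since they vary per device (check the device manual or the Z-Wave JS device page):

service: zwave_js.set_config_parameter
target:
  entity_id: switch.garage_switch   # placeholder entity
data:
  parameter: 18                     # placeholder: the power-report parameter for your device
  value: 0                          # 0 disables the reports on many devices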

Is this a detached garage? Best approach may be to run a separate zwave network in the garage communicating to the house with wifi. Or use wifi instead of z-wave devices.

If you have some spare zwave devices you could try adding those in the garage to see if performance improves. It’s not easy to tell if more repeaters will help, but more devices in an area is better, up to a point.

Think about goals, costs, and future projects in the garage.

It’s an attached garage, but it’s on a lower level, under everything else. I have 6 zwave devices down there, all of which are powered nodes. But some of them are cheapo plugs, so maybe they just aren’t made well?

For whatever reason it seems to have been stable ever since I posted this. I’ll keep an eye on it.

Zwave light switches everywhere.
insert buzz and woody meme here
