Is refresh_node_value giving any feedback to Home Assistant?

Mastiff · July 28, 2019, 8:32am

I am looking for a low-bandwidth way of checking if a node in the Z-Wave network is up (mainly to see if the network itself is working, for the rare occasions where the Pi running Home Assistant locks up). I’m thinking about sending a refresh_node_value with only the node id as the requested value every 30 seconds. The problem is that I have no idea how to get the result back to the triggering MQTT broker, and I can’t even get the automation to work:

- id: '1478780092'
  alias: Check if node is alive
  trigger:
  - platform: mqtt
    topic: eg/Node4alive
  action:
  - service: zwave.refresh_node_value
    data_template:
      node_id: 4
      value_id: 'node_id'

The problem is the value_id. I have tried 'node_id', {{ node_id }} and {{ "node_id" }}, and it still doesn’t work. I’m totally hopeless at YAML… sorry…

ha_steve · July 28, 2019, 2:14pm

If the pi running Home Assistant locks up, how would this automation run?

Also, I don’t think there is any feedback (that HA gets) for any z-wave commands.

Mastiff · July 28, 2019, 2:48pm

Of course it wouldn’t run if the Pi locks up. That’s just the point. It would be a better version of the passive check I’m doing now, with an automation that sends the info from my Heat It! thermostats to the keep alive system. The problem with that is that the thermostats only seem to send on changed values, so they can be silent for five minutes, and that’s far too long to wait to reset the system if something’s happened. This is how it would work:

The keep alive system would run the automation every 30 seconds (as you can see the trigger’s MQTT), and if something is wrong it will not get the expected answer.
It goes into alarm mode and retries the automation at once. If it doesn’t get an answer then, it will first try to restart Home Assistant with the Z-Wave system by calling a special purpose Python script that’s always running and is able to kill the Home Assistant process and start it again.
If that doesn’t work, but the script answers, it will restart the Pi with the script.
If the script doesn’t answer, the keep alive system will assume that the Pi has locked up or lost its network connection and send an MQTT message to the keep alive Pi. That has its own Z-Wave network with only one purpose: to use Z-Wave switches to turn off the power to one of the Pi’s in the house and turn it on again. I have several Pi’s with RFXtrx433 and Tellstick Duo around the house because the house is rather difficult, with a 200 year old chimney/firewall more than a meter thick. So the keep alive Z-Wave Pi will turn off the power to the automation Z-Wave Pi and then turn it on again after 5 seconds.
If the Z-Wave doesn’t come online again within three minutes after that (the full startup of my Z-Wave network takes almost 7 minutes, but it can receive and send commands before the startup check is done) it will repeat the procedure once.
If that doesn’t help then a mail is sent to a special Gmail account that handles just that and fire alarms from the house, and that is a prioritized warning on my Android phone. That will always set off a gradually increasing, fire-alarm-like noise from the Gmail app, so I will get the warning no matter what.
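
The escalation ladder described above could be sketched roughly like this in Python. This is only an illustration, not the actual keep alive system: the names `Action`, `recover`, `zwave_ok`, `script_answers` and `perform` are hypothetical, and it assumes that a failed reboot falls through to the power cycle (the post does not say what happens in that case) and that "repeat the procedure once" means a second power cycle.

```python
from enum import Enum


class Action(Enum):
    RETRY_CHECK = 1    # re-run the Z-Wave check at once
    RESTART_HASS = 2   # helper script restarts Home Assistant
    REBOOT_PI = 3      # helper script reboots the Pi
    POWER_CYCLE = 4    # keep alive Pi switches the power off and on
    SEND_MAIL = 5      # last resort: alarm mail


def recover(zwave_ok, script_answers, perform, max_power_cycles=2):
    """Walk the ladder until zwave_ok() returns True; return the steps taken.

    zwave_ok, script_answers and perform are callables supplied by the
    caller (hypothetical hooks into MQTT, the helper script, and so on).
    """
    taken = []

    def attempt(action):
        # perform() is assumed to block until the step has had time to work,
        # for example the three-minute wait after a power cycle.
        perform(action)
        taken.append(action)
        return zwave_ok()

    if attempt(Action.RETRY_CHECK):
        return taken
    # The software steps only make sense while the helper script answers.
    for action in (Action.RESTART_HASS, Action.REBOOT_PI):
        if not script_answers():
            break
        if attempt(action):
            return taken
    for _ in range(max_power_cycles):
        if attempt(Action.POWER_CYCLE):
            return taken
    perform(Action.SEND_MAIL)
    taken.append(Action.SEND_MAIL)
    return taken
```

Keeping the checks and actions as injected callables means the ordering can be tried out with fakes before anything is wired to real switches.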

But if getting feedback that way is impossible I will have to use a workaround: Having a special purpose plug-in power switch that I turn on every full minute and off again every minute and 30 seconds. The change from off to on and from on to off will be reported like changes in my heater switches are reported by this automation:

- alias: Ovnsbrytere - Endringsvarsel
  action:
    data_template:
      payload_template: "Room {{trigger.from_state.attributes.friendly_name.split(' ')[2]}}, Z-Wave-node {{trigger.from_state.attributes.node_id}}, went from status {{trigger.from_state.state}} to {{trigger.to_state.state}}."
      topic: eg/ZWaveBryter
    service: mqtt.publish
  condition: []
  id: '1524678991649'
  trigger:
  - entity_id: switch.zwavebryter_ovn_15
    platform: state

The MQTT message from that is like this:

eg/ZWaveBryter u"Room 15, Z-Wave-node 2, went from status on to off."

So that’s my fallback. I just have to put that switch in my technical room, so the constant clicking won’t drive anybody (read: “my wife”) nuts.
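
On the receiving side, the fallback could look something like the Python sketch below: parse the status message and treat the network as silent if no toggle report has arrived within a timeout. The names `parse_status` and `HeartbeatWatchdog` are hypothetical, the regular expression is based only on the single example message above, and the 45 second timeout (30 seconds between toggles plus some margin) is an assumption.

```python
import re

# Pattern derived from the one example message in this post.
STATUS_RE = re.compile(
    r"Room (?P<room>\S+), Z-Wave-node (?P<node>\d+), "
    r"went from status (?P<old>\w+) to (?P<new>\w+)\."
)


def parse_status(payload):
    """Return (node_id, old_state, new_state), or None if the text does not match."""
    match = STATUS_RE.search(payload)
    if match is None:
        return None
    return int(match["node"]), match["old"], match["new"]


class HeartbeatWatchdog:
    """Reports silence when no toggle report for the node arrives in time."""

    def __init__(self, node_id, timeout_s=45.0):
        self.node_id = node_id
        self.timeout_s = timeout_s
        self.last_seen = None  # time of the last real state change

    def feed(self, payload, now):
        """Record a message; only actual state changes of our node count."""
        parsed = parse_status(payload)
        if parsed and parsed[0] == self.node_id and parsed[1] != parsed[2]:
            self.last_seen = now

    def is_silent(self, now):
        """True if nothing was ever seen or the last report is too old."""
        return self.last_seen is None or now - self.last_seen > self.timeout_s
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the watchdog independent of how the MQTT messages are actually received.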

firstof9 · July 28, 2019, 5:23pm

Maybe a better idea is to find out why your Pi would lock up or lose its network connection?

Mastiff · July 28, 2019, 5:53pm

Yeah, that’s a bit more difficult than it sounds… I’m guessing that it has happened 2-3 times to each Pi during the last two years. The Z-Wave Pi has locked up once because of a power issue (thunder and a blackout lasting a few hundredths of a second, which was enough to mess up something in it without affecting anything else in the house, but a hard reset from the running control system fixed it), and once because of a faulty SD card, which was when I changed to USB SSDs.

Of course most home automation people would call that “stuff that happens” and not worry, but I don’t accept downtime in the system. That is why I have redundancy for everything, and the two things that don’t have it (Z-Wave and the main automation VM) must restart within a minute if something happens. This is first because of the WAF (I learnt in my previous house, which was sort of the beta test for this one, that something not working right once a week meant “this stuff NEVER works”) and second because 2/3 of this house is rented out, and I don’t want any problems for the tenants in my fancy smart house.

firstof9 · July 28, 2019, 5:56pm

Toss a UPS backup in there to keep your Pi up and running in a power blip. It’ll run for hours on a UPS meant to power a PC.

Mastiff · July 28, 2019, 6:03pm

I have my 5-6 Pi’s in different places around the house, so that would mean one each. Much cheaper to program than to buy all those. I have a UPS on my server, though. With two 110 amp boat batteries it runs for 8 hours in a blackout, with my pfSense box, DSL modem and all other network hardware in the house!

firstof9 · July 28, 2019, 6:27pm

Z-Wave is a low-bandwidth protocol, so pinging it every 30 seconds to refresh a node is going to cause delays in actual commands being processed. Why not just ping your Pi’s and, if they don’t respond, do your restart process?
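
A host-level check of that kind could be sketched in Python as below. It uses a TCP connect rather than an ICMP ping (an assumption made here because ICMP needs raw sockets or an external `ping` command); the function name `host_reachable` and the choice of port are hypothetical.

```python
import socket


def host_reachable(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

This only shows that something on the Pi is still accepting connections on that port; it says nothing about whether the Z-Wave network behind it is working.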

Mastiff · July 28, 2019, 6:44pm

Well, with around 20 devices when the system is ready some time this fall there’s not all that much going on, so I don’t think it will be much of a problem. I have seen a couple of times that Z-Wave stopped working when Hass and the Pi weren’t stuck, and restarting Home Assistant is a bit faster than a reboot. This is the way I have decided to do it, and until I have tested it and found that there are problems with it (which I actually don’t think there will be) I’m not going to take another approach. I have been running home automation systems for 20 years plus, and I have always been looking for the foolproof keep alive system. I’m closer than ever before: everything except for Z-Wave is running perfectly, with restarts of the Pi’s and all that, and if I manage to get Z-Wave the way I want, downtime will be so close to none that it will be undetectable. That’s when I will say “this is good enough”.

ha_steve · July 29, 2019, 3:56pm

IMO, if ultra reliability is what you are after, then a Pi is the wrong platform. I know that a lot of people do it and HA was designed to be used on a Pi, but… Pi’s are really not intended to be “server grade” hardware. I would consider a NUC or other small “PC-class” system running Ubuntu.

As for your z-wave issue - I just don’t think there is going to be a way for you to do what you want. It’s fundamentally not intended to work that way. It’s meant to tolerate nodes that are offline, sleeping, etc. for days and weeks without issue. There’s fundamentally not a way to tell if a device is dead, sleeping, or just temporarily offline. So the network doesn’t treat that as anything special.

Also, it shouldn’t “just lock up” (the Z-Wave network). I can’t tell if your issue is the Pi locking up or the actual Z-Wave network stopping… the former, as I said, might be addressed with better hardware. The latter is an issue for which you should be able to find the root cause and fix it.

Mastiff · July 30, 2019, 6:35am

I have thought about the NUC route, but I figured that when something happens it can be fixed within minutes on the Pi, so learning something new to introduce another device in the system will probably take more time than it’s worth.

As I said, I have redundancy for everything but Z-Wave (annoying that you can’t have two controllers that actually work as redundant on the network), so that’s the only thing I need to fix redundancy on. I have been considering a rather easy route for fully automatic redundancy: a switch that can switch the USB between two Pi’s, with one extra Pi running so that if something happens to the main Z-Wave Pi the second can take over. At the same time the Z-Stick would be soft reset, in case something has happened there. But since I haven’t had any problems except for the SD card, which is now replaced by an SSD, and the blackout, which happened once, I think that’s overkill even for me, because the only thing it would probably change is that the network would be up again a few minutes faster.

So if I could find out within 30 seconds if something happens to the Z-Wave network, which is what this thread is actually about, that would be good enough.