Hello everyone,
This is all AGPLv3. As the license also says, use all information, code and YAML in this post at your own risk.
I’d like to be notified as soon as possible when a Zigbee device that is connected to mains power goes offline. I have no idea how reliable these lights and switches will be in the long term, so if a device stops responding because it is defective, I prefer to be notified promptly, just in case.
There are probably other ways to achieve this. I suppose it would be possible to lower the ZHA timeout for non-battery devices from 7200 to 300 seconds or less and then listen for unavailable states, but I guess it is set to 7200 for a reason, or not? It’s also not clear how many times ZHA is guaranteed to try to ping during those 300 seconds in case of temporary packet loss due to interference. Logic that can be tweaked as needed in case of packet loss, and that doesn’t depend on the ZHA timeout internals, appeared more robust and future-proof to me.
If somebody can find a way to create a blueprint out of this and make it more self-contained, that would be great. Another direction would be to make it all Python, which would make it much simpler: a single self-contained script could hold the state within python_script using time.sleep (and it wouldn’t risk overflowing), if only there was a way to listen to the events. But then I read that time.sleep would slow down the core, so maybe it’s better not to use time.sleep anyway, dunno.
The end result of all the above constraints is that the procedure to install this automation is very manual with one helper, two automations, two scripts, plus one external component:
- create an input_text helper, preferably with the maximum length set to 255 (why so short…). I called it input_text.zha_toolkit_ping_alarm_helper
- enable python_script: and zha_toolkit: in configuration.yaml
- install the Uptime sensor integration
- paste the below in ~/config/python_scripts/zha_toolkit_ping_alarm_send.py
    # Inputs passed in by the automation's service call
    entity_ids = data.get('entity_ids')
    max_tries = data.get('max_tries')
    helper = data.get('helper')
    if not helper or hass.states.get(helper) is None:
        logger.warning('Missing helper')
    elif max_tries is None:
        logger.warning('Missing max tries')
    elif max_tries < 1:
        logger.warning(f'Wrong max tries {max_tries}')
    elif entity_ids is None:
        logger.warning('Missing entity_ids')
    else:
        # Ping each distinct entity once; the result comes back as a
        # zha_toolkit_ping_alarm event
        for entity_id in set(entity_ids):
            if hass.states.get(entity_id) is None:
                logger.warning(f'Not found entity_id: {entity_id}')
                continue
            service_data = {
                'ieee': entity_id,
                'event_done': 'zha_toolkit_ping_alarm',
                'args': [1, max_tries],  # [current try, max tries]
            }
            hass.services.call("zha_toolkit", "ieee_ping", service_data, blocking=False)
- paste the below in ~/config/python_scripts/zha_toolkit_ping_alarm_recv.py
    # Inputs forwarded by the recv automation from the ping event
    ieee_org = data.get('ieee_org')
    success = data.get('success')
    tries = data.get('tries')
    max_tries = data.get('max_tries')
    helper = data.get('helper')
    if not helper or hass.states.get(helper) is None:
        logger.warning('Missing helper')
    elif max_tries is None:
        logger.warning('Missing max tries')
    elif max_tries < 1:
        logger.warning(f'Wrong max tries {max_tries}')
    elif tries is None:
        logger.warning('Missing tries')
    elif tries < 1 or tries > max_tries:
        logger.warning(f'Wrong tries {tries}')
    elif ieee_org is None or hass.states.get(ieee_org) is None:
        logger.warning('Missing ieee_org')
    elif success is None or success not in (True, False):
        logger.warning('Missing success')
    else:
        helper_state = hass.states.get(helper)
        assert(helper_state.attributes['editable'] == True)
        assert(helper_state.attributes['min'] == 0)
        assert(helper_state.attributes['pattern'] is None)
        assert(helper_state.attributes['mode'] == 'text')
        len_max = helper_state.attributes['max']
        # The helper stores the offline entity_ids separated by spaces
        last_offline = set(helper_state.state.split())
        helper_update = False
        # Drop entries that no longer exist (e.g. removed entities)
        for offline in last_offline.copy():
            if hass.states.get(offline) is None:
                logger.warning(f'Discarding {offline} from helper')
                last_offline.discard(offline)
                helper_update = True
        if success:
            if ieee_org in last_offline:
                last_offline.remove(ieee_org)
                helper_update = True
                logger.warning(f'Online {ieee_org}')
        else:
            if tries >= max_tries:
                # Out of tries: notify and remember the device as offline
                friendly_name = hass.states.get(ieee_org).attributes['friendly_name']
                service_data = {
                    'title': 'Ping Alarm',
                    'message': f'Offline: {friendly_name}',
                }
                hass.services.call("notify", "persistent_notification", service_data, blocking=False)
                logger.warning(f'Offline {ieee_org} {friendly_name}')
                last_offline.add(ieee_org)
                helper_update = True
            else:
                # Retry only devices that have not been reported offline yet
                if ieee_org not in last_offline:
                    service_data = {
                        'ieee': ieee_org,
                        'event_done': 'zha_toolkit_ping_alarm',
                        'args': [tries + 1, max_tries],
                    }
                    hass.services.call("zha_toolkit", "ieee_ping", service_data, blocking=False)
        if helper_update:
            new_offline = ' '.join(last_offline)
            if len(new_offline) > len_max:
                logger.warning('too long offline string, truncating')
                new_offline = new_offline[:len_max]
            hass.states.set(helper, new_offline, helper_state.attributes)
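The decision logic of the recv script can also be read as a small pure function, which makes it easy to try outside Home Assistant. This is only a sketch for illustration: the function name `handle_ping_result` and the action names it returns are made up here and are not part of zha_toolkit or of the script above.

```python
def handle_ping_result(offline, entity_id, success, tries, max_tries):
    """Return (new_offline_set, action) for a single ping result.

    action is one of 'none', 'online', 'notify', 'retry'
    (hypothetical names used only in this sketch).
    """
    offline = set(offline)  # work on a copy, leave the caller's set untouched
    if success:
        if entity_id in offline:
            # Device answered again: forget that it was offline
            offline.remove(entity_id)
            return offline, 'online'
        return offline, 'none'
    if tries >= max_tries:
        # All tries used up: remember the device and raise the alarm
        offline.add(entity_id)
        return offline, 'notify'
    if entity_id not in offline:
        # Tries left and no alarm raised yet: ping again
        return offline, 'retry'
    # Already reported as offline: stay quiet until it answers again
    return offline, 'none'
```

For example, a first failed ping of a device that is not yet in the helper gives `'retry'`, while a failed ping of a device that is already listed gives `'none'`, so an offline device is not retried or reported again on every cycle.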
- add the below automation to ~/config/automations.yaml to define the interval of the ping (default 5 min) and replace light.abc and light.def with the list of entity_ids of the devices you need to monitor. You can also easily tweak the “max_tries” parameter if you prefer more or less tolerance for packet loss.
    - alias: zha toolkit ping alarm send
      description: ''
      trigger:
      - platform: time_pattern
        minutes: /5
      condition:
      - condition: template
        value_template: '{{ as_timestamp(now()) - as_timestamp(states.sensor.uptime.last_changed)
          | int > 600 }}'
      action:
      - service: python_script.zha_toolkit_ping_alarm_send
        data:
          entity_ids:
          - light.abc  # edit this and add or remove entries as needed
          - light.def  # edit this and add or remove entries as needed
          helper: input_text.zha_toolkit_ping_alarm_helper
          tries: 0
          max_tries: 10
      mode: single
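The template condition in the automation above skips the ping while the uptime sensor changed less than 600 seconds ago, presumably to give the Zigbee network time to settle after a restart. As a rough sketch, the same check in plain Python looks like this; `startup_grace_elapsed` and its parameter names are made up for illustration.

```python
def startup_grace_elapsed(now_ts, uptime_last_changed_ts, grace_s=600):
    """True once more than grace_s seconds have passed since the uptime
    sensor last changed. Both timestamps are Unix times in seconds."""
    # Mirrors the template: the int filter applies to the second timestamp only
    return now_ts - int(uptime_last_changed_ts) > grace_s
```

Raising `grace_s` (the 600 in the template) delays the first ping cycle after a restart, lowering it starts pinging sooner.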
- add the below automation to ~/config/automations.yaml to listen for the pong or timeout event. The delay between retries is also easy to tweak, 5 sec by default. With more than 10 devices to ping you should increase the max parallelism.
    - alias: zha toolkit ping alarm recv
      trigger:
      - platform: event
        event_type: zha_toolkit_ping_alarm
        event_data:
          command: ieee_ping
          params:
            event_done: zha_toolkit_ping_alarm
      condition: []
      action:
      - if:
        - condition: template
          value_template: '{{trigger.event.data.success == false}}'
        then:
        - delay:
            hours: 0
            minutes: 0
            seconds: 5
            milliseconds: 0
      - service: python_script.zha_toolkit_ping_alarm_recv
        data:
          ieee_org: '{{trigger.event.data.ieee_org}}'
          success: '{{trigger.event.data.success}}'
          tries: '{{trigger.event.data.params.args[0]}}'
          max_tries: '{{trigger.event.data.params.args[1]}}'
          helper: input_text.zha_toolkit_ping_alarm_helper
      mode: parallel
      max: 10
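To get a feeling for how the interval, max_tries and the retry delay add up, here is a back-of-the-envelope sketch of the worst-case time between a device dying and the alarm. The function and parameter names are made up for illustration, and `ping_timeout_s` (how long a failed ping takes to report back) is an assumption, since that value is not stated in this post.

```python
def worst_case_alarm_delay(interval_s, max_tries, retry_delay_s, ping_timeout_s):
    """Rough upper bound, in seconds, from device failure to notification."""
    # The device may die right after a successful ping cycle
    wait_for_next_cycle = interval_s
    # Every failed try costs the ping timeout plus the automation's delay
    retry_chain = max_tries * (ping_timeout_s + retry_delay_s)
    return wait_for_next_cycle + retry_chain


# Defaults from this post, ignoring the (unknown) ping timeout itself
print(worst_case_alarm_delay(300, 10, 5, 0))  # prints 350
```

It is probably a good idea to keep the retry chain shorter than the ping interval, otherwise the next cycle would start a fresh chain for the same device while the previous one is still running.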
- you may want another automation that forwards call_service.persistent_notification with the given title to notify.notify, or you can directly edit the Python script to invoke any other notification service of your choice
- you may want to decrease the amount of recording related to these events by tweaking the recorder: setting in configuration.yaml:
    recorder:
      exclude:
        entities:
        - automation.zha_toolkit_ping_alarm_send
        - automation.zha_toolkit_ping_alarm_recv
        event_types:
        - zha_toolkit_ping_alarm
- with too many devices (more than 255 characters' worth of entity_ids) going offline at once, the helper will overflow, which supposedly will only cause duplicate Offline notifications, but the overflow code path is untested
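One possible way to make the overflow case a bit more predictable would be to drop whole entity_ids instead of cutting the string in the middle of one. This is not what the recv script above does (it slices the joined string at the helper's max length), and the sketch below is equally untested inside Home Assistant; `join_within_limit` is a made-up name.

```python
def join_within_limit(entity_ids, max_len):
    """Join entity_ids with spaces, keeping only whole ids that fit in max_len."""
    kept = []
    used = 0
    for entity_id in sorted(entity_ids):  # sorted for a stable result
        # One extra character for the separating space after the first id
        needed = len(entity_id) + (1 if kept else 0)
        if used + needed > max_len:
            break
        kept.append(entity_id)
        used += needed
    return ' '.join(kept)
```

Devices that do not fit would still be forgotten by the helper, so they would be pinged and reported again on the next cycle, but the helper would never hold a partial entity_id.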