Detect dysfunctional zigbee devices

I have a few zigbee relays that stop functioning after a while.

They are still visible in the network (the “Last Seen” value is current), but any command I send them results in:
Failed to call service switch/turn_off. Failed to send request: Request failed after 5 attempts: <Status.MAC_NO_ACK: 233>

Power cycling the relay in question has fixed it every time so far. They are probably crappy relays, but this only happens occasionally, once every several months, so I am in no rush to replace them all.

Is there an easy way to get a notification of this in HA?

If you can share the specifics of the devices, you might find someone with similar experience.

Unfortunately, the zigbee mesh can add complexity to what might seem simple. For example, are your relays connected via another device (a router) or directly to the coordinator? The zigbee map function might help you see this. If a relay is connected via a router and the router is having issues, the end device might appear to be at fault when the source of the problem is actually the router.

Good hunting!

I use this: Home Assistant Blueprint: Offline detection for Z2M devices with last_seen · GitHub

Edit: sorry, I didn’t see that they still report a last seen value, so this wouldn’t work.

As I said, I am 99.9999% sure the device itself has issues. It is online and it communicates; it just stops responding to commands. The zigbee side is not the issue, the firmware/relay side is. The problem I am seeking help with is not that it does so. I just want to be notified when that happens so I can power cycle the device.

That there is a pretty high confidence level… :cowboy_hat_face:

I’ve not done it myself, but you might have a look at the forum thread referenced below and see if you can monitor the message you are seeing. I did a short test of monitoring the ‘system_log_event’ event with hass-cli, and it seemed to confirm what the forum thread says. I am not sure where your message is being posted, or whether you can monitor a log other than the system log. Good hunting!

Just for my curiosity, why are you not willing to share the information about the device you are having trouble with?

Automation based on log

hass-cli event watch | tee a.txt | grep "test"

event_type: call_service
data:
    domain: system_log
    service: write
    service_data:
        level: error
        message: test03
origin: LOCAL
time_fired: '2023-12-12T13:22:25.850280+00:00'
context:
    id: 01HHF32TDT194FMK32SH16SG18
    parent_id:
    user_id: 0d7b429570c44c97a5ac1fe785b9d237

event_type: system_log_event
data:
    name: homeassistant.components.system_log.external
    message:
      - test03
    level: ERROR
    source:
      - components/system_log/__init__.py
      - 300
    timestamp: 1702387345.850346
    exception: ''
    count: 1
    first_occurred: 1702387345.850346
origin: LOCAL
time_fired: '2023-12-12T13:22:25.850848+00:00'
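
To go from the captured event to a notification, the matching step could look roughly like the sketch below. It is only an illustration under a few assumptions: the payload is shaped like the system_log_event data shown above, the MAC_NO_ACK failure actually reaches the system log (not tested here), and `fire_event: true` is set under `system_log:` so that these events are fired at all. The function name `is_no_ack_error` and the sample dictionaries are made up for the example.

```python
def is_no_ack_error(event_data, needle="MAC_NO_ACK"):
    """Return True if a system_log_event payload mentions `needle`.

    `event_data` is assumed to look like the `data:` section of the
    system_log_event shown above, where `message` is a list of strings.
    """
    message = event_data.get("message") or []
    # Be tolerant of a plain string instead of a list of lines.
    if isinstance(message, str):
        message = [message]
    return any(needle in str(line) for line in message)


# Hypothetical payloads: the first mirrors the test event above, the second
# reuses the error text quoted at the top of the thread.
test_event = {
    "name": "homeassistant.components.system_log.external",
    "message": ["test03"],
    "level": "ERROR",
}
failing_event = {
    "name": "hypothetical.logger.name",
    "message": [
        "Failed to send request: Request failed after 5 attempts: "
        "<Status.MAC_NO_ACK: 233>"
    ],
    "level": "ERROR",
}

print(is_no_ack_error(test_event))
print(is_no_ack_error(failing_event))
```

The same substring check could live in an automation that triggers on the system_log_event event type and tests the message in a condition; the Python version just makes the matching logic easy to try out against captured events first.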

Thanks, this seems to be the thing I am looking for.
I did not want to share the device details because that tends to get people off track: they start recommending ways to mend my zigbee mesh, etc., which I have already determined is not the issue.