Add error/exception handling to Automations

stefre · September 15, 2023, 7:06am

Maybe duplicate, but I did not find this feature request.

Situation: Many sensors, many automations, some … complexity.
Complications: Once in a while a battery-powered sensor runs out of juice. And/or an integration might behave slightly differently since its last update.
Implication: One (or a few) automations execute their actions just partially or not at all. Once(!) noticed: crawl through log files, find the automation, analyse root cause and finally: change battery or fix integration or …

Proposal: Enhance the definition of an automation so that the automation can report an error/exception, e.g. through a notification including the name of the automation, the action that failed and ideally the entity involved. Maybe
a) a new section in addition to Triggers, Conditions, Actions called something like On Error
or
b) a new option for each action type: “on error”
Benefit: much easier to
a) get informed about a malfunctioning automation
b) significantly reduced “time to fix” as we are “close to the root cause”

To be clear: running out of battery juice is just an example. It could be any unexpected device/entity/addon/… state that “breaks” automations without us getting informed/notified, like:

after the update to v9.0, automations can’t restart addons with more than one ‘-’ in their names.
an integration (like overkiz …) occasionally stops closing some(!) blinds just because some internal queue is full.
a remote Shelly device lost its connection to a repeater, but an automation would like to turn on the light.
…

In all these situations a plain notification triggered by the automation with some hint (the action using the entity which makes trouble) would be great.

WallyR · September 15, 2023, 8:31am

The problem is that I think there is no error state as such.

You could set up a test, like waiting for a sensor to change to X and timing out after this many seconds.
The timeout would then be an event you could react to.
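
A minimal sketch of that idea, assuming a hypothetical motion sensor `binary_sensor.hallway_motion` and light `light.hallway` (neither is from this thread). It uses `wait_template` with a `timeout` rather than a state trigger, so the check also passes if the light already reports “on” when the wait starts. The notification goes through `persistent_notification.create`; any other notify service could be swapped in.

```yaml
automation:
  - alias: "Hallway light on motion, with result check"
    trigger:
      - platform: state
        entity_id: binary_sensor.hallway_motion  # hypothetical entity
        to: "on"
    action:
      - service: light.turn_on
        target:
          entity_id: light.hallway  # hypothetical entity
      # Wait up to 10 seconds for the light to actually report "on".
      - wait_template: "{{ is_state('light.hallway', 'on') }}"
        timeout:
          seconds: 10
        continue_on_timeout: true
      # wait.completed is false when the timeout expired first.
      - if:
          - condition: template
            value_template: "{{ not wait.completed }}"
        then:
          - service: persistent_notification.create
            data:
              title: "Automation problem"
              message: "light.hallway did not report 'on' within 10 seconds."
```

This only checks the state HA sees afterwards; it does not tell you why the device did not react.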

stefre · September 15, 2023, 10:44am

Indeed: as a workaround for a few key automations I defined some “test/check automations” that test whether the desired effect of the initial automation has been achieved.
Do that for all automations, and it will add more complexity than it does any good.

I really think automations should be able to report some way or the other if an action failed. They know today and offer their traces and logs. They can even “continue on error”. So why not “notify on error” if requested?

I can see the benefit of a concept around “notify on error” not just in the context of automations… but also scenes, etc

WallyR · September 15, 2023, 11:27am

The continue on error feature is maybe not what you might think.
The errors it detects are those in HA and its integrations, but not the case where the device does not react to a command sent to it from HA.
The error checking stops when the command is sent out from HA.

stefre · September 15, 2023, 1:38pm

Right, point taken. Once it is “out” and the device does not react: fair enough.

That leaves the errors HA knows about or gets informed of. Like from an integration dealing with devices and reporting issues to HA during execution. Or something is “not available” although the automation needs it to be available. Still worth it to receive some sort of a notification … I think.

WallyR · September 16, 2023, 8:41am

I agree that an action option would be nice.
A reaction to a sensor not changing within a time interval would be just one option I could see.

RalphG · July 23, 2024, 10:00am

An error trap on failure of a device to respond would be great. I had an automation fail overnight as a Zigbee device failed to respond. The device did switch off as requested but the automation failed to continue and therefore did not turn the device back on after the specified delay.
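
For a case like this one, the existing `continue_on_error` option mentioned earlier in the thread may already help, assuming the failed Zigbee command surfaced as an error inside HA (which the post suggests but does not confirm). A rough sketch with a hypothetical entity `switch.example_plug` and an arbitrary 5-minute delay:

```yaml
action:
  - service: switch.turn_off
    target:
      entity_id: switch.example_plug  # hypothetical entity
    # Keep running the remaining actions even if this call raises an error.
    continue_on_error: true
  - delay:
      minutes: 5
  - service: switch.turn_on
    target:
      entity_id: switch.example_plug
```

This keeps the sequence going but still does not send a notification about the failure, which is the gap this feature request is about.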