Sometimes an action fails because of a temporary problem (unavailable device, connection problem…), and it would be useful to be able to retry until the action works.
For example, I try to turn off a light, but it is unavailable, so the action fails. Now I need to add a loop that calls light.turn_off with a delay until the light is off…
And sometimes the reply is just lost, or the action on the device is just slow. Do you want your radiator to keep increasing the temperature just because the reply did not get back to HA?
And how long a delay should count as normal for slow devices?
What do you do if you start an action on a device that cannot be undone again?
Like a pet feeder, where an extra feeding could create issues, and especially multiple feedings.
The retry must be optional. Yes, sometimes it could make things worse, but other times it would be innocuous.
For example, a radiator where you set the temperature (set, not increase) is not dangerous if you set it multiple times, and neither is sending multiple turn-off commands to a light.
How many times, or what timeout? All of it must be configurable.
On my end I had to implement a “digital twin” + remediation loop.
With an input select (scenes) + input boolean (room), each time there’s a change:
An automation tries to set the lights to the correct state, then starts a 1s timer.
When that timer ends, the automation mentioned above runs again, but starts a 60s timer afterwards.
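For anyone who wants to see the shape of that loop, here is a rough sketch in Python rather than as HA automations. Everything in it is hypothetical: `desired` stands in for the input select/input boolean state, `read_actual` and `apply` stand in for reading entity states and calling services, and `max_passes` is my own assumption so the example terminates. The 1s and 60s delays mirror the timers described above.

```python
import time
from typing import Callable, Dict


def reconcile(desired: Dict[str, str],
              read_actual: Callable[[str], str],
              apply: Callable[[str, str], None]) -> int:
    """Re-send the wanted state to every entity that differs from it.

    Returns the number of corrections attempted in this pass.
    """
    corrections = 0
    for entity_id, wanted in desired.items():
        if read_actual(entity_id) != wanted:
            apply(entity_id, wanted)
            corrections += 1
    return corrections


def remediation_loop(desired: Dict[str, str],
                     read_actual: Callable[[str], str],
                     apply: Callable[[str, str], None],
                     first_delay: float = 1.0,    # the 1s timer
                     later_delay: float = 60.0,   # the 60s timer
                     max_passes: int = 5,         # assumption, not in the post
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run reconcile passes until one finds nothing left to correct.

    Waits first_delay after the first pass and later_delay after the
    following ones. Returns the number of passes that were run.
    """
    for pass_no in range(1, max_passes + 1):
        if reconcile(desired, read_actual, apply) == 0:
            return pass_no
        sleep(first_delay if pass_no == 1 else later_delay)
    return max_passes
```

The point of the design is that the loop never asks whether a command "succeeded"; it only compares the wanted state with the reported state and re-sends when they differ.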
The devs need to make this without knowing what device it will be used on, so that is the first hurdle.
Then the timeout needs to be configured, because that is the trigger for the retry.
Without knowing the device the devs can’t set a default value.
That means it is up to the user to know this value in order to be able to use the retry function, and very few will be able to figure this out. Trial and error will not work for setting this value, because it has to be set for all situations and not just the ones you can test for, or else it might fail when you need it.
It is still possible to make a retry function with automations and there are also third party integrations, but they still have those requirements in order to actually work when you need it.
Besides that, you also need to take counter/new actions into consideration.
What happens if you get a new command for the device that is relative in its action?
Do you carry on with the retry and apply the new command once it succeeds, or do you assume the new command should cancel the command that has not yet succeeded?
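To make the second option concrete (a new command cancels the one still being retried), here is a small Python sketch. It is only one of the two policies asked about, not a recommendation, and all names (`PendingCommands`, `retry_unless_superseded`, `send`) are made up for illustration. `send` is assumed to return True when the command went through.

```python
from typing import Callable, Dict


class PendingCommands:
    """Remembers the newest command per entity so stale retries can stop."""

    def __init__(self) -> None:
        self._generation: Dict[str, int] = {}

    def issue(self, entity_id: str) -> int:
        """Register a new command for the entity and return its token."""
        token = self._generation.get(entity_id, 0) + 1
        self._generation[entity_id] = token
        return token

    def is_current(self, entity_id: str, token: int) -> bool:
        """True while no newer command has been issued for the entity."""
        return self._generation.get(entity_id) == token


def retry_unless_superseded(pending: PendingCommands,
                            entity_id: str,
                            token: int,
                            send: Callable[[], bool],
                            max_attempts: int = 3) -> str:
    """Retry send() until it works, attempts run out, or a newer command arrives."""
    for _ in range(max_attempts):
        if not pending.is_current(entity_id, token):
            return "superseded"
        if send():
            return "done"
    return "gave_up"
```

With a relative command ("increase by 2") this policy drops the unconfirmed increase, which may or may not be what the user meant; that ambiguity is exactly the problem raised above.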
I think this WTH request can be generalized into a feature that returns the result of an ‘action’ in a variable, similar to how HTTP response codes work. This way users can implement a retry function themselves (e.g. while action_success == FALSE, do action).
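As a rough illustration of that idea in Python (not HA script syntax, and not an existing HA feature): `ActionResult`, `run_action` and `retry` are hypothetical names. The sketch assumes a failed action raises an exception, which gets turned into a result value so the calling loop keeps running, and it adds a `max_attempts` cap so the while-not-success loop cannot spin forever.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionResult:
    """Outcome of one action call, loosely like an HTTP status."""
    success: bool
    error: Optional[str] = None


def run_action(action: Callable[[], None]) -> ActionResult:
    """Call the action and turn any exception into a result value."""
    try:
        action()
    except Exception as exc:  # deliberately broad for this sketch
        return ActionResult(success=False, error=str(exc))
    return ActionResult(success=True)


def retry(action: Callable[[], None], max_attempts: int = 3) -> ActionResult:
    """Repeat the action until it succeeds or max_attempts is reached.

    Returns the result of the last attempt.
    """
    result = ActionResult(success=False, error="not attempted")
    for _ in range(max_attempts):
        result = run_action(action)
        if result.success:
            break
    return result
```

Because the caller gets the last result back, it can also log the failure reason or branch on it instead of just retrying.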
Another use case is that you might want to know within an automation whether an alarm code was entered wrong. Currently there is no option to act on an alarm code that has been entered wrong. A more detailed description can be found here:
I completely support this request. Having a reliable method for retrying operations after a delay is critical. Having the option of then logging that retry and subsequent results would then be the icing on the strawberry pie.
That is pretty much how it works.
You send a request and you get your reply in the form of the message that the device state has changed.
Maybe alarm codes are different and need some work, though.
Remember that HA does not make the devices, so HA cannot demand a reply, and a reply message before the state change would just be another possible point of failure, since you could then get an OK reply but no state change. Might as well stay with just the state change, which conveys the same thing.
There just needs to be a check for a change of state of a defined entity within a timeout period. If that state change completes, finish the process. Otherwise wait for a further timeout period and apply the request again, wait for the change of state within the timeout period, and repeat. The repeat count, timeouts and the object being monitored for a change are the important inputs.
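Those steps can be sketched roughly as follows, in plain Python rather than HA script syntax. The names are hypothetical: `send` stands in for the service call, `read_state` for reading the monitored entity, and the defaults for `timeout`, `retry_wait` and `retries` are arbitrary placeholders, since the right values depend on the device.

```python
import time
from typing import Any, Callable, Optional


def wait_for_state(read_state: Callable[[], Any], target: Any, timeout: float,
                   poll: float = 0.5,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll read_state() until it equals target or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if read_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(poll)


def request_with_verify(send: Callable[[], None],
                        read_state: Callable[[], Any],
                        target: Any,
                        timeout: float = 5.0,
                        retry_wait: float = 5.0,
                        retries: int = 3,
                        poll: float = 0.5,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Send the request, wait for the state change, and repeat on timeout.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, retries + 1):
        send()
        if wait_for_state(read_state, target, timeout, poll, clock, sleep):
            return attempt
        if attempt < retries:
            sleep(retry_wait)  # the "further timeout period" before re-sending
    return None
```

Note that the monitored entity is passed in separately from the request, so the state being checked does not have to belong to the same entity the command was sent to.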
I wonder if there is any atomicity in the various protocols (Z-Wave, Zigbee, etc.) which could be leveraged here? For example Test-And-Set. I don’t have protocol knowledge, but with Nabu Casa now represented on the Z-Wave forum, maybe they can push to include that feature set.
A retry function needs to be rooted as close to the calling place as possible, because errors can already happen from that point.
If there is RF noise at the moment you transmit your command, then your Z-Wave/Zigbee/whatever device will never receive it.
Yes, and an atomic command (ATS) can overcome that, as you’re expecting a response from the device when the command is received and has either failed or completed. You might just need to allow for a suitable timeout, based on the protocol, to trigger the retry function from that point.
You are trying to invent something that is already there.
You want a response from the device and you already get that in the state change that it reports to HA.
You already have what you need to make a retry function.
The timeout you need is not defined and changes from device to device, so that is something each device owner will have to figure out by themselves.
Remember that you can easily define what a retry function should be for your device, but the devs need to define it without knowing the device at all.
I have a device here that has 3 switches but 7 outlets, and another device that has 4 switches but only 2 outlets. How would you take those into account for a retry function, when you do not know which switch affects which outlet/sensor?
You could always use template entities for this. For your light example, you could create a template light and define the turn_off to be a for loop (where the number of loops is the max tries). If the light is off, break out of the loop; if it is on, turn it off and delay for whatever time you select.
Now, instead of calling light.turn_off on your normal light, you call it on your template light and it will handle the retrying.
This version would work even if you turned off an area.
Or you could create a script called “light_turn_off_retry” and pass in the light entity you want to turn off. Instead of calling light.turn_off, you call script.light_turn_off_retry.
EDIT: This doesn’t do any device type filtering, but I did just test it and it works to retry any light.
I did not try with the template, but with a loop in the automation.
The problem I had was that the device threw an error and the automation stopped without retrying the loop, so my actions were not executed, even with the continue-on-error option.
What will happen in your template or script in such a case?