WTH is there no option to retry an action until it succeeds?

MiguelAngelLV · November 30, 2024, 9:11pm

Sometimes an action fails because of a temporary problem (unavailable device, connection problem…), and it would be useful to be able to retry until the action works.

For example, I try to turn off a light, but it is unavailable, so the action fails. Now I need to add a loop that calls light.turn_off with a delay until the light is off…

WallyR · November 30, 2024, 9:22pm

And sometimes the reply is just lost, or the action on the device is just slow. Do you want your radiator to keep increasing the temperature just because the reply does not get back to HA?

And how long should the delay be to account for slow devices?

What do you do if you start an action on a device that cannot be undone?
Like a pet feeder, where an extra feeding could create issues, and especially multiple feedings.

MiguelAngelLV · November 30, 2024, 9:33pm

The retry must be optional. Yes, sometimes it could be worse, but other times it would be innocuous.

For example, a radiator where you set the temperature (not increase, set) isn’t dangerous if you set it multiple times, and neither is sending multiple turn-off commands to a light.

How many times, or what timeout? It must all be configurable.

pixeye33 · November 30, 2024, 9:46pm

On my end I had to implement a “digital twin” + remediation loop.

With an input select (scenes) + input boolean (room), each time there’s a change:
An automation tries to set the lights to the correct state, then starts a 1s timer.
When that timer ends, the automation mentioned above runs again, but starts a 60s timer afterwards.

bigbeefy · November 30, 2024, 10:00pm

You might want to take a look at this on HACS if you haven’t already: GitHub - amitfin/retry: Home Assistant Integration with Retry Service. Is this what you’re looking for?
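
For anyone who hasn't used it, a call looks roughly like this. It is only a sketch based on my understanding of the integration's README, so treat the action name `retry.action`, the parameter names (`retries`, `expected_state`) and the placeholder entity `light.kitchen` as assumptions to check against the current docs:

```yaml
# Sketch: wrap light.turn_off in the Retry integration's action.
# Parameter names are assumptions taken from the README; verify before use.
# light.kitchen is a placeholder entity ID.
- action: retry.action
  data:
    action: light.turn_off
    entity_id: light.kitchen
    retries: 5             # maximum number of attempts
    expected_state: "off"  # treat the call as successful only once the light reports "off"
```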

MiguelAngelLV · November 30, 2024, 10:04pm

Yes, the concept would be that, but integrated directly into the action itself instead of having to add a wrapper on top.

WallyR · November 30, 2024, 11:18pm

The devs need to make this without knowing what device it will be used on, so that is the first hurdle.
Then the timeout needs to be configured, because that is the trigger for the retry.
Without knowing the device the devs can’t set a default value.
That means it is up to the user to know this value in order to use the retry function, and very few will be able to figure it out. Trial and error will not work for setting this value, because it has to be set for all situations and not just the ones you can test for, or else it might fail when you need it.

It is still possible to make a retry function with automations and there are also third party integrations, but they still have those requirements in order to actually work when you need it.

Besides that, you also need to take counter/new actions into consideration.
What happens if you get a new command for the device that is relative in its action?
Do you carry on with the retry and apply the new command once it succeeds, or do you assume the new command should cancel the command that has not succeeded yet?

Getslow · December 2, 2024, 10:02am

I think this WTH request can be generalized into a feature that returns the result of an ‘action’ in a variable, similar to how HTTP response codes work. This way users can implement a retry function themselves (e.g. while action_success == FALSE, do action).

Another use case is that you might want to know within an automation whether an alarm code was entered incorrectly. Currently there is no option to act on a wrongly entered alarm code. A more detailed description can be found here:

Add possibility to trigger actions when service calls fail - Feature Requests - Home Assistant Community

RonnieLast · December 2, 2024, 10:23am

I completely support this request. Having a reliable method for retrying operations after a delay is critical. Having the option of then logging that retry and subsequent results would then be the icing on the strawberry pie.

WallyR · December 2, 2024, 3:15pm

That is pretty much how it works.
You send a request, and you get your reply in the message that the device state has changed.
Maybe alarm codes are different and need some work, though.

Remember that HA does not make the devices, so HA cannot demand a reply, and a reply message before the state change would just be another possible point of failure, since you could then get an OK reply but no state change. Might as well stay with just the state change, which conveys the same thing.

RonnieLast · December 2, 2024, 4:55pm

There just needs to be a check for a change of state of a defined entity within a timeout period. If that state change completes, the process finishes. Otherwise, wait for a further timeout period, apply the request again, wait for the change of state within the timeout period, and repeat. The repeat count, the timeouts and the object being monitored for a change are the important inputs.
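
As a rough sketch, that logic can already be written as a script body today. The variable names below (`target_entity`, `desired_state`, `max_attempts`, `timeout_seconds`) are made up for illustration and would be passed in as script fields, and it only handles simple on/off entities:

```yaml
# Hypothetical script body: re-send the command until the monitored entity
# reaches the desired state, or until max_attempts is used up.
sequence:
  - repeat:
      while:
        - condition: template
          value_template: >-
            {{ not is_state(target_entity, desired_state)
               and repeat.index <= max_attempts }}
      sequence:
        # Send (or re-send) the command; keep going if the call raises an error.
        - action: "homeassistant.turn_{{ desired_state }}"
          target:
            entity_id: "{{ target_entity }}"
          continue_on_error: true
        # Give the device timeout_seconds to report the new state before retrying.
        - wait_template: "{{ is_state(target_entity, desired_state) }}"
          timeout:
            seconds: "{{ timeout_seconds }}"
          continue_on_timeout: true
```

When calling it, the state has to be quoted ("on" / "off"), since a bare off is parsed as a boolean in YAML.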

Markus99 · December 2, 2024, 5:46pm

+1 for this. I had to write a script that I call in automations to handle this. It came out of issues with my Z-Wave network via HA. Anyhow, here’s the script:

sequence:
  - variables:
      device_list: "{{ expand(device) | map(attribute='entity_id') | list }}"
  - alias: Try Default Method First
    service_template: |-
      {% if 'group' in device %}
        homeassistant.turn_{{ state }}
      {% else %}
        {{ device.split('.').0 }}.turn_{{ state }}
      {% endif %}
    data_template:
      entity_id: "{{ device }}"
  - delay: "00:00:01"
  - repeat:
      for_each: "{{ device_list }}"
      sequence:
        - variables:
            device_1: "{{ repeat.item }}"
        - repeat:
            while:
              - condition: template
                value_template: "{{ states(device_1) != state }}"
                alias: "WHILE: Device State != Desired State"
              - condition: template
                value_template: "{{ repeat.index <= 3 }}"
                alias: "WHILE: # Attempts <= 3"
            sequence:
              - service_template: "{{ device_1.split('.').0 }}.turn_{{ state }}"
                data_template:
                  entity_id: "{{ device_1 }}"
                alias: "ServiceTemplate: DEVICE_1.turn_STATE"
              - wait_template: "{{ states(device_1) == state }}"
                continue_on_timeout: true
                timeout: "00:00:02"
                alias: "WAIT FOR: Device State = Desired State (2s timeout)"
mode: parallel
max: 20

And how it’s called:

actions:
  - data:
      device: switch.charger_phone_mark_zb
      state: "off"
    action: script.ensure_device_changes

Also had to create similar scripts for lights w/ brightness and colors.

Would love this being built in though, this is super kludgy…

Pink-o · December 4, 2024, 9:57am

I used this workaround to repeat an action until success:

amitfin · December 5, 2024, 7:13pm

From my experience the “new state validation” is more important than “success validation” since actions can get performed in the background so they don’t always propagate errors.
There is an offer to contribute the Retry integration to core, if it helps - Contribute "retry" integration to core · home-assistant/architecture · Discussion #1171 · GitHub

RonnieLast · December 5, 2024, 8:19pm

I wonder if there is any atomicity in the various protocols (Z-Wave, Zigbee, etc.) which could be leveraged here? For example, Test-And-Set. I don’t have protocol knowledge, but with Nabu Casa now represented on the Z-Wave forum, maybe they can push to include that feature set.

WallyR · December 6, 2024, 7:28am

A retry function needs to be rooted as close to the calling place as possible, because errors can already happen from that point.
If there is RF noise at the moment you transmit your command, then your Z-Wave/Zigbee/whatever device will never receive it.

RonnieLast · December 6, 2024, 8:34am

Yes, and an atomic command (ATS) can overcome that, as you’re expecting a response from the device when the command is received and has either failed or completed. You might just need to allow for a suitable timeout, based on the protocol, to trigger the retry function from that point.

WallyR · December 6, 2024, 10:38am

You are trying to invent something that is already there.
You want a response from the device and you already get that in the state change that it reports to HA.
You already have what you need to make a retry function.
The timeout you need is not defined and changes from device to device, so that is something each device owner will have to figure out for themselves.

Remember that you can easily define what a retry function should be for your device, but the devs need to define it without knowing the device at all.
I have a device here that has 3 switches but 7 outlets, and another device that has 4 switches but only 2 outlets. How would you take those into account for a retry function when you do not know which switch affects which outlet/sensor?

potelux · December 6, 2024, 7:26pm

You could always use template entities for this. For your light example, you could create a template light and define the turn_off to be a for loop (where the number of loops is the max tries). If the light is off, break out of the loop; if it is on, turn it off and delay for whatever time you select.

Now, instead of calling light.turn_off on your normal light, you call it on your template light and it will handle the retrying.

This version would work even if you turned off an area.

Or you could create a script called “light_turn_off_retry” and pass in the light entity you want to turn off. Instead of calling light.turn_off, you call script.light_turn_off_retry.

EDIT: This doesn’t do any device type filtering, but I did just test it and it works to retry any light

sequence:
  - repeat:
      count: "{{ number_of_times_to_repeat + 1 }}"
      sequence:
        - if:
            - condition: not
              conditions:
                - condition: template
                  value_template: "{{ is_state(light_to_retry_turn_off, 'off') }}"
          then:
            - action: light.turn_off
              target:
                entity_id: "{{ light_to_retry_turn_off }}"
            - delay:
                hours: 0
                minutes: 0
                seconds: 0
                milliseconds: "{{ delay_in_milliseconds }}"
fields:
  light:
    selector:
      entity: {}
    name: light
    required: true
  repeat:
    selector:
      number:
        min: 1
        max: 100
    name: repeat
    default: 10
    required: true
  delay:
    selector:
      number:
        min: 1
        max: 10000
    name: delay
    description: Delay in Milliseconds
    default: 100
    required: true
variables:
  light_to_retry_turn_off: "{{ light }}"
  number_of_times_to_repeat: "{{ repeat }}"
  delay_in_milliseconds: "{{ delay }}"
description: ""
icon: mdi:lightbulb-off-outline
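
Calling it from an automation would then look something like this (assuming the script was saved as light_turn_off_retry; `light.kitchen` is just a placeholder entity):

```yaml
# Call the retry script instead of light.turn_off.
# light.kitchen is a placeholder entity ID.
- action: script.light_turn_off_retry
  data:
    light: light.kitchen
    repeat: 5    # number of retries (the script loops repeat + 1 times)
    delay: 500   # pause between attempts, in milliseconds
```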

Pink-o · December 8, 2024, 9:07am

I did not try it with the template, but with a loop in the automation.
The problem I had was that the device threw an error and the automation stopped without retrying the loop, so my actions were not executed. Even with the continue-on-error option enabled.

What will happen in your template or script in such a case?