WTH doesn't HA retry a failed service call?

Because nothing is fully reliable, it is always possible that HA calls a service but the call does not succeed, for a variety of reasons.

In many cases Home Assistant is able to measure whether the state actually changes. So why doesn’t Home Assistant spot this failure and do something about it, like retry, raise a notification, or send analytics to the code owner?

There could even be a ‘retry on failure’ option in the call service UI.

I agree, especially when an entity is temporarily unavailable (connection lost, for example). An additional retry parameter that retries the service call once the entity is available could be very useful.

I could have come thru… and we can’t know the impact of calling things twice.

Sure, for a light… but some random entity on a car that gets controlled… you could be toggling locks for example.

One of the things I’ve always wanted to add at this point is the ability to run a sequence of actions on failure; that way, you can decide.

…/Frenck


My perception of integrations inside Home Assistant vs. the related Python packages that implement the business logic:

  • HA is like a remote control (or an interface to the respective Python packages), providing the presentation layer to the user
  • Python packages are the actual owners of all business logic about their external worlds

Agreeing with @frenck about the risks of device-level complexities, we should let the respective API packages decide whether a retry is needed. Moreover, if a service call has failed, the internal state of HA should not change, and we should be informed about it (within a reasonable time frame).


Yeah, there are probably a number of ways to do this.

I agree toggling would be a bad one to retry. But for service calls where the desired end state is absolute (e.g. turn_off, lock, unlock, cover open), I don’t see the harm in automatically retrying. I agree it doesn’t make sense for toggle, dim, and other ‘relative state’ calls: if the error is only that the state did not update in HA afterwards, a retry could cause undesirable behaviour or loops.
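To make the absolute vs. relative distinction concrete, here is a minimal Python sketch. The function name `is_safe_to_retry` and both sets are hypothetical, chosen for illustration; nothing like this ships in Home Assistant. It treats a call as retryable only when its end state does not depend on the starting state.

```python
from typing import Optional

# Assumption: services whose end state is absolute, so repeating them is harmless.
ABSOLUTE_SERVICES = {"turn_on", "turn_off", "lock", "unlock", "open_cover", "close_cover"}

# Assumption: parameters that make an otherwise absolute call relative (e.g. dimming by a step).
RELATIVE_DATA_KEYS = {"brightness_step", "brightness_step_pct"}


def is_safe_to_retry(service: str, data: Optional[dict] = None) -> bool:
    """Return True only if repeating the call cannot change the outcome."""
    name = service.split(".")[-1]  # accept "light.turn_off" or plain "turn_off"
    if name not in ABSOLUTE_SERVICES:
        return False  # toggle and anything unknown: never retry automatically
    # A relative parameter turns the call into a "relative state" call.
    return not (RELATIVE_DATA_KEYS & set(data or {}))
```

Defaulting unknown services to "not safe" is deliberate: it keeps something like a car lock toggle out of any automatic retry unless someone explicitly lists it.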

Or, as you say, there could just be an on_failure: with a built-in retry or a custom_action, and let the user decide at their own peril…

But stated in its simplest form: Home Assistant tries to change a state and then knows/thinks this failed (because the state does not update). This should not be a silent event. It should (at the very least) be logged.
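A rough sketch of what "not silent" could look like, in plain Python rather than real Home Assistant code. `call_and_verify`, `call`, and `read_state` are hypothetical names; the idea is simply to issue the call, poll for the expected state, and log a warning if it never shows up.

```python
import logging
import time
from typing import Callable

_LOGGER = logging.getLogger("retry_sketch")


def call_and_verify(
    call: Callable[[], None],
    read_state: Callable[[], str],
    expected: str,
    timeout: float = 5.0,
    poll: float = 0.1,
) -> bool:
    """Issue the call, then poll until the expected state appears or the timeout expires."""
    call()
    deadline = time.monotonic() + timeout
    while True:
        if read_state() == expected:
            return True
        if time.monotonic() >= deadline:
            break
        time.sleep(poll)
    # The minimum the post asks for: the failure is logged, not swallowed.
    _LOGGER.warning(
        "Service call did not reach state %r within %.1fs (last seen: %r)",
        expected, timeout, read_state(),
    )
    return False
```

This only works for calls that have an observable target state, which is exactly the limitation raised elsewhere in the thread.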


Yup, ideally, an integration handles this as well. It will know best.

For example, the WLED integration I wrote will retry by itself before raising a failure.


True, maybe something to add to silver or gold:

‘checks whether service calls succeed and retries where applicable and safe to do so’


That is a library implementation, which is considered out of scope for HA. Such retries do not take place in HA, but in a level UP.

Understood. In which case my original WTH still stands. HA knows that one of its integrations/libraries has failed to achieve something it tried to do. This shouldn’t be a silent event, as it is likely to have frustrated the user - or should I say “WHAT THE HECK!?”.


Adding one analogy:

  • you use a TV remote and press the power button to turn the TV on or off
  • the remote is somehow not pointing at the TV, so the TV did not receive the command and nothing happened
  • it is up to the user to retry based on the output/response

There might be some service calls that do not update the internal HA state, so it is very tricky for HA to decide. Maybe we can just add automation action parameters or service call parameters: a retry condition (under which condition to retry), a retry timeout, and a retry count.
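The three parameters suggested here (retry condition, retry timeout, retry count) could be sketched roughly like this in Python. `RetryPolicy` and `call_with_policy` are made-up names for illustration, not an existing Home Assistant API; the condition is supplied by the user, so HA itself never has to guess whether a call worked.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryPolicy:
    should_retry: Callable[[], bool]  # retry condition: True means "not done yet"
    count: int = 3                    # maximum number of retries after the first attempt
    timeout: float = 10.0             # overall time budget in seconds


def call_with_policy(call: Callable[[], None], policy: RetryPolicy, delay: float = 1.0) -> bool:
    """Call once, then retry while the condition holds and the budget allows."""
    deadline = time.monotonic() + policy.timeout
    attempts = 0
    while True:
        call()
        attempts += 1
        if not policy.should_retry():
            return True  # condition cleared: the call took effect
        out_of_retries = attempts > policy.count
        out_of_time = time.monotonic() + delay > deadline
        if out_of_retries or out_of_time:
            return False  # give up and let the caller decide what happens next
        time.sleep(delay)
```

Returning `False` instead of raising leaves room for an on-failure sequence, along the lines discussed earlier in the thread.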

While I understand your analogy in the context of Home Assistant vs libraries, I don’t want that to be a reason to not implement a change in one or the other. As a hobbyist, it sounds like an interesting challenge - does HA enforce a new standard on devs?

As a user, I agree with HarvsG. The ideal state for many users is to never have to use a “remote” again, instead automating their lives through triggers. When I walk into a room, I expect the light to turn on without my intervention (handled via motion sensor, presence detection, or other trigger). I expect things to work and don’t care if it took 1 or 100 API calls to get it done; that’s all behind the scenes. If I have to “use” anything, then I might as well have used my light switch and thrown Home Assistant out the window.

Point is, as a user, it doesn’t matter to me if it’s not technically Home Assistant’s responsibility or the library dev. Home Assistant gets the axe and my Nabu Casa subscription gets cancelled.


Nobody said that… :thinking:

Agree 100%, nobody ever said otherwise. Not sure what triggered it, nobody said anything about this.

Sorry to hear that, nevertheless good luck on your future journeys! :heart:

…/Frenck


I don’t think SteveHome meant this in an accusatory tone. I think he was just trying to highlight that there is a difference between a dev’s and a user’s perception of (this/any) problem, and playing the devil’s advocate in doing so! Probably in response to

“That is a library implementation, which is considered out of scope for HA. Such retries do not take place in HA, but in a level UP.”

I don’t think it was an actual threat to cancel his subscription.

Oh the joys of written forms of communication on the internet… /s


:man_shrugging: It doesn’t add value in discussions. In the end, I’m personally fine with that. One should use something one likes. If something else fits better, that is completely fine and up to them.

In the end, everybody tries their best to find common ground and solutions, trying to make HA better every day… together.


Great point on discussing technical matters. I am not part of the Nabu Casa or Home Assistant dev team, but this statement contributes nothing other than sounding threatening.

Yeah HarvsG you got the tone, sorry Frenck! I thought Frenck saying “out of scope for HA” meant “we can’t/aren’t going to do anything about it” - which wasn’t the case (?)

The feeling I was trying to express in response to this was that others might quit HA if nothing is done, so HA needs to do something. That’s all. Not going anywhere, you guys are awesome! :slight_smile:

Well my bad, I really wasn’t trying to threaten… I was just trying to encourage the HA team to deal with this issue rather than declare it out of scope and leave it to the integration devs that haven’t implemented it yet (or this wouldn’t be a WTH…)

The cancellation statement was simply to say that casual users don’t/shouldn’t have to differentiate between HA and integration devs’ responsibilities. They pay HA. So if devs don’t make things work and people give up, HA doesn’t get paid. Therefore it’s HA’s problem.

Not a threat. Not implying HA is greedy or devs are lazy. None of that. Simply trying to say that it should be within scope of HA or HA requirements to devs if devs haven’t implemented this yet. That’s all.


I’m working on this now because stuff not turning on, and hacking automations with retry loops, is time-consuming. Unfortunately not all integrations retry, and many don’t even return errors if the destination device is offline… or the command gets lost due to RF (zwave :-()

Anyway, I have a very early implementation that I’ve been using. It’s based on shell scripts that autogenerate the retry stubs for entity service calls as scripts. Once we have the nuances mastered we’ll create a Python integration. But for now, scripts are quick and easy to work with. Right now it has a very limited set of methods, but it is extensible.

Looking for power users who can work on this with me and/or do some evaluation to see how it works in your world. Right now you’ll need some shell scripting capability to use it.

It auto-generates a Lovelace dashboard so you can see the metrics and look at overall execution times (screenshots not included here).


We created a service which wraps another service with a retry mechanism: amitfin/retry on GitHub (Home Assistant Integration with Retry Service).
It can be installed via HACS.
While developing it, the most surprising thing was that when an entity is marked unavailable (e.g. temporary connectivity issues), it is silently skipped by HA. Therefore, the retry mechanism verifies the availability of the entities explicitly (and not only errors).
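As an illustration of the explicit availability check described above (a standalone sketch, not the code of the retry integration), the snippet below splits target entities into those that can be called and those that would be skipped. `split_by_availability` and the plain `states` dict are assumptions made for the example; `"unavailable"` is the state string HA reports for entities it cannot reach.

```python
from typing import Dict, List, Tuple

UNAVAILABLE = "unavailable"  # state string HA reports for unreachable entities


def split_by_availability(
    states: Dict[str, str], targets: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (callable_now, skipped) so skipped entities can be retried or reported."""
    callable_now: List[str] = []
    skipped: List[str] = []
    for entity_id in targets:
        # Assumption: an entity with no known state is treated like an unavailable one.
        if states.get(entity_id, UNAVAILABLE) == UNAVAILABLE:
            skipped.append(entity_id)
        else:
            callable_now.append(entity_id)
    return callable_now, skipped
```

A retry loop would then call the service for the first list and schedule another attempt for the second, instead of treating "no error raised" as success.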


Any idea if this is on the horizon?