2023.8: Update breaks automations

I know you were not asking me. However, the problem manifested itself in the automation getting stuck, i.e. it could be neither terminated nor restarted nor even disabled. For me, only a complete restart would reset it.

BTW, sorry for missing the link in that GitHub issue. My bad.

It remains to be seen whether the problem with the posted simple automation is related to what you’re experiencing.

It’s most likely the change that was made for service calls that return data, which altered how blocking/non-blocking service calls work. Some policies for timeouts on service calls (when the action doesn’t complete within 10 seconds) were removed, and they probably need to be reinvestigated. There is definitely a problem here.

Outside this issue, if your device has a robust connection to HA, you shouldn’t see this issue at all because the services won’t fail to execute within the timeout.

This is the current speculation on the problem at hand. I’m not sure if it’s the actual issue.


Perhaps the timeout aspect that you theorized manifests itself more frequently for certain integrations. For example, there’s a better-documented Issue about failing automations here and it involves the Zwave integration.

The following Issue, which also involves Zwave, was reported for 2023.7.3, the release line in which service calls that return data were first implemented.

Not sure if it’s a clue, or a red herring, but thought it was worth mentioning.
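For readers who haven’t used the feature being discussed, here is a minimal sketch of a service call that returns data, as introduced in 2023.7. The script name and `calendar.family` are placeholders I made up for illustration, and this is not one of the failing automations from this topic; it only shows what `response_variable` looks like.

```yaml
# Hypothetical script: the name and calendar.family are placeholders.
script:
  show_todays_events:
    sequence:
      # A service call that returns data; the result is stored in "agenda".
      - service: calendar.list_events
        target:
          entity_id: calendar.family
        data:
          duration:
            hours: 24
        response_variable: agenda
      # Use the returned data in a later step.
      - service: persistent_notification.create
        data:
          message: "{{ agenda.events | count }} events in the next 24 hours"
```

Whether this code path is what actually leaves automations hanging is still only the theory under discussion.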

I didn’t theorize this, I have been discussing this over on discord with people who made the service call changes. There’s a very good chance that this is the problem.

We’ve been seeing an increasing number of questions about automations that never stop running, and they started in 2023.7. These questions keep cropping up on all HA media platforms.

Ah! I wasn’t aware of that discussion; seems like a plausible theory. Not due to a side-effect of a recent “performance optimization” that I had proposed above but, nevertheless, a side-effect of a recent new feature.

Well, that’s what the experts (of the code) think. It still remains to be seen. We need to replicate the behavior in order to come up with a solution. We also need as much information as possible inside the issue instead of on the forums.

Yes, I had mentioned that to EndUser about their sparsely populated Issue.

The problem must be documented in the Issue, not elsewhere (such as this topic).


This makes 100000% perfect sense to me. I have 15 identical copies of the same automation and only ONE of them is suddenly having this problem: the one triggered by a sensor that is not in ideal range. E.g. it can sometimes take a few seconds to respond because its signal is not as good as that of all my other Z-Wave devices.

Again though, while the others in this thread are reporting this against 2023.8.0, I haven’t yet upgraded to 2023.8.0 - I’m still on 2023.7.3 and the problem began with that version. Which I believe made perfect sense to you too.

So seems you have hit the nail right on the head here - it’s when there isn’t a “perfect” Z-Wave connection then the service fails to execute within the timeout and the service change thing you mention then causes the automation to be left in a hung state.

In the case of others here, wonky stuff happens with the mode: restart. Again I’m different - all of mine are set to the default mode: single. Same problem though - because the automation is left hanging it can never run again… until I reboot HA.
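To make the two modes concrete, here is a minimal sketch of an automation with the mode spelled out. The alias and entity IDs are hypothetical, not taken from anyone’s setup in this topic.

```yaml
# Hypothetical automation: alias and entity IDs are placeholders.
automation:
  - alias: Hallway light on motion
    # single (the default): a new trigger is ignored while a run is still active.
    # restart: a new trigger stops the active run and starts a fresh one.
    mode: single
    # Log a warning ("Already running") when a trigger is ignored.
    max_exceeded: warning
    trigger:
      - platform: state
        entity_id: binary_sensor.hallway_motion
        to: "on"
    action:
      - service: light.turn_on
        target:
          entity_id: light.hallway
```

With `mode: single`, a run that never finishes blocks every later trigger, which matches the "can never run again" symptom described above.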

That’s what I hypothesized here, namely that the issue “manifests itself more frequently for certain integrations” notably Zwave.

It would be helpful if all users who reported the problem in this topic would indicate which integration (or integrations) are referenced by their failing automations.


I have an excellent Zwave mesh network with some 80 Zwave devices. I doubt that range has anything to do with my issue. This being said, I am no longer sure that it is a 2023.8 issue as I rolled back to 2023.7 and still have the same problem. The only way to resolve when it happens is to reboot my Yellow Box. Next time it happens I will post the log to see if anyone can identify the cause.

Not range; time to respond.

Thanks. Is there any information that I can post that would help identify the cause?
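One option, as a sketch only: raise the log level for the automation and Z-Wave JS components in `configuration.yaml` so the log captured at the next hang has more detail. Which loggers are actually relevant here is my assumption; nobody in this topic has confirmed it.

```yaml
# configuration.yaml
# Assumption: these two loggers are a reasonable starting point; adjust as needed.
logger:
  default: warning
  logs:
    homeassistant.components.automation: debug
    homeassistant.components.zwave_js: debug
```

Debug logging is verbose, so it is probably best switched back once a log of the stuck automation has been captured and attached to the Issue.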

Likewise - I have around the same number of Z-Wave devices and a fantastic mesh with no range issues at all. In fact, I even have one device in my chicken coop in the backyard and one in my letterbox at the front, with no issues at all. However, this particular Z-Wave device is behind three walls, so range is not an issue but signal strength is, and thus there can be short delays with this particular one. I should improve the mesh over in that area but haven’t got around to it.

I am watching this thread in earnest as well. My automations have been getting stuck for the last few weeks, and a reboot fixes it for a short while. I rolled back through all of the July releases and tried out the latest August release - none of them helped. I’m now on:

Home Assistant 2023.6.0
Supervisor 2023.07.1
Operating System 10.4
Frontend 20230607.0 - latest

And my automations are now back to running normally so far today.

I would see the “already running” message on automations and on some things like MQTT integrations and my Bond integration. It seems like some sort of core async functionality is gumming things up, and automations are the most visible issue.

Indeed, switching my automation mode from “Single” to “Restart” helps short term - but again - everything was working on my system for the last few years. My environment is pretty static but probably on the large side - 54 Z-wave devices and 51 Zigbee devices.


I use the custom auto-entities card to see running automations:

      - type: custom:auto-entities
        card:
          show_name: true
          show_icon: false
          show_state: false
          type: glance
          title: Running automations
          columns: 1
        filter:
          # Start from every automation entity...
          include:
            - domain: automation
          # ...then drop those with no active runs (attribute "current" is 0).
          exclude:
            - attributes:
                current: 0
        sort:
          method: last_triggered
        show_empty: true
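
For anyone who would rather not install a custom card, a rough equivalent can be sketched with the built-in Markdown card and a template. It relies on the same `current` attribute as the filter above; the title and the fallback text are my own choices.

```yaml
type: markdown
title: Running automations
content: |
  {% set running = states.automation
       | selectattr('attributes.current', 'defined')
       | selectattr('attributes.current', 'gt', 0)
       | map(attribute='name') | list %}
  {{ running | join(', ') if running else 'No automations are currently running' }}
```

Unlike the auto-entities version, this only lists names and is not sorted by last trigger time.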

Interesting… I am not all that technical. How and where do I set up this auto-entities card? In Overview, in YAML?

Is it possible for you to determine whether the failing automations exclusively involve entities based on the Zwave integration? Or do they fail regardless of whether the entities are based on Zwave or Zigbee?


For the automations that fail, like the simple one you posted involving turning on two switch entities, list the integrations used by the entities.


EDIT

Earlier you mentioned you had a solid Zwave network with 80 devices. Are all of your failing automations communicating exclusively with Zwave devices? Do you have any automations that don’t fail and do they communicate with Zwave devices or something else?

The goal here is to determine if the problem is limited to automations involving the Zwave integration or if it occurs for other integrations as well (and which ones).
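
If looking up each entity by hand is tedious, here is a sketch of a Markdown card that groups a list of entities by integration using the `integration_entities()` template function. The two `switch.example_*` entity IDs are placeholders; replace them with the entities used by the failing automation. The integration domains listed (`zwave_js`, `zha`, `mqtt`) are just my guess at the likely candidates, so add or remove domains to match your setup.

```yaml
type: markdown
title: Integrations used by the failing automation
content: |
  {# Placeholder entity IDs: replace with the ones from your automation. #}
  {% set watched = ['switch.example_one', 'switch.example_two'] %}
  {% for domain in ['zwave_js', 'zha', 'mqtt'] %}
  - **{{ domain }}**: {{ integration_entities(domain) | select('in', watched) | list | join(', ') }}
  {% endfor %}
```

The same template body can also be pasted into Developer Tools > Template, and the result copied into the Issue.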