2023.8: Update breaks automations

I have an excellent Zwave mesh network with some 80 Zwave devices, so I doubt that range has anything to do with my issue. That said, I am no longer sure that it is a 2023.8 issue, as I rolled back to 2023.7 and still have the same problem. The only way to resolve it when it happens is to reboot my Yellow box. Next time it happens I will post the log to see if anyone can identify the cause.

Not range, but time to respond.

Thanks. Is there any information that I can post that would help identify the cause?

Likewise: I have around the same number of Z-Wave devices and a fantastic mesh with no range issues at all. In fact, I even have one device in my chicken coop in the backyard and one in my letterbox out front, with no issues. However, this particular Z-Wave device sits behind three walls, so range is not the issue but signal strength is, and there can be short delays with this particular one. I should improve the mesh in that area but haven’t got around to it.

I am watching this thread in earnest as well. My automations have been getting stuck for the last few weeks, and a reboot fixes it for a short while. I rolled back through all of the July releases and tried out the latest August release - none of them helped. I’m now on:

Home Assistant 2023.6.0
Supervisor 2023.07.1
Operating System 10.4
Frontend 20230607.0 - latest

And my automations are now back to running normally so far today.

I would see the “already running” message on automations and on some things like MQTT integrations and my Bond integration. It seems like some sort of core async functionality is gumming things up, and automations are the most visible symptom.

Indeed, switching my automation mode from “Single” to “Restart” helps short term - but again - everything was working on my system for the last few years. My environment is pretty static but probably on the large side - 54 Z-wave devices and 51 Zigbee devices.

I use the custom auto-entities card to see running automations:

      - type: custom:auto-entities
        card:
          show_name: true
          show_icon: false
          show_state: false
          type: glance
          title: Running automations
          columns: 1
        filter:
          include:
            - domain: automation
          exclude:
            - attributes:
                current: 0
        sort:
          method: last_triggered
        show_empty: true

Interesting… I am not all that technical. How and where do I set up this auto-entities card? In Overview, in YAML?

Is it possible for you to determine whether the failing automations exclusively involve entities based on the Zwave integration? Or do they fail regardless of whether the entities are based on Zwave or Zigbee?

For the automations that fail, like the simple one you posted involving turning on two switch entities, list the integrations used by the entities.


EDIT

Earlier you mentioned you had a solid Zwave network with 80 devices. Are all of your failing automations communicating exclusively with Zwave devices? Do you have any automations that don’t fail and do they communicate with Zwave devices or something else?

The goal here is to determine if the problem is limited to automations involving the Zwave integration or if it occurs for other integrations as well (and which ones).

Instructions are here. It seems to support GUI editing in dashboards, but that seemed broken to me; you can use YAML mode and paste the above example once you have installed the custom card.

Thanks… I try to stay away from YAML, and I don’t see the auto-entities card as an option. But no worries, this is not a priority for me. Right now, I want to know why I need to restart HA almost every day (sometimes several times a day) to get automations to run properly. In fact, I just created an automation to restart it every day at 4 am.
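For reference, a daily restart like the one described above could look roughly like this in YAML. This is only a sketch, not the poster's actual automation: the alias is made up, and it assumes the built-in `homeassistant.restart` service. It hides the symptom rather than fixing the cause.

```yaml
# Workaround sketch: restart Home Assistant every day at 04:00.
# The alias is illustrative; adjust the time to suit your setup.
automation:
  - alias: "Nightly restart (workaround)"
    trigger:
      - platform: time
        at: "04:00:00"
    action:
      - service: homeassistant.restart
```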

You can go to Developer Tools > States, enter current: 1 in the Attributes column to list all automations that are currently executing.

Here’s an example showing I currently have no running automations.

FWIW, if you change current: 1 to current: 2 it will list all automations that are currently running and have a second instance waiting in the queue.

In my above example I included all automations but excluded the ones with current 0 - because when you include only the ones with current 1 you don’t get those with count above 1. I wouldn’t be surprised if the “hanging” ones racked up a high current value.

You can paste this into the template editor and it’ll show you what’s running

running:
{%- for a in states.automation | selectattr('attributes.current', 'defined') | selectattr('attributes.current', '>', 0) | sort(attribute='attributes.current', reverse=True) %}
  {{ a.entity_id }}: {{ a.attributes.current }}
{%- endfor %}

I can confirm that most of my automations with mode: restart hang.

By using that template you provided, I can see that:

automation.some_automation: 3

Not sure if that means 3 of them are running, which should not happen anyway since it is mode: restart.

The reason for making the automation restart is that it acts as a “debounce”. The trigger might fire 1-3 times depending on circumstances that are hard to predict. My automation basically sleeps for 2 seconds, then does what it is supposed to do ONCE, even if it was triggered, say, 3 times within a second.
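The debounce pattern described above could be sketched like this. It is not the actual automation from this post: the entity IDs and alias are hypothetical placeholders, and the 2-second delay is taken from the description.

```yaml
# Debounce sketch using mode: restart.
# Each new trigger stops the previous run (normally while it is still in
# the delay), so the action should execute once after the last trigger.
automation:
  - alias: "Debounced action (sketch)"
    mode: restart
    trigger:
      - platform: state
        entity_id: binary_sensor.example_trigger  # hypothetical entity
        to: "on"
    action:
      - delay: "00:00:02"
      - service: switch.turn_on
        target:
          entity_id: switch.example_switch  # hypothetical entity
```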

Now it just hangs, and there is nothing in the logs.

The trace says “still running”, but nothing in the automation waits for 5+ minutes. It does something local that takes 1 second to complete (after waiting 2 seconds before actually doing it).


I mean, it has not even “left” the trigger step in the trace.

Can you please share this automation (the YAML) in this issue?

Also, make sure you include information on the hardware/integration being used in the automation.

That’s a good description of the problem you’re experiencing. Yes, the 3 means that there are three instances of the automation, one is executing and the other two are queued for execution. Like you said, that seems unusual given that the automation’s mode is restart (i.e. it’s not queued or parallel).

Regarding your statement:

most of my automations with mode: restart hang

For the ones that hang, what is the integration (or integrations) of the entities that the failed automations are communicating with? There have been reports that this problem is occurring for entities based on the Zwave integration. However, it’s unclear if that’s the only integration that has exposed the problem or there are others.

If you want it to prevent multiple consecutive operations, restart is not the most logical choice. Why not set it to single, react immediately, wait a few seconds to block others? Because with restart HA stops the running automation to start a new one. But if the action is already in the works and taking some time then multiple executions might pile up anyway. Could it be the actual workload takes a long time, and the following triggers are waiting for the workload to finish?
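A rough sketch of that single-mode alternative, with hypothetical entity IDs: the action runs immediately, and the trailing delay keeps the run alive so that further triggers within those seconds are ignored. `max_exceeded: silent` is optional and only suppresses the "already running" warning in the log.

```yaml
# Sketch: act immediately, then block repeat triggers for 2 seconds.
automation:
  - alias: "Act once, ignore repeats (sketch)"
    mode: single
    max_exceeded: silent  # optional: do not log ignored triggers
    trigger:
      - platform: state
        entity_id: binary_sensor.example_trigger  # hypothetical entity
        to: "on"
    action:
      - service: switch.turn_on
        target:
          entity_id: switch.example_switch  # hypothetical entity
      - delay: "00:00:02"  # while this runs, new triggers are dropped
```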

And if the reason is that repeated motion should delay the light turning off: IMO you should not schedule the light-off based on motion starting, but on motion ending. Prolonged motion should also not turn off the light. And instead of waiting inside an automation, you could consider using a timer. A timer survives HA reboots, and running automations do not. Renewed motion could then reset the timer. I have almost no automations that use a delay - they all finish in a very short time.
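The timer approach could look something like the sketch below. All entity IDs, aliases, and the 5-minute duration are made up for illustration. As far as I know, the timer needs `restore: true` to pick up again after a restart, so treat that as an assumption to check against the timer documentation.

```yaml
# Sketch: turn the light off a fixed time after motion ENDS, using a timer
# helper instead of a delay inside the automation.
timer:
  hallway_light:
    duration: "00:05:00"
    restore: true  # assumed to be what carries the timer over a restart

automation:
  - alias: "Start off-timer when motion ends"
    trigger:
      - platform: state
        entity_id: binary_sensor.hallway_motion  # hypothetical entity
        from: "on"
        to: "off"
    action:
      - service: timer.start
        target:
          entity_id: timer.hallway_light

  - alias: "Cancel off-timer on renewed motion"
    trigger:
      - platform: state
        entity_id: binary_sensor.hallway_motion
        to: "on"
    action:
      - service: timer.cancel
        target:
          entity_id: timer.hallway_light

  - alias: "Light off when timer finishes"
    trigger:
      - platform: event
        event_type: timer.finished
        event_data:
          entity_id: timer.hallway_light
    action:
      - service: light.turn_off
        target:
          entity_id: light.hallway  # hypothetical entity
```

Each of these automations finishes almost instantly, so none of them should sit in a "still running" state while waiting.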

If it is behaving that way then it isn’t complying with restart mode’s documented behavior.

From Script Modes:

Start a new run after first stopping previous run.

But what if, e.g., turning off a device takes a long time due to communication timeouts? Restart mode does not promise to abort individual actions; it will probably wait for the current action to end and then stop the automation. I would expect it to stop a delay, but not to abort a light.turn_on service call halfway.