"Already running" New automation bug?

petro · August 2, 2023, 2:30pm

because light.turn_unavailable or light.turn_unknown are not valid services and will cause the automation to deactivate if your binary sensor decided to disconnect.

Secondly, that’s using the state machine which may not be in sync with your actual trigger, which is why mine is using the trigger object over states(..)

smugleafdev · August 2, 2023, 2:31pm

Good point. Does the “from” add anything?

petro · August 2, 2023, 2:32pm

yes, they ensure your states will only trigger when you move from valid states like on and off to on and off. i.e. unknown or unavailable will not impact the automation.
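
A minimal sketch of that pattern, assuming a hypothetical sensor `binary_sensor.hall_motion` and light `light.hall` (neither comes from this thread). The trigger only fires on transitions between `on` and `off`, and the service name is built from the trigger object rather than from `states(..)`:

```yaml
# Hypothetical entity IDs; substitute your own.
trigger:
  - platform: state
    entity_id: binary_sensor.hall_motion
    # Only react to transitions between valid states, so that
    # unknown/unavailable never reach the action.
    from:
      - "on"
      - "off"
    to:
      - "on"
      - "off"
action:
  # Resolves to light.turn_on or light.turn_off, using the state carried
  # by the trigger itself instead of a later state machine lookup.
  - service: light.turn_{{ trigger.to_state.state }}
    target:
      entity_id: light.hall
```

Because `to` is restricted to `on` and `off`, the template should never resolve to `light.turn_unavailable` or `light.turn_unknown`.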

jumon · August 12, 2023, 2:45pm

I’ve been having the same issue here: many automations stop working, and when I try to run them manually, the logs show they are already running when they should not be. It’s like the automations are hanging. Not all automations are affected, but many are, and they work fine the rest of the time. I can’t even restart HA; that won’t work. I have to reboot the RPi it’s running on (RPi 4 8GB). I just updated from 2023.7.3 to 2023.8.2 and will see if this happens again. Unfortunately, it takes several days before it locks up like this, and it always seems to start in the middle of the night.

Wiigian · August 14, 2023, 6:52am

I am also seeing this issue in 2023.7.3.
Several automations seem to be hanging, and I am seeing warnings with "Already running" in the logs.
Updating to 2023.8.2 now.

Alain_Raymond · August 14, 2023, 11:31pm

I’m also having this problem, and there are a few issues on GitHub about it.
I found that the problematic automations on my side (which had been working perfectly fine for months) are the ones using Z-Wave devices.

I can’t disable and re-enable the automation to reset it; it won’t disable at all.
Instead of rebooting HA, I tried restarting the Z-Wave JS UI addon. When the addon restarts, the automation finally fails and leaves the "already running / hanging" state.

I’m pretty sure this issue is related to a change introduced in 2023.7, or to a change in Z-Wave JS UI that was introduced around the time 2023.7 was released.

Can you confirm whether your hanging automations also use Z-Wave devices, and whether you are also using Z-Wave JS UI?

github.com/home-assistant/core

Automations marked as "Still Running" After Upgrade to 2023.7 & 2023.8

opened 09:08PM - 08 Aug 23 UTC

lux4rd0

integration: automation

### The problem

Automations are getting stuck in versions of HA core 2023.7.0 through 2023.8.1 after upgrading from 2023.6.3.

![trace1](https://github.com/home-assistant/core/assets/30187533/37c63524-2de6-41eb-8028-d17122b54fe8)
![trace2](https://github.com/home-assistant/core/assets/30187533/d9909caa-7ded-4a35-9219-1c977ea7a7a5)
![trace3](https://github.com/home-assistant/core/assets/30187533/a03ad64e-479a-4d7a-9312-9ab482bff911)

### What version of Home Assistant Core has the issue?

core-2023 7.0, core-2023 7.1. core-2023 7.2, core-2023 7.3, core-2023 8.0, core-2023 8.1

### What was the last working version of Home Assistant Core?

core-2023 6.3

### What type of installation are you running?

Home Assistant OS

### Integration causing the issue

Z-wave and Zigbee

### Link to integration documentation on our website

_No response_

### Diagnostics information

[home-assistant.log.2023.6.3.zip](https://github.com/home-assistant/core/files/12295856/home-assistant.log.2023.6.3.zip)
[home-assistant.log.2023.7.0.zip](https://github.com/home-assistant/core/files/12295857/home-assistant.log.2023.7.0.zip)

### Example YAML snippet

```yaml
alias: Office Day On
description: ""
trigger:
  - platform: state
    entity_id: binary_sensor.office_motion_motion
    to: "on"
condition:
  - condition: or
    conditions:
      - condition: state
        entity_id: input_select.mode
        state: Home
action:
  - type: turn_on
    device_id: 6a7dd3564b68685d3dfbfbe0d2429237
    entity_id: light.office_floor_lamp
    domain: light
    brightness_pct: 100
  - type: turn_on
    device_id: 6b3ff3b7d4b89d95e2120316881e62ca
    entity_id: light.office_overhead
    domain: light
    brightness_pct: 100
  - service: zwave_js.bulk_set_partial_config_parameters
    data:
      parameter: "16"
      value: 33884694
    target:
      entity_id:
        - light.office_floor_lamp
        - light.office_overhead
    enabled: false
  - service: light.turn_on
    data:
      effect: Twinkle
    target:
      entity_id:
        - light.esp07_light_1
        - light.esp07_light_2
        - light.esp07_light_3
        - light.esp07_light_4
mode: single
```

### Anything in the logs that might be useful for us?

_No response_

### Additional information

Watching several other issues that have been opened and closed:

https://github.com/home-assistant/core/issues/97965
https://github.com/home-assistant/core/issues/97768
https://github.com/home-assistant/core/issues/97721
https://github.com/home-assistant/core/issues/97662
https://github.com/home-assistant/core/issues/97581
https://community.home-assistant.io/t/already-running-new-automation-bug/596654/16

github.com/home-assistant/core

Automation: keeps running state

opened 08:24AM - 12 Aug 23 UTC

ridderr

### The problem

I see regularly some automation which stays in running state.

### What version of Home Assistant Core has the issue?

Home Assistant 2023.8.2

### What was the last working version of Home Assistant Core?

_No response_

### What type of installation are you running?

Home Assistant OS

### Integration causing the issue

automation

### Link to integration documentation on our website

_No response_

### Diagnostics information

[trace automation.lampen_gang 2023-08-10T17_31_32.136904+00_00.jso.txt](https://github.com/home-assistant/core/files/12326910/trace.automation.lampen_gang.2023-08-10T17_31_32.136904%2B00_00.jso.txt)
[trace automation.zonsondergang 2023-08-12T07_01_51.042086+00_00.json.txt](https://github.com/home-assistant/core/files/12326911/trace.automation.zonsondergang.2023-08-12T07_01_51.042086%2B00_00.json.txt)
[lampen-gang-automation.txt](https://github.com/home-assistant/core/files/12326920/lampen-gang-automation.txt)
[zonsondergang.txt](https://github.com/home-assistant/core/files/12326921/zonsondergang.txt)

### Example YAML snippet

```yaml
See attached files of two different automations
```

### Anything in the logs that might be useful for us?

```txt
It's not possible to disable/enable the automations. A restart is required.

I have see below error but not sure if it's related:

Logger: homeassistant.helpers.service
Source: helpers/service.py:833
First occurred: 21:17:21 (2 occurrences)
Last logged: 21:23:21

Referenced entities automation.lampen_gang are missing or not currently available
```

### Additional information

_No response_

And look at the last few posts here: Automation keeps in "still running ..." state - #21 by scottconnor

It’s clear that something has been wrong with HA / Z-Wave since July.

jumon · August 15, 2023, 12:33am

Yes! Those are exactly the automations that are all hanging, zwave.

Alain_Raymond · August 15, 2023, 12:34am

z-wave js or z-wave js ui?

jumon · August 15, 2023, 12:35am

Zwave js ui for me.

Alain_Raymond · August 15, 2023, 12:48am

So Z-Wave JS UI seems to be the common factor in this problem for all of us. Try restarting Z-Wave JS UI instead of HA next time. I bet you’ll get the same result as me (the automation will finally fail and leave the "still running" state).

smugleafdev · August 15, 2023, 2:07am

Yes! All my automations include Z-Wave in some form or another.

jumon · August 17, 2023, 12:37pm

Ok, it was locked up again this morning, with automations not running because earlier runs were still marked as running. I restarted Z-Wave JS UI and then they worked. I was thinking that it’s probably a small community of people having this issue if we are the only ones talking about it. Do you happen to have backups enabled in Z-Wave JS UI? I do, and I have now disabled them to see if that might be causing the issue like it does with 700 series controllers (I have a 500).

Mariusthvdb · August 17, 2023, 1:50pm

considering this creates two pairs of from/to states to trigger on, would that also imply we wouldn’t need the

       {{trigger.to_state.state != trigger.from_state.state}}

condition?

I hope this way we can also prevent triggering on unavailable/unknown (e.g. when reloading template binary sensors)

petro · August 17, 2023, 1:51pm

I’m not sure, I’ve never tried what I suggested. I’m assuming it works. I haven’t updated my automation in years and I favor templates over yaml.

Mariusthvdb · August 17, 2023, 1:55pm

Haha, ok well, we will see. I’ve adapted a less important automation to do this:

  - id: dark_outside_sets_outside_motion_sensors
    trigger:
      platform: state
      entity_id: binary_sensor.donker_buiten
      from:
        - 'off'
        - 'on'
      to:
        - 'off'
        - 'on'
#       not_from: &un
#         - unavailable
#         - unknown
#       not_to: *un
#     condition:
#       >
#        {{trigger.to_state.state != trigger.from_state.state}}
    action:
      - service: >
          switch.turn_{{trigger.to_state.state}}
#{{states('binary_sensor.donker_buiten')}}
        target:
          entity_id: switch.buiten_motion_sensor_switches

hoping those triggers are paired (I never ‘read’ that in the docs)

petro · August 17, 2023, 1:56pm

well you shouldn’t ever get on → on triggers in general. If you do, then you might need to keep that template.

Mariusthvdb · August 17, 2023, 1:58pm

I have those templates mainly because in the old days those state triggers also fired on attribute-only changes, and this prevents that. (The binary sensor is a bad example, but think of phones staying at home while their battery level changes.)

I can test in some other places too.

raman325 · August 17, 2023, 6:19pm

hi all, apologies for potentially hijacking this thread, but for people experiencing this problem with automations involving zwave devices, I’d like to look into whether or not the Z-Wave JS integration or driver is somehow contributing to this problem.

I’ve already reviewed the Z-Wave JS PRs introduced in 2023.7 and I don’t think this is newly introduced behavior, but rather the automation changes introduced in 2023.7 may have exposed an existing issue with zwave-js that was previously hidden from users (and us devs) because HA would stop waiting for the service call to complete after 10 seconds (it no longer does this)

If you’d like to help, please provide the following:

Automation YAML definition
Automation trace ideally, but if that’s not possible because the automation never finishes, an indication of what step in the definition the automation run is hanging on
Debug level zwave_js integration logs
Debug level zwave-js-server-python library logs
Debug level zwave-js driver logs (this is the addon logs for Z-Wave addon users, the Docker container logs for zwave-js-server or zwave-js-ui for bare Docker users, or zwave-js-server logs for the people running the server on the command line)

While I realize there isn’t much information here, this section of the docs may help you in obtaining the driver logs: Z-Wave - Home Assistant

For the integration and library logs, you can update your HA configuration, or use the services listed here: Logger - Home Assistant
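
As a sketch of the configuration route, something like the following in `configuration.yaml` should enable debug output for both the integration and the library. The two logger names are the ones I understand the Z-Wave docs to use; the `default: warning` level is only an assumption, so keep whatever default you already have:

```yaml
logger:
  # Assumed baseline level; keep your existing default if you have one.
  default: warning
  logs:
    # Debug level zwave_js integration logs
    homeassistant.components.zwave_js: debug
    # Debug level zwave-js-server-python library logs
    zwave_js_server: debug
```

The same two entries can be passed as data to the `logger.set_level` service if you would rather not restart HA. The driver logs are configured separately, on the addon or server side.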

For any additional help in obtaining the logs, please ask in the Discord #zwave channel

If you can’t publish this info here, you can open a GitHub issue, or you can DM me on discord (same username). Thanks!

jumon · August 17, 2023, 8:10pm

Thanks for offering to look at this! I’m currently waiting to see if my last change makes any difference, and if it locks up again, I’ll set all of this up. So I don’t make a mistake, what exact logging entries would work best in configuration.yaml for the integration and library debug logs? Also, for the driver logs, I assume I should open them when it occurs, and keep that window open for how long? Thanks!!!

123 · August 17, 2023, 8:52pm

You may wish to contact allenporter who is currently investigating several open Issues that have officially reported the problem. In addition, the problem isn’t limited to entities based on the Zwave integration and has also been reported for ZHA.

It’s likely that it’s not limited to those two integrations (they happen to be very common, widely-used integrations) and is likely to happen for any integration that may take an unusually long time to respond to a service call. Unlike in the past, the automation now waits forever for a reply (to a command, like a service call). Naturally this will cause a problem for a mode: single automation that’s triggered while it’s still waiting endlessly for a response.

It even causes problems for mode: restart automations which, curiously, don’t restart but attempt to queue subsequent execution requests (the value of their current attribute increases above 1).

Users have reported that changing to mode: queued “fixes” the problem but it, in fact, only masks it. The previous ‘stuck’ instance is left waiting forever and a new instance handles the latest execution request.

The addition of continue_on_error: true (to each service call that runs the risk of not receiving a prompt reply) has been reported to prevent waiting forever (i.e. effectively it recognizes the lack of a prompt reply is abnormal, ceases waiting, and proceeds to execute the next action).
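
A minimal sketch of that workaround, reusing `light.office_overhead` from the issue quoted earlier in this thread purely as an illustration. Whether it actually avoids the hang in your setup is based on user reports, not something confirmed here:

```yaml
action:
  - service: light.turn_on
    target:
      entity_id: light.office_overhead
    data:
      brightness_pct: 100
    # Reported workaround: continue with the next action instead of
    # stopping the run when this service call errors.
    continue_on_error: true
```

The option has to be set on each action that might not get a prompt reply; it is not an automation-wide setting.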

It now waits forever and has effectively exposed certain situations (no prompt response from a service call) that may have occurred in past versions but weren’t reported so the user was unaware that a problem existed.

So maybe, in a sort of backhanded way, this is a ‘good thing’ because it’s revealing a deficiency that was hidden in the past.