"Already running" New automation bug?

Alain_Raymond · August 15, 2023, 12:34am

z-wave js or z-wave js ui?

jumon · August 15, 2023, 12:35am

Zwave js ui for me.

Alain_Raymond · August 15, 2023, 12:48am

So Z-Wave JS UI seems to be the common point of this problem for all of us. Try restarting Z-Wave JS UI instead of HA next time. I bet you’ll get the same result as I did (the automation will finally fail and leave the “still running” state).

smugleafdev · August 15, 2023, 2:07am

Yes! All my automations include Z-Wave in some form or another.

jumon · August 17, 2023, 12:37pm

Ok, it was locked up again this morning, with automations not running because previous runs were still running. I restarted Zwave JS UI and then they worked. I was thinking that it’s probably a small community of people having this issue if we are the only ones talking about it. Do you happen to have backups enabled in Zwave JS UI? I do, and I have now disabled them to see if that might be causing the issue like it does with 700 series controllers (I have a 500).

Mariusthvdb · August 17, 2023, 1:50pm

Considering this creates two pairs of from/to states to trigger on, would that also imply we wouldn’t need the

       {{trigger.to_state.state != trigger.from_state.state}}

condition?

I hope this way we can also prevent triggering on unavailable/unknown (reload binary templates)

petro · August 17, 2023, 1:51pm

I’m not sure, I’ve never tried what I suggested. I’m assuming it works. I haven’t updated my automation in years and I favor templates over yaml.

Mariusthvdb · August 17, 2023, 1:55pm

Haha, ok well, we will see. I’ve adapted a less important automation to do this:

  - id: dark_outside_sets_outside_motion_sensors
    trigger:
      platform: state
      entity_id: binary_sensor.donker_buiten
      from:
        - 'off'
        - 'on'
      to:
        - 'off'
        - 'on'
#       not_from: &un
#         - unavailable
#         - unknown
#       not_to: *un
#     condition:
#       >
#        {{trigger.to_state.state != trigger.from_state.state}}
    action:
      - service: >
          switch.turn_{{trigger.to_state.state}}
#{{states('binary_sensor.donker_buiten')}}
        target:
          entity_id: switch.buiten_motion_sensor_switches

hoping those triggers are paired (I never ‘read’ that in the docs)

petro · August 17, 2023, 1:56pm

well you shouldn’t ever get on → on triggers in general. If you do, then you might need to keep that template.

Mariusthvdb · August 17, 2023, 1:58pm

I have those templates mainly because in the old days those state triggers also fired on attribute-only changes, and this prevents that. (The binary is a bad example, but think of phones staying at home while their battery level changes.)

I can test in some other places too.
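
For readers following along, here is a minimal sketch of a state trigger that should ignore attribute-only changes without needing the template condition. It relies on the documented behaviour that setting from/to to null makes the trigger fire on state changes only. The entity name and alias are hypothetical, and I have not checked this against every HA version.

```yaml
# Sketch only: device_tracker.example_phone is a made-up entity.
automation:
  - alias: "Phone state changed (example)"
    trigger:
      - platform: state
        entity_id: device_tracker.example_phone
        # A null "to" is meant to match any real state change (home -> not_home)
        # while skipping updates where only an attribute (e.g. battery) changed.
        to: null
    action:
      - service: persistent_notification.create
        data:
          message: >
            Phone went from {{ trigger.from_state.state }}
            to {{ trigger.to_state.state }}
```

If unavailable/unknown also need to be filtered, the commented-out not_from/not_to lists in the automation above are the option to look at; as far as I know they cannot be combined with from/to on the same trigger.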

raman325 · August 17, 2023, 6:19pm

hi all, apologies for potentially hijacking this thread, but for people experiencing this problem with automations involving zwave devices, I’d like to look into whether or not the Z-Wave JS integration or driver is somehow contributing to this problem.

I’ve already reviewed the Z-Wave JS PRs introduced in 2023.7 and I don’t think this is newly introduced behavior. Rather, the automation changes introduced in 2023.7 may have exposed an existing issue with zwave-js that was previously hidden from users (and us devs), because HA would stop waiting for the service call to complete after 10 seconds (it no longer does this).

If you’d like to help, please provide the following:

Automation YAML definition
Automation trace ideally, but if that’s not possible because the automation never finishes, an indication of what step in the definition the automation run is hanging on
Debug level zwave_js integration logs
Debug level zwave-js-server-python library logs
Debug level zwave-js driver logs (this is the addon logs for Z-Wave addon users, the Docker container logs for zwave-js-server or zwave-js-ui for bare Docker users, or zwave-js-server logs for the people running the server on the command line)

While I realize there isn’t much information here, this section of the docs may help you in obtaining the driver logs: Z-Wave - Home Assistant

For the integration and library logs, you can update your HA configuration, or use the services listed here: Logger - Home Assistant
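
As a rough sketch of the service route, the logger.set_level service takes a mapping of logger names to levels and can be called from Developer Tools without a restart. The second logger name below is an assumption on my part (the Python import name of the zwave-js-server-python library), so adjust it if your logs show a different name.

```yaml
# Sketch of a service call, e.g. pasted into Developer Tools > Services (YAML mode).
service: logger.set_level
data:
  homeassistant.components.zwave_js: debug
  # Assumed logger name for the zwave-js-server-python library.
  zwave_js_server: debug
```

Levels set this way are not persistent, so they would need to be set again after a restart.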

For any additional help in obtaining the logs, please ask in the Discord #zwave channel

If you can’t publish this info here, you can open a GitHub issue, or you can DM me on discord (same username). Thanks!

jumon · August 17, 2023, 8:10pm

Thanks for offering to look at this! I’m currently waiting to see if my last change makes any difference, and if it locks again, I’ll set up all of this. So I don’t make a mistake, what exact logging entries would work best in the configuration.yaml for integration and library debug logs? Also, for the driver logs, I assume I should open them when it is occurring, and keep that window open for how long? Thanks!!!

123 · August 17, 2023, 8:52pm

You may wish to contact allenporter who is currently investigating several open Issues that have officially reported the problem. In addition, the problem isn’t limited to entities based on the Zwave integration and has also been reported for ZHA.

It’s likely that it’s not limited to those two integrations (they happen to be very common, widely-used integrations) and is likely to happen for any integration that may take an unusually long time to respond to a service call. Unlike in the past, the automation now waits forever for a reply (to a command, like a service call). Naturally this will cause a problem for a mode: single automation that’s triggered while it’s still waiting endlessly for a response.

It even causes problems for mode: restart automations which, curiously, don’t restart but attempt to queue subsequent execution requests (the value of their current attribute increases above 1).

Users have reported that changing to mode: queued “fixes” the problem but it, in fact, only masks it. The previous ‘stuck’ instance is left waiting forever and a new instance handles the latest execution request.

The addition of continue_on_error: true (to each service call that runs the risk of not receiving a prompt reply) has been reported to prevent waiting forever (i.e. effectively it recognizes the lack of a prompt reply is abnormal, ceases waiting, and proceeds to execute the next action).
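
A minimal sketch of what that looks like in practice, assuming a simple motion-triggered light. The entity names and alias are hypothetical, and whether this actually stops the waiting in every case is only what users have reported, not something confirmed here.

```yaml
# Sketch only: entity names are invented for illustration.
automation:
  - alias: "Kitchen light on motion (example)"
    # mode: single is the mode that shows the "already running" symptom
    # when an earlier run is still waiting on a service call.
    mode: single
    trigger:
      - platform: state
        entity_id: binary_sensor.example_kitchen_motion
        to: "on"
    action:
      - service: light.turn_on
        # Reported workaround: continue with the next action instead of
        # stopping or waiting if this call does not complete normally.
        continue_on_error: true
        target:
          entity_id: light.example_kitchen
```

Note that continue_on_error is set per action, so it has to be added to each service call that might not get a prompt reply.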

It now waits forever and has effectively exposed certain situations (no prompt response from a service call) that may have occurred in past versions but weren’t reported so the user was unaware that a problem existed.

So maybe, in a sort of backhanded way, this is a ‘good thing’ because it’s revealing a deficiency that was hidden in the past.

jumon · August 17, 2023, 9:20pm

It even causes problems for mode: restart automations which, curiously, don’t restart but attempt to queue subsequent execution requests (the value of their current attribute increases above 1).

Ah, thanks for the info. I changed one of my automations to “restart” in the hope it would restart, but it seems that’s probably not going to happen either. At least we can try to work on whatever is causing the lockups in the respective addons.

raman325 · August 17, 2023, 10:05pm

homeassistant.components.zwave_js and zwave-js-server-python would be the loggers you would want to set to debug
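
In configuration.yaml that could look roughly like the sketch below. The default level is my own choice, and zwave_js_server is assumed to be the logger name the zwave-js-server-python library uses (it is the library's import name), so treat that line as an assumption.

```yaml
# Sketch for configuration.yaml; requires a restart to take effect.
logger:
  default: info
  logs:
    # Z-Wave JS integration
    homeassistant.components.zwave_js: debug
    # zwave-js-server-python library (assumed logger name)
    zwave_js_server: debug
```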

raman325 · August 17, 2023, 10:07pm

Agreed that the problem isn’t limited to entities from the Z-Wave integration. But if the theory is valid that service calls have been hanging indefinitely, it’s something that likely needs to be accommodated for in each integration, Z-Wave being one of them, and the primary one I can help with.

I am also curious what the behavior was before and if the command that was sent was successful and somehow the response just never came back. I would imagine that’s the case because otherwise people’s automations would have not had the desired effect even before this change and we would have seen feedback related to it.

smugleafdev · August 19, 2023, 2:44pm

After my extremely simple kitchen light automation failed six times yesterday, I’ve given up. I changed the mode of the automation to “restart” rather than single.

jumon · August 20, 2023, 1:51am

Ok, I just got more lockups so eliminating the backups is not the solution. I’ve enabled debug logging and will await the next lockup.

jumon · August 20, 2023, 11:41am

Ok, and one last pre-event question: what is the best way to get these logs, and where are they located? I looked around in the standard terminal and the samba shares but could not find them. Do I need to set up the developer ssh access (just got that set up today) or get them off the SSD through another system? Trying to have minimal downtime when it happens again. Thanks!

raman325 · August 22, 2023, 3:23am

The server logs will be in the addon settings. You can also get the logs using docker commands when SSHing in, but it’s probably easier to do it through the UI.

Integration and library logs will be in the homeassistant logs mixed in with the logs for other integrations