because light.turn_unavailable or light.turn_unknown are not valid services and will cause the automation to deactivate if your binary sensor decided to disconnect.
Secondly, that’s using the state machine, which may not be in sync with your actual trigger. That is why mine uses the trigger object instead of states(..).
Yes, they ensure the automation only triggers when the state moves between valid states such as on and off, i.e. unknown or unavailable will not affect the automation.
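For anyone following along, here is a minimal sketch of that idea. The entity names and alias are made up for illustration, and the template condition is only one way to express the "valid states only" check; it is not necessarily the exact template used above.

```yaml
# Sketch only: binary_sensor.hallway_motion and light.hallway are assumed names.
automation:
  - alias: "Mirror motion sensor to light (sketch)"
    trigger:
      - platform: state
        entity_id: binary_sensor.hallway_motion
    condition:
      # Continue only when both the old and the new state are real on/off
      # values, so unknown/unavailable never reach the service call.
      - condition: template
        value_template: >
          {{ trigger.from_state is not none
             and trigger.to_state is not none
             and trigger.from_state.state in ['on', 'off']
             and trigger.to_state.state in ['on', 'off'] }}
    action:
      # Read the state from the trigger object rather than states(...),
      # so the value is the one that actually fired the trigger.
      - service: "light.turn_{{ trigger.to_state.state }}"
        target:
          entity_id: light.hallway
```

With the condition in place, the templated service name can only ever resolve to light.turn_on or light.turn_off.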
I’ve been having the same issue here. Many automations stop working, and when I try to run them manually, the logs show they are already running when they should not be. It’s like the automations are hanging. Not all automations are affected, but many are, and they work fine the rest of the time. I can’t even restart HA; that won’t work, so I have to reboot the RPI it’s running on (RPI4 8GB). I just updated from 2023.7.3 to 2023.8.2 and will see if this happens again. Unfortunately, it takes several days before it locks up like this, and it always seems to start in the middle of the night.
I am also seeing this issue in 2023.7.3.
Several automations seem to be hanging, and I am seeing warnings with "Already running" in the logs.
Updating to 2023.8.2 now.
I’m also having this problem, and there are a few issues on GitHub about it.
I found that the problematic automations on my side (which had been working perfectly fine for months) are the ones using Z-Wave devices.
I can’t disable and re-enable the automation to reset it, it won’t disable at all.
Instead of rebooting HA, I tried restarting the Z-Wave JS UI addon. When the addon restarts, the automation finally fails and leaves the "already running / hanging" state.
I’m pretty sure this issue is related to some change introduced in 2023.7, or to some change in Z-Wave JS UI that was introduced at the same time 2023.7 was released.
Can you confirm if your hanging automations are also using z-wave devices and that you are also using Z-wave JS UI?
So Z-Wave JS UI seems to be the common point of this problem for all of us. Try restarting Z-Wave JS UI instead of HA next time. I bet you’ll get the same result as me (the automation will finally fail and leave the "still running" state).
OK, it was locked up again this morning, with automations not running because earlier runs were still going. I restarted Z-Wave JS UI and then they worked. I was thinking that it’s probably a small community of people having this issue if we are the only ones talking about it. Do you happen to have backups enabled in Z-Wave JS UI? I do, and I have now disabled them to see if that might be causing the issue, like it does with 700 series controllers (I have a 500).
I have those templates mainly because in the old days those state triggers also fired on attribute-only changes, and this prevents that. (The binary sensor is a bad example, but think of a phone that stays at home while its battery level changes.)
Hi all, apologies for potentially hijacking this thread, but for people experiencing this problem with automations involving Z-Wave devices, I’d like to look into whether the Z-Wave JS integration or driver is somehow contributing to this problem.
I’ve already reviewed the Z-Wave JS PRs introduced in 2023.7 and I don’t think this is newly introduced behavior. Rather, the automation changes introduced in 2023.7 may have exposed an existing issue with zwave-js that was previously hidden from users (and us devs), because HA used to stop waiting for the service call to complete after 10 seconds (it no longer does this).
If you’d like to help, please provide the following:
Automation YAML definition
Automation trace ideally, but if that’s not possible because the automation never finishes, an indication of what step in the definition the automation run is hanging on
Debug level zwave_js integration logs
Debug level zwave-js-server-python library logs
Debug level zwave-js driver logs (this is the addon logs for Z-Wave addon users, the Docker container logs for zwave-js-server or zwave-js-ui for bare Docker users, or zwave-js-server logs for the people running the server on the command line)
While I realize there isn’t much information here, this section of the docs may help you in obtaining the driver logs: Z-Wave - Home Assistant
For the integration and library logs, you can update your HA configuration, or use the services listed here: Logger - Home Assistant
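As a rough sketch of the configuration route, something like the following in configuration.yaml should cover both the integration and the library. The two logger names are the ones I believe the integration and the zwave-js-server-python library use; please check them against the Z-Wave docs page before relying on this.

```yaml
# Sketch only: verify the logger names against the Z-Wave documentation.
logger:
  default: info
  logs:
    # Debug logs for the zwave_js integration
    homeassistant.components.zwave_js: debug
    # Debug logs for the zwave-js-server-python library
    zwave_js_server: debug
```

A restart is needed for this to take effect. The logger.set_level service accepts the same logger-name and level pairs if you would rather switch debug logging on without restarting.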
For any additional help in obtaining the logs, please ask in the Discord #zwave channel
If you can’t publish this info here, you can open a GitHub issue, or you can DM me on discord (same username). Thanks!
Thanks for offering to look at this! I’m currently waiting to see if my last change makes any difference, and if it locks up again, I’ll set all of this up. So that I don’t make a mistake, what exact logging entries would work best in configuration.yaml for the integration and library debug logs? Also, for the driver logs, I assume I should open them when the problem occurs, but how long should I keep that window open? Thanks!!!
You may wish to contact allenporter, who is currently investigating several open Issues that have officially reported the problem. In addition, the problem isn’t limited to entities based on the Z-Wave integration; it has also been reported for ZHA.
It’s likely that it’s not limited to those two integrations (they happen to be very common, widely-used integrations) and is likely to happen for any integration that may take an unusually long time to respond to a service call. Unlike in the past, the automation now waits forever for a reply (to a command, like a service call). Naturally this will cause a problem for a mode: single automation that’s triggered while it’s still waiting endlessly for a response.
It even causes problems for mode: restart automations which, curiously, don’t restart but attempt to queue subsequent execution requests (the value of their current attribute increases above 1).
Users have reported that changing to mode: queued “fixes” the problem but it, in fact, only masks it. The previous ‘stuck’ instance is left waiting forever and a new instance handles the latest execution request.
The addition of continue_on_error: true (to each service call that runs the risk of not receiving a prompt reply) has been reported to prevent waiting forever (i.e. effectively it recognizes the lack of a prompt reply is abnormal, ceases waiting, and proceeds to execute the next action).
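A minimal sketch of where the option goes, using made-up entity names. Note that this only shows the placement of continue_on_error; whether it actually stops the wait in every affected case is what users have reported, not something this example demonstrates.

```yaml
# Sketch only: switch.porch_zwave is an assumed entity name.
automation:
  - alias: "Porch light at sunset (sketch)"
    mode: single
    trigger:
      - platform: sun
        event: sunset
    action:
      # The service call that might not get a prompt reply
      - service: switch.turn_on
        target:
          entity_id: switch.porch_zwave
        # Set per action, not per automation
        continue_on_error: true
      # Later actions still run if the call above errors out
      - service: notify.notify
        data:
          message: "Porch light automation finished"
```

The option has to be repeated on each service call you consider at risk; it is not inherited from the automation level.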
It now waits forever and has effectively exposed certain situations (no prompt response from a service call) that may have occurred in past versions but weren’t reported so the user was unaware that a problem existed.
So maybe, in a sort of backhanded way, this is a ‘good thing’ because it’s revealing a deficiency that was hidden in the past.