I’m not sure if I just came across the same issue. I’m using ZHA.
I have a light that is turned on either by the front door being opened, or a motion sensor in the same hall being triggered.
I noted this morning that the light had been turned on by the door being opened, but never turned off again until I walked through the hall the next morning:
Same here.
I am using Z2M with a Dresden Conbee II.
Some Aqara 3 Band switches can no longer trigger some automations.
I use the Blueprint from mozartbanging.
Example:
Button one triggers two lights behind two Shelly 1s → works fine (if I choose only one of them, it does not work)
Button two triggers a garage door behind a Shelly 1 → not working
Both use switch.toggle
Button 3 triggers 2 Zigbee lights → working fine
using light.toggle
Button three triggers a Shelly 2.5 as a cover switch → not working
using cover.toggle
If I trigger the units with a Philips Tap Dial, everything is fine.
It feels like it doesn't work when an Aqara/Mozartbanging combination triggers a single Shelly, but it does work when there are two Shellys.
I have no idea why this happens after the 2023.8 update
I got it. After reading this post again: Mozartbanging has to correct the Blueprint.
Change mode: restart to mode: single.
I always looked in my YAML configuration but not in the Blueprint.
After changing the blueprint and reconfiguring, everything works like before.
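For anyone else making the same change, the setting in question is the top-level `mode` key in the blueprint's YAML file. This is only a minimal sketch: the blueprint name below is a hypothetical placeholder and not the actual contents of the blueprint mentioned above, and the trigger and action sections are left out.

```yaml
blueprint:
  name: Example button blueprint  # hypothetical name, for illustration only
  domain: automation

# Before the change this line read: mode: restart
mode: single
```

Automations built from a blueprint pick up its `mode`, so the edit has to be made in the blueprint file itself and the automations reloaded afterwards.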
I have a script that has different error-handling behavior on 2023.8 than it did in any previous release (I keep my instance on the latest release). I’m wondering if this is helpful and maybe related to the issues that are being reported with automations. My script is a very simple bedtime routine to shut the garage doors, turn lights off, lock the door, and arm the alarm:
The interesting thing is that I have been using this script for probably over a year now. However, about 3 months ago I redid my zwave network with a new controller. Only very early in my HA journey did I ever use the “device” action in scripts or automations; I quickly moved to using only service calls. So when I redid my zwave network recently, I made sure all my entity_ids were unchanged so as not to affect my scripts and automations. I thought I didn’t have any references to device_ids anywhere, so I didn’t worry about those changing.
So what is wrong with the script above is that (starting a few months ago) the device_id became incorrect (after I redid the zwave network) and it no longer matched any of my devices. However the script continued to do everything else, it just obviously didn’t lock the door. I never noticed that anything was wrong because I normally lock it by hand before I go to bed, so not hearing it lock was normal. However tonight I called the script and noticed that a light didn’t turn off. As I debugged the issue I discovered the reference to the device_id. I also noticed that the script showed it was last triggered yesterday, even though I called it about a half dozen times today trying to debug it.
So pre-2023.8, the script would skip by that wrong device_id and execute the lines afterwards. Post-2023.8, the script hangs and never finishes. I believe, in my case, it’s been hanging since it was triggered yesterday, which would have been the first time it was triggered while running 2023.8.
Again I’m not sure if this has anything to do with the issues being reported, or perhaps it’s an expected change and entirely unrelated?
Obviously I know how to fix my issue, but posting here in case this is a piece of the puzzle.
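For comparison, here is a rough sketch of what a bedtime routine built only from service calls on entity IDs could look like. It is not the script from this post; the script name and every entity ID are hypothetical. It also assumes `continue_on_error: true` is acceptable on the lock step, which lets the sequence carry on if that call raises an error. It would not necessarily help with a call that never returns at all.

```yaml
script:
  bedtime_routine:  # hypothetical script name
    alias: Bedtime routine
    mode: single
    sequence:
      # Close the garage door (hypothetical entity)
      - service: cover.close_cover
        target:
          entity_id: cover.garage_door
      # Turn the lights off (hypothetical entity)
      - service: light.turn_off
        target:
          entity_id: light.downstairs
      # Lock the door by entity_id instead of device_id (hypothetical entity)
      - service: lock.lock
        target:
          entity_id: lock.front_door
        continue_on_error: true
      # Arm the alarm (hypothetical entity)
      - service: alarm_control_panel.alarm_arm_night
        target:
          entity_id: alarm_control_panel.home
```

Because entity IDs can be kept stable when a network is rebuilt, a sequence like this avoids the stale `device_id` reference described above.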
2:50 am: Our dog woke us up and I realized that again no Zwave device was working.
Restarting HA did not work either; only a full reboot of my Yellow box resolved the issue.
There was nothing relevant in the log and there is no record of the attempted running of Automations in Traces from before reboot.
I’ve had one of my automations hanging at seemingly random instances of execution since sometime in July. I don’t know if it was 2023.7.3 specifically. It doesn’t seem to matter if the mode is set to Single or Restart. Pretty sure even Queued will show the same issue.
Example showing the UI config and traces (yaml below):
The first two actions each contain multiple media players as noted in the screenshot. The next two actions each call the turn off service for a group of switches and a group of lights respectively.
Example of stalling before the automation performs any actions at all:
Something is causing your “all lights” to take forever to shut off. Most likely a communication issue. The fact that your switches before the all lights have a significant delay between each turn off is alarming. For what it’s worth, I have about 80 or so switches & lights and I can turn them all off in less than a second. I’m running both Zwave and Zigbee.
Your trace shows 1 switch turning off, a 1 second delay, another switch turning off, a 2 second delay, 3 switches turning off, then an error. Looking at both traces, your failing trace appears to fail at the all lights, so I’d start looking into issues with those devices. The failing service is taking 32 minutes to fail.
This merits reporting as an Issue in Home Assistant’s GitHub Core repository. That’s where developers track and solve software bugs.
The ‘script hangs and never finishes’ behavior isn’t normal; it shouldn’t wait forever to receive a reply from an unresponsive device, yet that seems to be the common theme for ‘stuck’ scripts and automations.
The problem has been reported to be evident when mode is single or restart.
The problem appears to be masked when mode is queued or parallel because another instance is started while the ‘stuck’ instance is left to fester.
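To make the mode difference concrete, here is a minimal sketch of an automation set to `mode: parallel`. The alias, trigger time and entity ID are hypothetical. With this mode a new run can start even if an earlier run has not finished, which is why a stuck run may go unnoticed; it hides the symptom and is not a fix.

```yaml
automation:
  - alias: Example nightly shutdown  # hypothetical automation
    mode: parallel  # a new run does not wait for an earlier one
    max: 10         # upper limit on concurrent runs
    trigger:
      - platform: time
        at: "22:00:00"
    action:
      - service: light.turn_off
        target:
          entity_id: light.all_lights  # hypothetical entity
```

With `mode: single`, the same stuck run would instead block every later trigger until Home Assistant is restarted, which matches the reports in this thread.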
That was my experience in the past. All switches/lights being Z-Wave, a few Shelly and a couple of Zigbee.
Nothing’s changed on this end other than keeping up with HA updates. I haven’t added any devices into those groups since sometime last year. The only devices I’ve even played with in the past few months have been Zigbee motion and door sensors which aren’t referenced here at all and work fine.
According to the trace, the All Lights Off action isn’t called at all. In fact, I disabled the lights action and tested this again, and the same thing happened: “still running”.
In diagnosing these issues I’ve re-ordered the actions around and any one or more of them can fail to be called. For example, when I had this set up originally it ran for over a year with the actions in this order: switches, lights, media players, more media players.
When the problems started I noticed a couple of switches/lights not turning off and either one or both of the media player off actions never being called (like the Lights in the image above).
I’ve played around with re-ordering the lights and switches OFF actions (disabling the media player actions), as well as editing each group to selectively remove all Z-Wave or all Zigbee devices. Each of those seems to potentially have its own issues, but the delays between actions don’t seem to depend on which type is enabled or how many entities each group contains.
For example, I had only switches going and it was completing in something like 0.24 seconds. With only lights going it was under 1 second. Put them both together with switches first, and 5.5 seconds. Swap the two actions around so lights come before switches (and change nothing else) and boom. 0.47 seconds.
There’s something wrong with the automation system - too many people noticed an issue at one of the July updates.
There’s definitely something wrong with Z-Wave - I’m using Z-Wave-JS UI. Nodes going “unavailable” temporarily or dead requiring a ping isn’t an issue seen with other home automation platforms. I’m running the same hardware installed in the same places as I was with Indigo Domotics and I had never previously seen any node ever go dead nor unresponsive.
With multiple updates of many components every week, it’s really difficult to track this stuff down.
I’d wager that the issue stems from the zwave issues. You shouldn’t have to ping anything on a zwave network.
For what it’s worth, I built a template sensor that monitors the number of automations running concurrently, and it shows zero issues. I’m running the current version of Z-Wave JS with 70 nodes. So it’s apparent (in my opinion) that this stems from your zwave issues.
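A sensor along those lines could look roughly like the sketch below. This is an assumption about how such a sensor might be built, not the poster's actual configuration; the sensor name is hypothetical. It relies on the `current` attribute that automation entities expose for the number of runs in progress.

```yaml
template:
  - sensor:
      - name: Automations running  # hypothetical sensor name
        # Count automation entities with at least one run in progress
        state: >
          {{ states.automation
             | selectattr('attributes.current', 'defined')
             | selectattr('attributes.current', 'gt', 0)
             | list | count }}
```

If this value stays above zero long after everything should have finished, that points to a run that never completed.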
Except I mentioned above having removed all Z-Wave entities from the groups, so no Z-Wave devices are called/touched during the automation, yet it can still report 5+ seconds and strange behavior when changing the order of the group off actions.
I don’t doubt there’s a possibility of multiple (compounded) issues especially in what was seen originally, but Z-Wave can safely be eliminated in the last tests I performed.
Well, the only pure way to tell if it’s “automations” would be to remove all hardware from the equation, for example by using input booleans: something that doesn’t require a callback to let the system know the service went through. Or use template switches and lights.
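A hardware-free test of that kind could be sketched as follows. All names are hypothetical, and the one-minute trigger is just an assumption to get frequent runs. If runs of this automation ever show as “still running” in the traces, no device or radio is involved.

```yaml
input_boolean:
  stuck_test_a:  # hypothetical helper
    name: Stuck test A
  stuck_test_b:  # hypothetical helper
    name: Stuck test B

automation:
  - alias: Hardware-free stuck test  # hypothetical automation
    mode: single
    trigger:
      # Fire once a minute so there are plenty of traces to inspect
      - platform: time_pattern
        minutes: "/1"
    action:
      - service: input_boolean.toggle
        target:
          entity_id: input_boolean.stuck_test_a
      - service: input_boolean.toggle
        target:
          entity_id: input_boolean.stuck_test_b
```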
To add my own experience: I can switch to any version of 2023.7 or 2023.8, and my automations get stuck within minutes of finishing the version switch. After reverting to 2023.6.*, everything works fine.
That seems to rule out the hardware and the Z-Wave devices themselves.
In addition to the log files you attached, add a trace file so the development team can inspect all of the details of the automation’s execution, right up to the action that causes it to wait indefinitely (the trace file contains more information than what’s shown in the node diagram).
I just went to reproduce the issue so that I could create an issue, but I am not able to get it to hang. I have the script with the wrong ‘device_id’ but it skips right over that step now. I no longer understand why the script was hanging. If I can manage to reproduce the problem I will report it.
+1 for me. It seems things work if I reboot prior to my 10PM automation running.
If it fails, I can open up the automation and run each step, one at a time, and they all work, so I doubt the automation is bad, especially since it was running fine under 2023.7.