This has been very useful for me. Thanks. I was able to correct several of the log errors.
There are a few related to the MyQ integration but the related automations seem to work and the errors may be because the wifi in the garage is not very stable.
Other errors have only 1 or 2 occurrences, so for the time being I will not worry too much about these, but I will keep observing.
Other errors I do not understand and, per your suggestion, I will make a separate post in the forum.
Thanks again. I very much appreciate it.
I checked all of my logs for the last 7 days as I swapped between different versions of core HA and don’t have a single Z-Wave error 200.
I have two errors:
zwave_js_server.exceptions.FailedZWaveCommand: Z-Wave error 202: The node did not respond after 1 attempts, it is presumed dead (ZW0202)
and
zwave_js_server.exceptions.FailedZWaveCommand: Z-Wave error 1405: The node failed to decode the message. (ZW1405)
But only 1 occurrence of each over the last 7 days.
So at least in my case, the persistent problem of more and more automations getting stuck over time is unrelated to this ZW200 error.
I’ll open a new thread or bug report on this.
I’m not sure if I just came across the same issue. I’m using ZHA.
I have a light that is turned on either by the front door being opened, or a motion sensor in the same hall being triggered.
I noted this morning that the light had been turned on by the door being opened, but never turned off again until I walked through the hall the next morning:
But I cannot see any reason why the light should not have turned off (I do not have a separate on/off automation in this case).
alias: Hall Lights - Ground Floor v2
description: ""
trigger:
  - platform: state
    entity_id:
      - binary_sensor.lumi_lumi_sensor_magnet_aq2_opening
    to: "on"
    id: frontdoor
  - platform: state
    entity_id:
      - binary_sensor.lumi_lumi_motion_ac02_motion_iaszone
    to: "on"
    id: sensor.lumi_lumi_motion_ac02_illuminance_3
  - platform: state
    entity_id:
      - binary_sensor.aqara_motion_ground_stairs_motion_iaszone
    to: "on"
    id: sensor.aqara_motion_ground_stairs_illuminance
condition:
  - condition: template
    value_template: >-
      {%if trigger.id == 'frontdoor'%} {{state_attr('sun.sun','elevation')<-2.0}}
      {%else%}
      {{states(trigger.id)|int(999)<states('input_number.hall_illuminance_threshold')|int(0)}}
      {%endif%}
action:
  - service: light.turn_on
    data:
      brightness_pct: 100
      kelvin: 3500
      transition: 1
    target:
      entity_id: light.moes_rgb_bulb_ground_floor_hall_light
    enabled: true
  - wait_template: "{{ is_state(trigger.entity_id, 'off')}}"
    continue_on_timeout: true
    timeout: "30"
  - delay:
      hours: 0
      minutes: 0
      seconds: 30
      milliseconds: 0
  - service: light.turn_off
    data: {}
    target:
      entity_id:
        - light.moes_rgb_bulb_ground_floor_hall_light
mode: restart
UPDATE: Ignore me. Looks like it was the device itself. This was in the logs.
Error while executing automation automation.hall_lights_top_floor_v2: Failed to send request: device did not respond
Not sure why, though; everything looked stable at the time.
Same here,
I am using Z2M with a Dresden Conbee II.
Some Aqara 3 Band switches can no longer trigger some automations.
I use the Blueprint from mozartbanging.
Example:
Button one triggers two lights behind two Shelly 1 devices → works fine (if I choose only one of them, it does not work)
Button two triggers a garage door behind a Shelly 1 → not working
Both use switch.toggle
Button 3 triggers 2 Zigbee lights → working fine
using light.toggle
Button three triggers a Shelly 2.5 as a cover switch → not working
using cover.toggle
If I trigger the units with a Philips Tap Dial, everything is fine.
It feels as if an Aqara/Mozartbanging trigger does not work when it controls a single Shelly, but does work when there are two Shellys.
I have no idea why this happens after the 2023.8 update.
alias: Aqara 6fach Eingang
description: ""
use_blueprint:
  path: >-
    mozartbanging/zigbee2mqtt-aqara-opple-wxcjkg13lm-3-band-switch-all-custom-buttons.yaml
  input:
    switch: sensor.aqara_6fach_eingang_action
    button_1_single:
      - service: switch.turn_on
        data: {}
        target:
          device_id: 2d543e2b060ffdd5b7afece5e5394db3
    button_2_single:
      - service: cover.toggle
        data: {}
        target:
          device_id: c0dc50fc350e8edded19538d5dc91c82
    button_3_single:
      - service: light.turn_on
        data: {}
        target:
          entity_id: light.hoflicht
    button_4_single:
      - service: light.turn_off
        data: {}
        target:
          entity_id: light.hoflicht
    button_5_single:
      - service: switch.turn_on
        data: {}
        target:
          entity_id: switch.aussenbel_garage
    button_6_single:
      - service: switch.turn_off
        data: {}
        target:
          entity_id: switch.aussenbel_garage
I got it after reading this post again: Mozartbanging has to correct the blueprint.
Change mode: restart to mode: single.
I always looked in my YAML configuration but not in the blueprint.
After changing the blueprint and reconfiguring, everything works like before.
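For anyone else hitting this, the change described above is a one-line edit in the blueprint file. A minimal sketch, assuming the blueprint declares its run mode with a top-level `mode:` key; the rest of the file is omitted here and is not copied from the actual blueprint:

```yaml
# Top level of the blueprint YAML file (all other keys omitted)
# Before:
#   mode: restart
# After:
mode: single
```

Automations created from the blueprint will most likely need to be reloaded before they pick up the edit.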
Can you please point me to this blueprint?
here it is
I have a script that has different error-handling behavior on 2023.8 than it did in any previous release (I keep my instance on the latest release). I’m wondering if this is helpful and maybe related to the issues that are being reported with automations. My script is a very simple bedtime routine to shut the garage doors, turn lights off, lock the door, and arm the alarm:
alias: Bedtime
sequence:
  - service: cover.close_cover
    target:
      entity_id:
        - cover.large_garage_door
        - cover.small_garage_door
    data: {}
  - service: light.turn_off
    target:
      entity_id:
        - light.basement_dimmer
        - light.dining_room_lamp
        - light.living_room_dimmer
        - light.master_bedroom_dimmer
    data: {}
  - device_id: 5fbb6c0df792e1e2e57c21126234baa2
    domain: lock
    entity_id: lock.front_door_lock
    type: lock
  - delay:
      hours: 0
      minutes: 0
      seconds: 30
      milliseconds: 0
  - service: alarmo.arm
    data:
      entity_id: alarm_control_panel.master
      code: "0000"
      mode: home
mode: single
icon: mdi:sleep
The interesting thing is that I have been using this script for probably over a year now. However, about 3 months ago I redid my zwave network with a new controller. Only very early in my HA journey did I ever use the “device” action in scripts or automations; I quickly moved to using only service calls. So when I redid my zwave network recently, I made sure all my entity_ids were unchanged so as not to affect my scripts and automations. I thought I didn’t have any references anywhere to device_ids, so I didn’t worry about those changing.
So what is wrong with the script above is that (starting a few months ago) the device_id became incorrect (after I redid the zwave network) and it no longer matched any of my devices. However the script continued to do everything else, it just obviously didn’t lock the door. I never noticed that anything was wrong because I normally lock it by hand before I go to bed, so not hearing it lock was normal. However tonight I called the script and noticed that a light didn’t turn off. As I debugged the issue I discovered the reference to the device_id. I also noticed that the script showed it was last triggered yesterday, even though I called it about a half dozen times today trying to debug it.
So pre-2023.8, the script would skip by that wrong device_id and execute the lines afterwards. Post-2023.8, the script hangs and never finishes. I believe, in my case, it’s been hanging since it was triggered yesterday, which would have been the first time it was triggered while running 2023.8.
Again I’m not sure if this has anything to do with the issues being reported, or perhaps it’s an expected change and entirely unrelated?
Obviously I know how to fix my issue, but posting here in case this is a piece of the puzzle.
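For reference, a sketch of the fix hinted at above: replacing the device action with a plain service call that targets the entity, so the step no longer depends on a `device_id`. This assumes `lock.front_door_lock` is still the correct entity ID after the network rebuild.

```yaml
# Replaces the "device_id / domain: lock / type: lock" step in the Bedtime script
- service: lock.lock
  target:
    entity_id: lock.front_door_lock
```

Because the entity ID was kept unchanged when the network was rebuilt, this form would not have broken when the device ID changed.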
2:50 am: Our dog woke us up and I realized that again no Zwave device was working.
Restarting HA did not work either, and only a full reboot of my Yellow box resolved the issue.
There was nothing relevant in the log and there is no record of the attempted running of Automations in Traces from before reboot.
I’ve had one of my automations hanging at seemingly random instances of execution since sometime in July. I don’t know if it was 2023.7.3 specifically. It doesn’t seem to matter whether the mode is set to Single or Restart. I’m pretty sure even Queued will show the same issue.
Example showing the UI config and traces (yaml below):
The first two actions each contain multiple media players, as noted in the screenshot. The next two actions call the turn-off service for a group of switches and a group of lights, respectively.
Example of stalling before the automation performs any actions at all:
In other instances it might perform the first two or three only.
This shows it performing all 4 defined actions:
Here’s the YAML:
alias: All Off (Lights Off, Media Off)
description: ""
trigger:
  - platform: device
    device_id: f2bb4aefe97f7cfe4ae4fa2439bb99f0
    domain: zwave_js
    type: event.value_notification.central_scene
    property: scene
    property_key: "002"
    endpoint: 0
    command_class: 91
    subtype: Endpoint 0 Scene 002
    value: 4
    id: multiclick.porchlight.switch
    alias: Porch Light 3-click OFF
  - platform: device
    device_id: ece2753762561b8ce22c4edf0b55d2f7
    domain: zwave_js
    type: event.value_notification.central_scene
    property: scene
    property_key: "002"
    endpoint: 0
    command_class: 91
    subtype: Endpoint 0 Scene 002
    value: 4
    id: multiclick.masterbedroomlight.switch
    alias: Master Bedroom Light 3-click OFF
condition: []
action:
  - service: media_player.turn_off
    data: {}
    target:
      device_id:
        - 286a0c072e714beb765740e83172bca7
        - 34d73f1675ca11c3d014adbe176a7883
        - f4824c52dbb06f870df55f9b1386b135
        - 90e618b723cd01740ff6441786b8df5c
        - 5caca3609799bbc5de232b4bcc132a43
    enabled: true
  - service: media_player.media_stop
    data: {}
    target:
      device_id:
        - 219d41d475a8e486f127be236fd85436
        - 5dc20823937dfc5e99efab860731d0af
        - cc8a0b7a50743ab3162a9a7a49dd8ec7
    enabled: true
  - service: switch.turn_off
    data: {}
    target:
      entity_id:
        - switch.all_switch_entities
    enabled: true
  - service: light.turn_off
    data: {}
    target:
      entity_id:
        - light.all_light_entities
    enabled: true
mode: restart
Here’s an example from earlier tonight that seems to have eventually timed out with an error some 30 minutes later:
Something is causing your “all lights” to take forever to shut off. Most likely a communication issue. The fact that your switches before the all lights have a significant delay between each turn off is alarming. For what it’s worth, I have about 80 or so switches & lights and I can turn them all off in less than a second. I’m running both Zwave and Zigbee.
Your trace shows 1 switch turning off, a 1 second delay, another switch turning off, a 2 second delay, 3 switches turning off, then an error. Looking at both traces, your failing trace appears to fail at the all lights, so I’d start looking into issues with those devices. The failing service is taking 32 minutes to fail.
This merits reporting as an Issue in Home Assistant’s GitHub Core repository. That’s where developers track and solve software bugs.
The ‘script hangs and never finishes’ behavior isn’t normal; it shouldn’t wait forever to receive a reply from an unresponsive device yet that seems to be the common theme for ‘stuck’ scripts and automations.
- The problem has been reported to be evident when mode is single or restart.
- The problem appears to be masked when mode is queued or parallel because another instance is started while the ‘stuck’ instance is left to fester.
That was my experience in the past. All switches/lights being Z-Wave, a few Shelly and a couple of Zigbee.
Nothing’s changed on this end other than keeping up with HA updates. I haven’t added any devices into those groups since sometime last year. The only devices I’ve even played with in the past few months have been Zigbee motion and door sensors which aren’t referenced here at all and work fine.
According to the trace the All Lights Off action isn’t called at all. In fact, I disabled the lights action and tested this again, and the same thing happened, “still running”
In diagnosing these issues I’ve re-ordered the actions around and any one or more of them can fail to be called. For example, when I had this set up originally it ran for over a year with the actions in this order: switches, lights, media players, more media players.
When the problems started I noticed a couple of switches/lights not turning off and either one or both of the media player off actions never being called (like the Lights in the image above).
I’ve played around with re-ordering the lights and switches OFF actions (disabling the media player actions), as well as editing each group to selectively remove all Z-Wave or all Zigbee entities. Each of those seems to potentially have its own issues, but the delays between actions don’t seem dependent on which type is enabled or how many entities each group contains.
For example, I had only switches going and it was completing in something like 0.24 seconds. With only lights going it was under 1 second. Put them both together with switches first, and 5.5 seconds. Swap the two actions around so lights come before switches (and change nothing else) and boom. 0.47 seconds.
There’s something wrong with the automation system - too many people noticed an issue at one of the July updates.
There’s definitely something wrong with Z-Wave - I’m using Z-Wave-JS UI. Nodes going “unavailable” temporarily or dead requiring a ping isn’t an issue seen with other home automation platforms. I’m running the same hardware installed in the same places as I was with Indigo Domotics and I had never previously seen any node ever go dead nor unresponsive.
With multiple updates of many components every week, it’s really difficult to track this stuff down.
I’d wager that the issue stems from the zwave issues. You shouldn’t have to ping anything on a zwave network.
For what it’s worth, I built a template sensor that monitors the number of automations running concurrently, and I see zero issues. I’m running Z-Wave JS in the current version with 70 nodes. So it’s apparent (in my opinion) that this stems from your zwave issues.
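A sensor along those lines might look like the following. This is only a sketch, not the exact sensor described above: it assumes automation entities expose a `current` attribute holding the number of running instances, and the sensor name is made up.

```yaml
# Template sensor counting automations with at least one running instance.
# "Automations running" is a hypothetical name; "current" is assumed to be
# the per-automation attribute that tracks running instances.
template:
  - sensor:
      - name: "Automations running"
        unit_of_measurement: "automations"
        state: >-
          {{ states.automation
             | selectattr('attributes.current', 'defined')
             | selectattr('attributes.current', 'gt', 0)
             | list | count }}
```

A value that stays above zero long after everything should have finished would point to a stuck run.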
Except I mentioned above having removed all Z-Wave entities from the groups, so no Z-Wave devices are called/touched during the automation, yet it can still report 5+ seconds and strange behavior when changing the order of the group off actions.
I don’t doubt there’s a possibility of multiple (compounded) issues especially in what was seen originally, but Z-Wave can safely be eliminated in the last tests I performed.
Well, the only pure way to tell if it’s “automations” would be to remove all hardware from the equation, for example by using input booleans: something that doesn’t require a callback to let the system know the service went through. Template switches and lights would also work.
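As a rough illustration of that kind of hardware-free test, here is a sketch that swaps the real groups for input booleans. All helper and automation names are hypothetical, and `mode: restart` is chosen only because that is one of the modes the hang has been reported with.

```yaml
# Helpers standing in for real devices (all names are hypothetical)
input_boolean:
  test_trigger:
    name: Test trigger
  test_switch_1:
    name: Test switch 1
  test_switch_2:
    name: Test switch 2

# Labeled automation block so it can sit next to an existing "automation:" entry
automation hang_test:
  - alias: Hardware-free hang test
    trigger:
      - platform: state
        entity_id: input_boolean.test_trigger
        to: "on"
    action:
      # No radio traffic involved: these calls only change helper state
      - service: input_boolean.turn_off
        target:
          entity_id:
            - input_boolean.test_switch_1
            - input_boolean.test_switch_2
      # Reset the trigger so the test can be repeated
      - service: input_boolean.turn_off
        target:
          entity_id: input_boolean.test_trigger
    mode: restart
```

If runs of this automation also show up as “still running” in Traces, the devices themselves are unlikely to be the cause.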
I opened another issue to track my own experiences. I can swap to any version of 2023.7 or 2023.8, and my automations get stuck within minutes of finishing the version switch. After reverting to 2023.6.*, everything works fine.
That seems to rule out the hardware and the Z-Wave devices themselves.
In addition to the log files you attached, add a trace file so the development team can inspect all of the details of the automation’s execution, right up to the action that causes it to wait indefinitely (the trace file contains more information than what’s shown in the node diagram).
I just went to reproduce the issue so that I could create an issue, but I am not able to get it to hang. I have the script with the wrong ‘device_id’ but it skips right over that step now. I no longer understand why the script was hanging. If I can manage to reproduce the problem I will report it.