2023.8: Update breaks automations

I checked all of my logs for the last 7 days as I swapped between different versions of core HA and don’t have a single Z-Wave error 200.

I have two errors:

zwave_js_server.exceptions.FailedZWaveCommand: Z-Wave error 202: The node did not respond after 1 attempts, it is presumed dead (ZW0202)

and

zwave_js_server.exceptions.FailedZWaveCommand: Z-Wave error 1405: The node failed to decode the message. (ZW1405)

But only 1 occurrence of each over the last 7 days.
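For anyone who wants more detail than the default log provides, here is a minimal sketch of a `logger:` section for `configuration.yaml`. The `zwave_js_server` logger name is taken from the exception path above; `homeassistant.components.zwave_js` and the chosen levels are assumptions and may need adjusting for your setup.

```yaml
# Sketch only: raise log verbosity for the Z-Wave JS integration.
logger:
  default: warning
  logs:
    # Home Assistant's Z-Wave JS integration (assumed module path)
    homeassistant.components.zwave_js: debug
    # Client library named in the exceptions quoted above
    zwave_js_server: debug
```

Debug logging can be noisy, so it is probably worth removing again once the errors have been captured.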

At least in my case, where automation issues stack up over time and more and more automations get stuck, the problem is unrelated to this ZW200 error.

I’ll open a new thread or bug report on this.

I’m not sure if I just came across the same issue. I’m using ZHA.

I have a light that is turned on either by the front door being opened, or a motion sensor in the same hall being triggered.

I noted this morning that the light had been turned on by the door being opened, but never turned off again until I walked through the hall the next morning:

But I cannot see any way that the light should not have turned off (I do not have a separate on/off automation in this case).

alias: Hall Lights - Ground Floor v2
description: ""
trigger:
  - platform: state
    entity_id:
      - binary_sensor.lumi_lumi_sensor_magnet_aq2_opening
    to: "on"
    id: frontdoor
  - platform: state
    entity_id:
      - binary_sensor.lumi_lumi_motion_ac02_motion_iaszone
    to: "on"
    id: sensor.lumi_lumi_motion_ac02_illuminance_3
  - platform: state
    entity_id:
      - binary_sensor.aqara_motion_ground_stairs_motion_iaszone
    to: "on"
    id: sensor.aqara_motion_ground_stairs_illuminance
condition:
  - condition: template
    value_template: >-
      {%if trigger.id == 'frontdoor'%} {{state_attr('sun.sun','elevation')<-2.0}}
      {%else%}
      {{states(trigger.id)|int(999)<states('input_number.hall_illuminance_threshold')|int(0)}}
      {%endif%}
action:
  - service: light.turn_on
    data:
      brightness_pct: 100
      kelvin: 3500
      transition: 1
    target:
      entity_id: light.moes_rgb_bulb_ground_floor_hall_light
    enabled: true
  - wait_template: "{{ is_state(trigger.entity_id, 'off')}}"
    continue_on_timeout: true
    timeout: "30"
  - delay:
      hours: 0
      minutes: 0
      seconds: 30
      milliseconds: 0
  - service: light.turn_off
    data: {}
    target:
      entity_id:
        - light.moes_rgb_bulb_ground_floor_hall_light
mode: restart

UPDATE: Ignore me. Looks like it was the device itself. This was in the logs.

Error while executing automation automation.hall_lights_top_floor_v2: Failed to send request: device did not respond

Not sure why though; everything looked stable at the time.
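If a device occasionally fails to respond, one possible workaround is to retry the turn-off a few times instead of letting a single failed request end the run. This is only a sketch that could replace the final `light.turn_off` step of the automation above; the retry count of 3 and the 5-second pause are arbitrary assumptions.

```yaml
# Sketch only: retry the turn-off until the light reports "off" or 3 attempts were made.
- repeat:
    sequence:
      - service: light.turn_off
        target:
          entity_id: light.moes_rgb_bulb_ground_floor_hall_light
        # Keep going even if the device does not respond to this attempt
        continue_on_error: true
      - delay:
          seconds: 5
    until:
      - condition: template
        value_template: >-
          {{ is_state('light.moes_rgb_bulb_ground_floor_hall_light', 'off')
             or repeat.index >= 3 }}
```

Because the automation uses mode: restart, a new trigger still cancels the retry loop, which matches the original behaviour.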

Same here.
I am using Z2M with a Dresden ConBee II.
Some Aqara 3-band switches can no longer trigger some automations.
I use the blueprint from mozartbanging.
Example:
Button one triggers two lights behind two Shelly 1s → works fine (if I choose only one of them, it does not work).

Button two triggers a garage door behind a Shelly 1 → not working.
Both use switch.toggle.

Button three triggers two Zigbee lights → working fine,
using light.toggle.

Button three triggers a Shelly 2.5 as a cover switch → not working,
using cover.toggle.

If I trigger the units with a Philips Tap Dial, everything is fine.

It seems that when an Aqara/mozartbanging button triggers a single Shelly it doesn’t work, but if there are two Shellys it works.

I have no idea why this happens after the 2023.8 update

alias: Aqara 6fach Eingang
description: ""
use_blueprint:
  path: >-
    mozartbanging/zigbee2mqtt-aqara-opple-wxcjkg13lm-3-band-switch-all-custom-buttons.yaml
  input:
    switch: sensor.aqara_6fach_eingang_action
    button_1_single:
      - service: switch.turn_on
        data: {}
        target:
          device_id: 2d543e2b060ffdd5b7afece5e5394db3
    button_2_single:
      - service: cover.toggle
        data: {}
        target:
          device_id: c0dc50fc350e8edded19538d5dc91c82
    button_3_single:
      - service: light.turn_on
        data: {}
        target:
          entity_id: light.hoflicht
    button_4_single:
      - service: light.turn_off
        data: {}
        target:
          entity_id: light.hoflicht
    button_5_single:
      - service: switch.turn_on
        data: {}
        target:
          entity_id: switch.aussenbel_garage
    button_6_single:
      - service: switch.turn_off
        data: {}
        target:
          entity_id: switch.aussenbel_garage

I got it. After reading this post again: mozartbanging has to correct the blueprint,
changing mode: restart to mode: single.
I always looked in my YAML configuration but not in the blueprint :man_facepalming:
After changing the blueprint and reconfiguring, everything works like before.
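For reference, the change described here is a single top-level key in the blueprint file itself, not in the automation that uses it. A sketch, assuming the rest of the blueprint stays unchanged:

```yaml
# In the blueprint YAML file (everything else in the file is left as-is)
mode: single   # previously: mode: restart
```

Automations created from the blueprint may need a reload, or re-saving, before the new mode takes effect.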

Can you please point me to this blueprint?

here it is


I have a script that has different error-handling behavior on 2023.8 than it did in any previous release (I keep my instance on the latest release). I’m wondering if this is helpful and maybe related to the issues that are being reported with automations. My script is a very simple bedtime routine to shut the garage doors, turn lights off, lock the door, and arm the alarm:

alias: Bedtime
sequence:
  - service: cover.close_cover
    target:
      entity_id:
        - cover.large_garage_door
        - cover.small_garage_door
    data: {}
  - service: light.turn_off
    target:
      entity_id:
        - light.basement_dimmer
        - light.dining_room_lamp
        - light.living_room_dimmer
        - light.master_bedroom_dimmer
    data: {}
  - device_id: 5fbb6c0df792e1e2e57c21126234baa2
    domain: lock
    entity_id: lock.front_door_lock
    type: lock
  - delay:
      hours: 0
      minutes: 0
      seconds: 30
      milliseconds: 0
  - service: alarmo.arm
    data:
      entity_id: alarm_control_panel.master
      code: "0000"
      mode: home
mode: single
icon: mdi:sleep

The interesting thing is that I have been using this script for probably over a year now. However, about 3 months ago I redid my zwave network with a new controller. Only very early in my HA journey did I ever use the “device” action in scripts or automations; I quickly moved to only use service calls. So when I redid my zwave network recently, I made sure all my entity_id’s were unchanged so to not affect my scripts and automations. I thought I didn’t have any references anywhere to device_id’s so I didn’t worry about those changing.

So what is wrong with the script above is that (starting a few months ago) the device_id became incorrect (after I redid the zwave network) and it no longer matched any of my devices. However the script continued to do everything else, it just obviously didn’t lock the door. I never noticed that anything was wrong because I normally lock it by hand before I go to bed, so not hearing it lock was normal. However tonight I called the script and noticed that a light didn’t turn off. As I debugged the issue I discovered the reference to the device_id. I also noticed that the script showed it was last triggered yesterday, even though I called it about a half dozen times today trying to debug it.

So pre-2023.8, the script would skip by that wrong device_id and execute the lines afterwards. Post-2023.8, the script hangs and never finishes. I believe, in my case, it’s been hanging since it was triggered yesterday, which would have been the first time it was triggered while running 2023.8.

Again I’m not sure if this has anything to do with the issues being reported, or perhaps it’s an expected change and entirely unrelated?

Obviously I know how to fix my issue, but posting here in case this is a piece of the puzzle.
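For anyone else hitting the same thing, a sketch of the likely fix: replace the device action in the script above with a service call that targets the entity, so it no longer depends on a device_id that can change when the network is rebuilt. The entity_id is the one from the script; `lock.lock` is the standard lock service.

```yaml
# Sketch only: entity-based replacement for the "device_id / type: lock" step
- service: lock.lock
  target:
    entity_id: lock.front_door_lock
  data: {}
```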


2:50 am: Our dog woke us up and I realized that again no Z-Wave device was working.
Restarting HA did not help either; only a full reboot of my Yellow box resolved the issue.
There was nothing relevant in the log, and Traces has no record of the automations attempting to run before the reboot.

I’ve had one of my automations hanging at seemingly random instances of execution since sometime in July. I don’t know if it was 2023.7.3 specifically. It doesn’t seem to matter whether the mode is set to Single or Restart. Pretty sure even Queued will show the same issue.

Example showing the UI config and traces (yaml below):

The first two actions each contain multiple media players, as noted in the screenshot. The next two actions each call the turn-off service for a group of switches and a group of lights respectively.

Example of stalling before the automation performs any actions at all:

In other instances it might perform the first two or three only.

This shows it performing all 4 defined actions:

Here’s the YAML:

alias: All Off (Lights Off, Media Off)
description: ""
trigger:
  - platform: device
    device_id: f2bb4aefe97f7cfe4ae4fa2439bb99f0
    domain: zwave_js
    type: event.value_notification.central_scene
    property: scene
    property_key: "002"
    endpoint: 0
    command_class: 91
    subtype: Endpoint 0 Scene 002
    value: 4
    id: multiclick.porchlight.switch
    alias: Porch Light 3-click OFF
  - platform: device
    device_id: ece2753762561b8ce22c4edf0b55d2f7
    domain: zwave_js
    type: event.value_notification.central_scene
    property: scene
    property_key: "002"
    endpoint: 0
    command_class: 91
    subtype: Endpoint 0 Scene 002
    value: 4
    id: multiclick.masterbedroomlight.switch
    alias: Master Bedroom Light 3-click OFF
condition: []
action:
  - service: media_player.turn_off
    data: {}
    target:
      device_id:
        - 286a0c072e714beb765740e83172bca7
        - 34d73f1675ca11c3d014adbe176a7883
        - f4824c52dbb06f870df55f9b1386b135
        - 90e618b723cd01740ff6441786b8df5c
        - 5caca3609799bbc5de232b4bcc132a43
    enabled: true
  - service: media_player.media_stop
    data: {}
    target:
      device_id:
        - 219d41d475a8e486f127be236fd85436
        - 5dc20823937dfc5e99efab860731d0af
        - cc8a0b7a50743ab3162a9a7a49dd8ec7
    enabled: true
  - service: switch.turn_off
    data: {}
    target:
      entity_id:
        - switch.all_switch_entities
    enabled: true
  - service: light.turn_off
    data: {}
    target:
      entity_id:
        - light.all_light_entities
    enabled: true
mode: restart

Here’s an example from earlier tonight that seems to have eventually timed out with an error some 30 minutes later:

Something is causing your “all lights” to take forever to shut off. Most likely a communication issue. The fact that your switches before the all lights have a significant delay between each turn off is alarming. For what it’s worth, I have about 80 or so switches & lights and I can turn them all off in less than a second. I’m running both Zwave and Zigbee.

Your trace shows 1 switch turning off, a 1 second delay, another switch turning off, a 2 second delay, 3 switches turning off, then an error. Looking at both traces, your failing trace appears to fail at the all lights, so I’d start looking into issues with those devices. The failing service is taking 32 minutes to fail.

This merits reporting as an Issue in Home Assistant’s GitHub Core repository. That’s where developers track and solve software bugs.


The ‘script hangs and never finishes’ behavior isn’t normal; it shouldn’t wait forever to receive a reply from an unresponsive device, yet that seems to be the common theme for ‘stuck’ scripts and automations.

  • The problem has been reported to be evident when mode is single or restart.

  • The problem appears to be masked when mode is queued or parallel because another instance is started while the ‘stuck’ instance is left to fester.

That was my experience in the past. All switches/lights being Z-Wave, a few Shelly and a couple of Zigbee.

Nothing’s changed on this end other than keeping up with HA updates. I haven’t added any devices into those groups since sometime last year. The only devices I’ve even played with in the past few months have been Zigbee motion and door sensors which aren’t referenced here at all and work fine.

According to the trace the All Lights Off action isn’t called at all. In fact, I disabled the lights action and tested this again, and the same thing happened, “still running”

In diagnosing these issues I’ve re-ordered the actions around and any one or more of them can fail to be called. For example, when I had this set up originally it ran for over a year with the actions in this order: switches, lights, media players, more media players.

When the problems started I noticed a couple of switches/lights not turning off and either one or both of the media player off actions never being called (like the Lights in the image above)

I’ve played around with re-ordering the lights and switches OFF actions (disabling the media player actions), as well as editing each group to selectively remove all Z-Wave or all Zigbee devices. Each of those seems to potentially have its own issues, but the delays between actions don’t seem to depend on which type is enabled or how many entities each group contains.

For example, I had only switches going and it was completing in something like 0.24 seconds. With only lights going it was under 1 second. Put them both together with switches first, and 5.5 seconds. Swap the two actions around so lights come before switches (and change nothing else) and boom. 0.47 seconds.

There’s something wrong with the automation system - too many people noticed an issue at one of the July updates.

There’s definitely something wrong with Z-Wave - I’m using Z-Wave-JS UI. Nodes going “unavailable” temporarily or dead requiring a ping isn’t an issue seen with other home automation platforms. I’m running the same hardware installed in the same places as I was with Indigo Domotics and I had never previously seen any node ever go dead nor unresponsive.

With multiple updates of many components every week, it’s really difficult to track this stuff down.

I’d wager that the issue stems from the zwave issues. You shouldn’t have to ping anything on a zwave network.

For what it’s worth, I built a template sensor that monitors the number of automations running concurrently: zero issues. I’m running Z-Wave JS in the current version with 70 nodes. So it’s apparent (in my opinion) that this stems from your Z-Wave issues.
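A sensor along those lines could look roughly like this. This is a sketch, not the poster's actual sensor: it assumes the `current` attribute that automation entities expose for the number of running instances, and the sensor name is made up for illustration.

```yaml
# Sketch only: count automations that currently have at least one running instance.
template:
  - sensor:
      - name: "Automations running"   # hypothetical name
        state: >-
          {{ states.automation
             | selectattr('attributes.current', 'defined')
             | selectattr('attributes.current', 'gt', 0)
             | list | count }}
```

If the value climbs over time and never drops back to zero, that would point to runs getting stuck rather than finishing.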

Except I mentioned above having removed all Z-Wave entities from the groups, so no Z-Wave devices are called/touched during the automation, yet it can still report 5+ seconds and strange behavior when changing the order of the group off actions.

I don’t doubt there’s a possibility of multiple (compounded) issues especially in what was seen originally, but Z-Wave can safely be eliminated in the last tests I performed.

Well, the only pure way to tell if it’s “automations” would be to remove all hardware from the equation, for example by using input booleans: something that doesn’t require a callback to let the system know the service went through. Or template switches and lights.
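A hardware-free test along these lines might look like the sketch below. All helper and automation names are hypothetical; if `automation:` is already defined in `configuration.yaml`, the automation part could be created through the UI instead.

```yaml
# Sketch only: helpers stand in for real switches/lights, so no device callback is involved.
input_boolean:
  test_trigger:
    name: Test trigger
  test_target_a:
    name: Test target A
  test_target_b:
    name: Test target B

automation:
  - alias: All Off (hardware-free test)
    trigger:
      - platform: state
        entity_id: input_boolean.test_trigger
        to: "on"
    action:
      - service: input_boolean.turn_off
        target:
          entity_id:
            - input_boolean.test_target_a
            - input_boolean.test_target_b
      # Reset the trigger helper so the test can be repeated
      - service: input_boolean.turn_off
        target:
          entity_id: input_boolean.test_trigger
    mode: restart
```

If traces for this automation also show runs that stay “still running”, the devices are unlikely to be the cause.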

I opened another issue to track my own experiences. I can swap to any version of 2023.7 or 2023.8, and my automations get stuck within minutes of finishing the version switch. When I revert to 2023.6.*, everything works fine.

That seems to rule out the hardware and the Z-Wave devices themselves.


In addition to the log files you attached, add a trace file so the development team can inspect all of the details of the automation’s execution, right up to the action that causes it to wait indefinitely (the trace file contains more information than what’s shown in the node diagram).


I just went to reproduce the issue so that I could create an issue, but I am not able to get it to hang. I have the script with the wrong ‘device_id’ but it skips right over that step now. I no longer understand why the script was hanging. If I can manage to reproduce the problem I will report it.

+1 for me. It seems things work if I reboot prior to my 10PM automation running.

If it fails, I can open up the automation and run each step, one at a time, and they all work, so I doubt the automation is bad, especially since it was running fine under 2023.7.