Breaking change / race condition in script handling?

joshlawless · April 12, 2024, 7:02pm

So I noticed that my LED dimming automations stopped working recently, although I can’t say for sure which month’s release might be the culprit, and I hope someone here can help me understand why.

I have zwave dimmer switches that control various LEDs (e.g., under counter lighting, behind mirror lighting, etc.). The Zwave switch doesn’t directly control the load, it’s just providing Zwave central scene events that act as automation triggers. The actual LED control is handled by a wireless DMX decoder & multichannel LED driver. To replicate a “hold to dim” or “hold to brighten” effect when interacting with the zwave switch, I long ago set up and have enjoyed success with the following set of automations and scripts.

Automation “Dim Start” watches for a central scene event for KeyHeldDown. In response, the automation calls a script “dim”

The “dim” script does three things: (i) it reads the current brightness attribute, and if it’s above a minimum threshold, (ii) reduces it by a fixed step amount, and then (iii) calls the “dim pause” script.

The “dim pause” script does two things: (i) delay 50ms and (ii) call the “dim script”

To prevent this cycle from continuing forever, there is another automation “Dim Stop” that watches for a central scene event of KeyReleased, and in response sends “off” commands to both “dim” and “dim pause” scripts.

This worked wonderfully well for a long time. Recently, I’ve noticed that holding down the switch doesn’t work anymore, and upon investigation, see that the log includes the error “dim_pause: Already running”

I can’t see how that would be - the “dim_pause” script delays 50ms, and then calls “dim,” and should at that point no longer be busy, so that when “dim” calls “dim_pause” again, the cycle repeats.

Below is the relevant code:

Automation to start dimming:

alias: Counters dim start
description: Dims the counters LED strip based on switch held down
trigger:
  - platform: event
    event_type: zwave_js_value_notification
    event_data:
      node_id: 122
      property_key: "002"
      value: KeyHeldDown
condition: []
action:
  - service: script.turn_on
    data:
      entity_id: script.counters_dim
mode: single

Script to actually dim:

sequence:
  - condition: "{{ (state_attr('light.kitchen_counters', 'brightness')|int) > 10 }}"
  - service: light.turn_on
    data_template:
      entity_id: light.kitchen_counters
      brightness: >-
        {% set current =
        states.light.kitchen_counters.attributes.brightness|default(0)|int %} {%
        set step = 5 %} {% set next = current - step %} {% if next < 10 %}
          {% set next = 10 %}
        {% endif %} {{ next }}
  - service: script.turn_on
    data:
      entity_id: script.counters_dim_pause

Script to pause before dimming again:

sequence:
  - delay:
      milliseconds: 50
  - service: script.turn_on
    data:
      entity_id: script.counters_dim

Automation to stop dimming when keypad on switch released:

alias: Counters dim stop
description: Stops dimming the counters LED strip based on down switch released
trigger:
  - platform: event
    event_type: zwave_js_value_notification
    event_data:
      node_id: 122
      property_key: "002"
      value: KeyReleased
condition: []
action:
  - service: script.turn_off
    data:
      entity_id: script.counters_dim
  - service: script.turn_off
    data:
      entity_id: script.counters_dim_pause
mode: single

joshlawless · April 12, 2024, 7:15pm

Adding a “delay” step of 10ms to the “dim” script restored functionality, but leads me to believe that the problem is some kind of race condition.

Script “dim pause” has, as its last step, calling script “dim” – once this step is performed, the state of script “dim pause” should be “off”

Script dim, as its last of three steps, calls script “dim pause” – if script “dim” is executed fast enough (before the state of script “dim pause” has updated to “off”), then it will fail thinking script “dim pause” is still busy.

This didn’t use to be the case, though.

jazzyisj · April 15, 2024, 12:55pm

Just to confirm you’re not crazy, I had at least 1 automation that stopped acting as expected after the last update (2024.4). I think something has changed in the timing of state changes. Where I could always depend on the order of state changes in my triggers before, now they all seem to trigger at once.

In my case I was able to split up the triggers and add a for: statement to the ones I wanted to check last to restore previous functionality.

    - platform: state
      id: armed
      entity_id: alarm_control_panel.master
      to:
        - armed_home
        - armed_night
        - armed_away
        - armed_vacation

    - platform: state
      id: armed
      entity_id:
        - alarm_control_panel.house
        - alarm_control_panel.garage
      to:
        - armed_home
        - armed_night
        - armed_away
        - armed_vacation
      for: 1 # force master trigger first if master changed

I’m going to have to go back through my automations to see if there is anywhere else this could potentially be happening.

123 · April 15, 2024, 2:00pm

I noticed someone else had reported unexpected behavior: a timer’s finished event was posted before the timer’s state had actually changed to idle (by a few milliseconds). This caused a problem in their automation (i.e. timer is finished but not immediately idle).

I don’t know what changed under the hood to cause this new (breaking) behavior but (my guess) it seems to have been a fundamental modification (to improve performance?) that may have other ramifications (possibly including the ones mentioned in this topic).

This particular timer-related anomaly was reported in the Core repository and fixed in 2024.4.1.

Whatever the root cause, it appears to be breaking certain existing automations so I suggest reporting it as an Issue in the Core repository.