Strategies for dealing with restarts in automations

I have found myself fighting with restarts of Home Assistant and Template reloads when it comes to my automations. I have quite a lot of template entities and they’re the basis of nearly all my automations.

The problem is that template entities don’t actually restore their state; they just run their template as normal and get back into the right state. This causes rapid-fire state changes after startup, or after the template integration has been reloaded, as everything returns to the correct state. But my automations trigger from these state changes, which is bad since the restore has only partially completed (from my perspective, not HA’s). So they make bad decisions based on incomplete information.

I get that it fixes itself quickly enough but lights and outlets flickering after I make changes is not good for PAF. I can’t imagine I’m the only one battling this so I was wondering what kinds of patterns and strategies are people employing to deal with this?

I haven’t experienced the issue you have described (but I understand how it can happen). Can you post an example of a Template Sensor and an automation triggered by it that demonstrates the issue?

Sure. So the main one causing me issues is my presence lights automation. Here’s a rundown.

I have two key sensors per room, an aggregate person_detected sensor which looks like this:

template:
  - binary_sensor:
      - name: Person detected bedroom
        unique_id: 7585346108f14c3194e94a3396c1d96b_detector_bedroom
        device_class: occupancy
        state: >-
          {% set presence_entities = [
            "binary_sensor.bedroom_motion_sensor_occupancy",
            "binary_sensor.bedroom_tv_in_use",
            "switch.bedroom_closet_light",
          ] %}
          {{ expand(presence_entities) | selectattr('state', 'eq', 'on') | list | count > 0 }}

Basically I built a list per room of all the things that when in use mean the room is occupied.

The second is a brightness sensor which looks like this:

template:
  - binary_sensor:
      - name: Bright bedroom
        unique_id: "cdd8d74ab466499fb0d45fab796bf552"
        device_class: light
        state: >-
          {% set lux = states('sensor.bedroom_illuminance_lux_stats') | float(-1.0) %}
          {% set daytime = is_state('binary_sensor.daytime', 'on') %}
          {% if daytime and lux >= 0 %}
            {% set threshold = state_attr('light.bedroom', 'brightness') | int(0) / 5 + 75 %}
            {{ lux >= threshold }}
          {% else %}
            {{ is_state('binary_sensor.bright_inside', 'on') }}
          {% endif %}

Which essentially looks at the value of the illuminance sensor for that room and turns on when the light is over a certain level (so I know I don’t have to turn on the lights to make it bright). The threshold also attempts to account for the brightness of the light (so they generally don’t turn on and immediately turn off). The bright_inside piece in the else is because the sensors are a little flaky sometimes so essentially what happens is if I haven’t heard from one in a while I ignore it and use the average of the others.

In addition to these two I have an input_select per room (ex. input_select.presence_lights_bedroom) with one of the following values: on, off, or override. A room will only be adjusted by this automation when its input_select has value on.

All this feeds into this automation:

- id: a6ff604da4c843d98102bbcf0240e890_automation_presence_lights
  alias: Set lights based on presence
  description: Turn on and off rooms based on presence, light level and current overrides
  mode: parallel
  trace:
    stored_traces: 15
  variables: !include_dir_named /config/common/room_presence
  trigger:
    - id: input_change
      platform: state
      entity_id: !include_dir_merge_list /config/common/room_presence/inputs
      to:
    - platform: event
      event_type: !include /config/common/reload_events.yaml
  action:
    - variables:
        rooms: "{{ [area_name(trigger.entity_id)] if trigger.id == 'input_change' else rooms }}"
    - alias: Loop through rooms
      repeat:
        count: "{{ rooms | count }}"
        sequence:
          - variables:
              room: "{{ rooms[repeat.index - 1] | slugify }}"
          - variables: &presence-lights-room-variables
              controller: "{{ controllers | select('search', room) | first }}"
              detector: "{{ detectors | select('search', room) | first }}"
              light_sensor: "{{ light_sensors | select('search', room) | first }}"
              off_scene: "{{ off_scenes | select('search', room) | first }}"
              on_script: "{{ on_scripts | select('search', room) | first }}"
          - alias: Stop if room controller isn't on
            condition: "{{ is_state(controller, 'on') }}"
          - choose:
              - alias: Room is occupied
                conditions: "{{ is_state(detector, 'on') }}"
                sequence:
                  - service: "{{ on_script }}"
                    data:
                      skip_lights: "{{ is_state(light_sensor, 'on') }}"
                      brighter_only: true
            default:
              - alias: Room is unoccupied
                service: scene.turn_on
                data:
                  entity_id: "{{ off_scene }}"
                  transition: 2.5

!include_dir_named /config/common/room_presence becomes:

controllers: ['input_select.presence_lights_bedroom', ...]
detectors: ['binary_sensor.person_detected_bedroom', ...]
light_sensors: ['binary_sensor.bright_bedroom', ...]
off_scenes: ['scene.bedroom_off', ...]
on_scripts: ['script.bedroom_on', ...]
rooms: ['Bedroom', 'Kitchen', ...]

!include_dir_merge_list /config/common/room_presence/inputs becomes:

- input_select.presence_lights_bedroom
- binary_sensor.person_detected_bedroom
- binary_sensor.bright_bedroom
- ...repeat those 3 per room

!include /config/common/reload_events.yaml becomes:

- automation_reloaded
- script_reloaded
- scene_reloaded

Note that the idea was to intentionally not rely on trigger, so that no matter how the automation ran it always just set all the rooms to the state they should be in. I realize there is one reference to trigger in the first step but it’s optional. I put it in so that if someone tripped a motion sensor it tries to do that room first to optimize performance a bit. But it won’t break if trigger is missing or whatever.

I considered that one option is probably to rewrite this to be dependent on trigger so one room being restored doesn’t fire the lights in the others. Perhaps that would be safer although I think there would still be a race condition between the brightness sensor and the person detected sensor in each room.

EDIT: I’m going to try tweaking it so if trigger.id == 'input_change' then it only processes that room and not all the others to see how that works. That also means I have to change it to mode: parallel but I don’t think that will be an issue since all runs will be seeing the same sensors and making the same decisions. It will still process all rooms for the other triggers but we’ll see how that impacts startup.

Also changing it because I got annoyed walking into a room and it took half a second to turn the lights on. Turns out two rooms triggered at the same time and the other one stopped mine (mode: restart) so my room had to wait its turn at the end of the list. Can’t have that lol. Adjusted the automation above accordingly.

Do you have a simpler example of something that doesn’t work properly on startup? For example, something that doesn’t employ loading YAML via include statements?

Or is it just the ones that use include that are affected?

Honestly it’s just this one for right now. I’m sort of in the middle of a rewrite to move my automation from Node RED to HA so a lot hasn’t been moved over yet. This is an issue I’ve run into in the past though, so I figured I’d ask and see what advice people have for if/when it comes up again.

I don’t think the !include statements are really affecting anything, I can see everything I expect to be there in every trace. I can fold them in if you want. All they are is lists of entities that I use in multiple automations that I didn’t want to copy repeatedly. Here’s the automation with the !include statements folded in:

- id: a6ff604da4c843d98102bbcf0240e890_automation_presence_lights
  alias: Set lights based on presence
  description: Turn on and off rooms based on presence, light level and current overrides
  mode: parallel
  trace:
    stored_traces: 15
  variables:
    off_scenes:
      - scene.bedroom_off
      - scene.family_room_and_dining_room_off
      - scene.guest_room_off
      - scene.kids_room_off
      - scene.kitchen_off
      - scene.upstairs_off
    on_scripts:
      - script.bedroom_on
      - script.family_room_and_dining_room_on
      - script.guest_room_on
      - script.kids_room_on
      - script.kitchen_on
      - script.upstairs_on
    rooms:
      - Bedroom
      - Family Room
      - Guest Room
      - Kids Room
      - Kitchen
      - Upstairs
    controllers:
      - input_select.presence_lights_bedroom
      - input_select.presence_lights_family_room
      - input_select.presence_lights_guest_room
      - input_select.presence_lights_kids_room
      - input_select.presence_lights_kitchen
      - input_select.presence_lights_upstairs
    detectors:
      - binary_sensor.person_detected_bedroom
      - binary_sensor.person_detected_family_room
      - binary_sensor.person_detected_guest_room
      - binary_sensor.person_detected_kids_room
      - binary_sensor.person_detected_kitchen
      - binary_sensor.person_detected_upstairs
    light_sensors:
      - binary_sensor.bright_bedroom
      - binary_sensor.bright_family_room
      - binary_sensor.bright_guest_room
      - binary_sensor.bright_kids_room
      - binary_sensor.bright_kitchen
      - binary_sensor.bright_upstairs
  trigger:
    - id: input_change
      platform: state
      entity_id: 
        - input_select.presence_lights_bedroom
        - input_select.presence_lights_family_room
        - input_select.presence_lights_guest_room
        - input_select.presence_lights_kids_room
        - input_select.presence_lights_kitchen
        - input_select.presence_lights_upstairs
        - binary_sensor.person_detected_bedroom
        - binary_sensor.person_detected_family_room
        - binary_sensor.person_detected_guest_room
        - binary_sensor.person_detected_kids_room
        - binary_sensor.person_detected_kitchen
        - binary_sensor.person_detected_upstairs
        - binary_sensor.bright_bedroom
        - binary_sensor.bright_family_room
        - binary_sensor.bright_guest_room
        - binary_sensor.bright_kids_room
        - binary_sensor.bright_kitchen
        - binary_sensor.bright_upstairs
      to:
    - platform: event
      event_type: 
        - automation_reloaded
        - script_reloaded
        - scene_reloaded
  action:
    - variables:
        rooms: "{{ [area_name(trigger.entity_id)] if trigger.id == 'input_change' else rooms }}"
    - alias: Loop through rooms
      repeat:
        count: "{{ rooms | count }}"
        sequence:
          - variables:
              room: "{{ rooms[repeat.index - 1] | slugify }}"
          - variables:
              controller: "{{ controllers | select('search', room) | first }}"
              detector: "{{ detectors | select('search', room) | first }}"
              light_sensor: "{{ light_sensors | select('search', room) | first }}"
              off_scene: "{{ off_scenes | select('search', room) | first }}"
              on_script: "{{ on_scripts | select('search', room) | first }}"
          - alias: Stop if room controller isn't on
            condition: "{{ is_state(controller, 'on') }}"
          - choose:
              - alias: Room is occupied
                conditions: "{{ is_state(detector, 'on') }}"
                sequence:
                  - service: "{{ on_script }}"
                    data:
                      skip_lights: "{{ is_state(light_sensor, 'on') }}"
                      brighter_only: true
            default:
              - alias: Room is unoccupied
                service: scene.turn_on
                data:
                  entity_id: "{{ off_scene }}"
                  transition: 2.5

What was the reason for triggering the automation on those events?


EDIT

I would test the automation without the event triggers and the include statements. In other words, confirm a bog standard version works properly before decorating it with includes and extra events.

The idea was that if I adjusted this automation or any of the scenes/scripts it depended on, it would immediately correct itself. So, for example, if I decided bedroom_on should now include another light then it would re-run after reload and turn on that light (if someone was in the room).

Pretty rare I know. Helpful during debugging and testing phase, now probably meaningless as they won’t change as much. I haven’t had any issue with them though. Those events aren’t fired at startup, only when I reload those particular integrations. And when that happens all the sensors exist so it just runs through the loop and does any self-correction necessary.

Are you sure? On startup, all domains are loaded.

Yes. There is a different event for when an integration has been loaded during startup - component_loaded:

This event is fired when a new integration has been loaded and initialized.

Please note that while this event is fired for each loaded integration during Home Assistant startup, the automation engine of Home Assistant is started last. Thus this event can not be used to run automations during startup as it would have missed these events.

automation_reloaded, scene_reloaded, and script_reloaded (can’t find doc to link to for this one) are exclusively fired when that component is reloaded from a running instance. I can also confirm that I never see these triggers as the cause in my traces immediately after startup.

  trigger:
    - id: input_change
      platform: state
      entity_id: !include_dir_merge_list /config/common/room_presence/inputs
      to:

Forgive me if my question is really dumb, but with your automation trigger set like this, doesn’t it trigger on any state change in the given entities, including when they move from state none, unknown or unavailable, as at startup or when reloading template sensors? Is that what you want?

Otherwise, you could try to filter out the trigger.from_state.state state change if it is from one of those using a condition. But I might be barking up the wrong tree here.
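
A minimal sketch of that kind of filter, assuming the state trigger from the automation above (shortened here to a single entity). The template condition lets non-state triggers through and rejects state changes whose previous state is missing, unknown or unavailable:

```yaml
trigger:
  - id: input_change
    platform: state
    entity_id: binary_sensor.person_detected_bedroom
    to:
condition:
  # Event triggers have no from_state, so only state triggers are checked.
  # from_state is none when an entity is seen for the first time.
  - condition: template
    value_template: >-
      {{ trigger.platform != 'state'
         or (trigger.from_state is not none
             and trigger.from_state.state not in ['unknown', 'unavailable']) }}
```

Note that this only looks at the entity that fired the trigger. It says nothing about whether the other entities the automation reads are ready yet.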

Good point; I normally specify the exact states allowed for triggering, even for non-Template entities.
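
As an illustration of specifying exact states, a State Trigger for one of the binary sensors could list both transitions explicitly. This is a sketch using an entity from the first post, not a drop-in replacement for the full trigger list:

```yaml
trigger:
  # Only real on/off transitions match; changes from unknown or
  # unavailable do not.
  - platform: state
    entity_id: binary_sensor.person_detected_bedroom
    from: 'off'
    to: 'on'
  - platform: state
    entity_id: binary_sensor.person_detected_bedroom
    from: 'on'
    to: 'off'
```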

Great point! This is essentially how I solved the problem in Node RED actually since I found after startup the state_changed events had the from_state part set to null. But in HA I’m seeing something different.

So I just tried making it bog standard as @123 suggested and restarting. It still did the flicker thing unfortunately. I took a look at the trigger that turned it off and this was it:

trigger:
  id: input_change
  idx: '0'
  platform: state
  entity_id: binary_sensor.bright_guest_room
  from_state:
    entity_id: binary_sensor.bright_guest_room
    state: 'off'
    attributes:
      device_class: light
      friendly_name: Bright guest room
    last_changed: '2022-03-11T16:02:03.145719+00:00'
    last_updated: '2022-03-11T16:02:03.145719+00:00'
    context:
      id: 2676f833b573c9eb9e3c0a1829257d00
      parent_id: null
      user_id: null
  to_state:
    entity_id: binary_sensor.bright_guest_room
    state: 'on'
    attributes:
      device_class: light
      friendly_name: Bright guest room
    last_changed: '2022-03-11T16:02:04.199377+00:00'
    last_updated: '2022-03-11T16:02:04.199377+00:00'
    context:
      id: 8594c0e945fcb862372293884505d386
      parent_id: null
      user_id: null
  for: null
  attribute: null
  description: state of binary_sensor.bright_guest_room

This is immediately following a restart and I have no other triggers from this entity prior to this. And yet my trigger says bright guest room is going from off to on.

However I noticed this in the trigger that turned them back on again:

trigger:
  id: input_change
  idx: '0'
  platform: state
  entity_id: binary_sensor.person_detected_guest_room
  from_state:
    entity_id: binary_sensor.person_detected_guest_room
    state: unknown
    attributes:
      device_class: occupancy
      friendly_name: Person detected guest room
    last_changed: '2022-03-11T16:01:58.556697+00:00'
    last_updated: '2022-03-11T16:01:58.556697+00:00'
    context:
      id: 934cb8c15072fd8dfff060c49585e565
      parent_id: null
      user_id: null
  to_state:
    entity_id: binary_sensor.person_detected_guest_room
    state: 'on'
    attributes:
      device_class: occupancy
      friendly_name: Person detected guest room
    last_changed: '2022-03-11T16:02:11.602911+00:00'
    last_updated: '2022-03-11T16:02:11.602911+00:00'
    context:
      id: 6f26f3fd0353c600dab9e8489ea5dfa4
      parent_id: null
      user_id: null
  for: null
  attribute: null
  description: state of binary_sensor.person_detected_guest_room

This one did go from unknown to on.

None of these should ever be unknown or unavailable. availability is not set so barring some strange error the only time any should be unknown is after restart/reload.

Therefore I think the solution here is to insert a condition which stops it from proceeding if any of the entities it depends on have state unknown or unavailable. I think that’s more important than adding from here, since that apparently won’t stop my bright_{room} sensors from firing off the automation in a bad state.
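
A rough sketch of such a condition, written as an extra step inside the room loop right after the per-room variables are set. It assumes the controller, detector and light_sensor variables from the automation above; the alias is made up for illustration:

```yaml
- alias: Stop if any input for this room is not ready yet
  # expand() turns the entity IDs into state objects; the step passes
  # only when none of them is unknown or unavailable.
  condition: >-
    {{ expand([controller, detector, light_sensor])
       | selectattr('state', 'in', ['unknown', 'unavailable'])
       | list | count == 0 }}
```

Unlike a check on trigger.from_state, this looks at all three inputs for the room, whichever one fired the trigger.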

EDIT: Actually, I’m realizing this might be more of a flaw in the logic of my bright_{room} sensors. Think I’m going to take a look and see if I can do anything to stop them from having multiple state changes right after startup like this and instead just go from unknown to on. I’m going to mark yours as the solution since I think it applies more generally.
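
One possible direction for the bright_{room} sensors, untested against this setup, is an availability template. The sketch below assumes the extra state changes come from binary_sensor.daytime or binary_sensor.bright_inside not having a real state yet; that cause is a guess, not something confirmed in the traces:

```yaml
template:
  - binary_sensor:
      - name: Bright bedroom
        unique_id: "cdd8d74ab466499fb0d45fab796bf552"
        device_class: light
        # Assumption: report unavailable until the fallback inputs have a real state.
        availability: >-
          {{ states('binary_sensor.daytime') not in ['unknown', 'unavailable']
             and states('binary_sensor.bright_inside') not in ['unknown', 'unavailable'] }}
        state: >-
          {% set lux = states('sensor.bedroom_illuminance_lux_stats') | float(-1.0) %}
          {% set daytime = is_state('binary_sensor.daytime', 'on') %}
          {% if daytime and lux >= 0 %}
            {% set threshold = state_attr('light.bedroom', 'brightness') | int(0) / 5 + 75 %}
            {{ lux >= threshold }}
          {% else %}
            {{ is_state('binary_sensor.bright_inside', 'on') }}
          {% endif %}
```

With this in place the sensor would report unavailable rather than off while its inputs are still settling, which a check for unknown or unavailable in the automation could then catch.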

Thanks everyone!

Is the template for binary_sensor.person_detected_guest_room similar to the one in your first post?

Yes. Just a different list of entities to look at which make sense for that room:

- name: Person detected guest room
  unique_id: a19a70ede69748a8b51c08be52044f6c_detector_guest_room
  device_class: occupancy
  state: >-
    {% set presence_entities = [
      "binary_sensor.guest_room_motion_sensor_occupancy",
      "binary_sensor.guest_room_right_monitor_in_use",
      "binary_sensor.guest_room_left_monitor_in_use",
      "switch.guest_room_closet_light",
    ] %}
    {{ expand(presence_entities) | selectattr('state', 'eq', 'on') | list | count > 0 }}

What kind of integrations produce the four entities in that template?

The first three are MQTT entities created by Zigbee2MQTT. The switch is a lutron caseta light switch.

Something seems amiss here because, at startup, the Template-based entities should be ‘settled business’ before automations are loaded. The key is to determine what prevents this particular Template Sensor from being resolved prior to the automation where it’s referenced. That’s why I’m trying to follow the chain of entities to determine if any are responsible for the Template Binary Sensor’s undesirable result. My thinking here is to nip it at the source rather than use a State Condition to filter it.

No, not exactly, at least not to my understanding. Yes, template entities are loaded by that point but they don’t run until later. What happens during startup is this:

  1. setup_and_run_hass tells bootstrap to set up hass
  2. This calls async_from_config_dict to set up hass from my config
  3. This sets up the core integrations and then begins loading the rest here.
  4. Template is a stage 2 integration so it’s loaded here.
  5. When template entities are loaded they don’t immediately render their template unless core is already in a running state. Instead they add a listener for EVENT_HOMEASSISTANT_START here. When that fires they call _async_template_startup which renders their templates and sets them up to listen for changes on the entities in the templates.
  6. After stage 2 has entirely loaded then async_run is called.
  7. This in turn calls async_start which fires EVENT_HOMEASSISTANT_START here and then eventually sets the core state to running.

For reference, automation is also a stage 2 integration. What is in stage 2 is defined here (basically everything that isn’t in LOGGING_INTEGRATIONS, CORE_INTEGRATIONS, DEBUGGER_INTEGRATIONS or STAGE_1_INTEGRATIONS from here). So in fact although template entities have been loaded, their state isn’t set until after automations are loaded and then automations pick up that state change.

So I think the only things that are odd about my situation are:

  1. Why are my bright_{room} sensors starting as off and not unknown?
  2. I am listening for changes on two template sensors per room and therefore getting in a race condition.

Alternatively it would be nice if template entities restored their state as that would eliminate this problem entirely, this first state change wouldn’t actually be a change at all (unless something had actually changed while HA was off). But that’s a different topic.

It implies that, unless a State Trigger explicitly indicates the desired state-change, it will always be triggered by the change from unknown to anything else (at startup) because every Template Sensor’s state is unknown until evaluated to produce an actual state.

Yes. Unless it’s a trigger template entity; those have different rules. When they are added to hass they don’t wait to register their listeners and they also don’t run their template on startup. So they are unknown until their trigger fires, which could be hours or days later or could be immediately, before automation has even been loaded.
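
For comparison, a trigger-based version of the bedroom detector might look roughly like this. It is only a sketch: the name is hypothetical and the trigger list simply reuses the entities from the first post. The state template is rendered only when one of the listed triggers fires:

```yaml
template:
  - trigger:
      - platform: state
        entity_id:
          - binary_sensor.bedroom_motion_sensor_occupancy
          - binary_sensor.bedroom_tv_in_use
          - switch.bedroom_closet_light
    binary_sensor:
      # Hypothetical name, chosen to avoid clashing with the existing sensor.
      - name: Person detected bedroom triggered
        device_class: occupancy
        state: >-
          {% set presence_entities = [
            "binary_sensor.bedroom_motion_sensor_occupancy",
            "binary_sensor.bedroom_tv_in_use",
            "switch.bedroom_closet_light",
          ] %}
          {{ expand(presence_entities) | selectattr('state', 'eq', 'on') | list | count > 0 }}
```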