Handling stale/broken sensors that drive thermostats

Hello,

I have several home-made, battery-powered temperature sensors driving various thermostats.

On a couple of occasions a sensor has stopped sending updates (become “stale”) for different reasons:

  1. Connectivity issue
  2. Power issue
  3. User issue

This causes knock-on issues with the thermostats driving heaters, which will either not heat or heat without stopping, leaving the room too cold or too hot.

I have set up an automation per sensor to disable any associated thermostats while the issue is fixed, and to notify me. I have not yet set up automations to re-enable disabled thermostats once data is received again. This feels quite manual/laborious, so my first question is:

1. Can you programmatically create/define automations using arrays of, in my case, sensors and related thermostats? Ideally I would have a list of sensors that I care about, and what to do if they go stale/go back online.
2. If this is not possible, can you include loops inside automations?

Secondly, I’d like to update the display of the sensors to show “NaN” or similar, to flag to me visually when I have stale sensors (sometimes I don’t fix them immediately and forget).

I’ve tried this example I found in the forum, to change a sensor reading if older than 4 hours:

- platform: mqtt
  state_topic: "kitchen/temperature"
  name: "Kitchen Temperature"
  unit_of_measurement: "°C"
  value_template: >
    {% if (as_timestamp(now()) - as_timestamp(states.sensor.kitchen_temperature.last_changed))/(60*60) > 4 %}
      {{ 'NaN'  }}
    {% else %}
      {{states('sensor.kitchen_temperature')}}
    {% endif %}

But of course, NaN leads to:

[Screenshot 2021-02-28 at 20.35.24]

as it’s not a number…

Is anyone else dealing with stale sensors, and if so, do you have a solution?

If I use -1 it could switch on thermostats which haven’t been disabled by the above, and given how bad it is when a heater is fully switched on and overheating a living area, I’d like the security of having both the automation and visual cue.
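To illustrate the concern with -1, here is a minimal Python sketch, independent of Home Assistant, of the same age check as the template above. The function names, the `target` parameter and the simple "heat if below target" rule are all hypothetical and only there for illustration; the point is that a stale reading represented as "no value" cannot be mistaken for a low temperature, whereas -1 can.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def displayed_reading(
    value: float,
    last_update: datetime,
    now: datetime,
    timeout_hours: float = 4,
) -> Optional[float]:
    """Return the reading, or None if it is older than timeout_hours."""
    if now - last_update > timedelta(hours=timeout_hours):
        return None  # stale: deliberately not a number such as -1
    return value


def heater_should_run(reading: Optional[float], target: float) -> bool:
    """Hypothetical thermostat rule: heat only on a real, too-low reading."""
    return reading is not None and reading < target


now = datetime(2021, 2, 28, 20, 0, tzinfo=timezone.utc)
fresh = displayed_reading(18.5, now - timedelta(hours=1), now)
stale = displayed_reading(18.5, now - timedelta(hours=5), now)

print(fresh, stale)
print(heater_should_run(stale, 20.0))  # no reading, so no heating
print(heater_should_run(-1, 20.0))     # a -1 sentinel looks like a cold room
```

With a numeric sentinel the heater rule cannot tell a broken sensor from a freezing room, which is why a non-numeric "no value" state is the safer thing to display.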

Any tips welcome!

Amadeus

It seems this is not a problem many are faced with.

Given the number of sensors and automations/thermostats I run, I’ve written a small notebook to generate the YAML config for me.

(repo is at https://github.com/amadeuspzs/ha-config).

I will be adding more complexity to it over the coming days, but for now I am testing the first batch of auto-generated config.

Okay, so this has been more complicated than I envisaged, because HA resets the last_changed value of sensors when restarting (e.g. when testing new config). As per Real state last_changed, and with a reimplementation for SQLite, I now have a bunch of new “proxy” sensors (rlc_) which store the real time since the last change, in hours. I have finished the script, which will:

  1. Take either automation or climate as setting to check/switch off
  2. Work with either single or multiple values for automation/climate
  3. Have a quick yaml parse to check for errors

It is all in the notebook linked above, but for those who prefer reading in the forum, here is the code, which I have tested and which is working well:

import yaml  # PyYAML, used for the parse check at the end

# escape any ' with ''
settings = [
    {"id": "attic_office_stale", 
     "alias": "Attic Office", 
     "sensor": "attic_office_temperature",
     "rlc_sensor": "rlc_attic_office_temperature",
     "timeout": 4, 
     "automations": ["attic_heating_on","attic_heating_on_f"]},
    {"id": "pip_room_stale", 
     "alias": "Pip''s room", 
     "sensor": "pip_temperature", 
     "rlc_sensor": "rlc_pip_temperature",
     "timeout": 4, 
     "automations": ["turn_pip_s_heating_on"]},
    {"id": "kitchen_stale",
     "alias": "Kitchen", 
     "sensor": "kitchen_temperature", 
     "rlc_sensor": "rlc_kitchen_temperature",
     "timeout": 4, 
     "automations": ["kitchen_heating_on"]},
    {"id": "tackroom_stale", 
     "alias": "Tackroom", 
     "sensor": "tackroom_temperature",
     "rlc_sensor": "rlc_tackroom_temperature",
     "timeout": 4, 
     "climates": ["tack_room"]},
    {"id": "greenhouse_stale",
     "alias": "Greenhouse", 
     "sensor": "greenhouse_temperature",
     "rlc_sensor": "rlc_greenhouse_temperature",
     "timeout": 4, 
     "climates": ["greenhouse"]},
    {"id": "hot_water_stale", 
     "alias": "Hot Water", 
     "sensor": "hot_water",
     "rlc_sensor": "rlc_hot_water",
     "timeout": 4, 
     "automations": ["hot_water_bath","hot_water"]}
]

config=""
for item in settings:
    if "automations" in item:
        is_automation = True
        is_climate = False
    elif "climates" in item:
        is_automation = False
        is_climate = True        
    else:
        raise Exception(f"No automations or climates defined for {item['id']}")
        
    config += f"""
- id: '{item["id"]}'
  alias: 'Stale sensor: {item["alias"]}'
  description: ''
  trigger:
    - platform: time_pattern
      minutes: 7
  condition:
  - condition: and
    conditions:
    - condition: numeric_state
      entity_id: sensor.{item["rlc_sensor"]}
      above: '{item["timeout"]}'
"""

    if (is_automation and len(item["automations"]) > 1) or (is_climate and len(item["climates"]) > 1):
        config += f"""
    - condition: or
      conditions:
"""
        if is_automation:
            for automation in item["automations"]:
                config += f"""
      - condition: state
        entity_id: automation.{automation}
        state: 'on'
"""
        elif is_climate:
            for climate in item["climates"]:
                config+= f"""
      - condition: state
        entity_id: climate.{climate}
        state: heat
"""
    elif is_automation:
        config+=f"""
    - condition: state
      entity_id: automation.{item["automations"][0]}
      state: 'on'
"""
    elif is_climate:
        config+=f"""
    - condition: state
      entity_id: climate.{item["climates"][0]}
      state: heat        
"""

    config += f"""
  action:
"""
    if is_automation:
        for automation in item["automations"]:
            config += f"""
  - data: {{}}
    entity_id: automation.{automation}
    service: automation.turn_off
"""
    elif is_climate:
        for climate in item["climates"]:
            config += f"""
  - data: {{}}
    entity_id: climate.{climate}
    service: climate.turn_off
"""
            
    config += f"""
  - data:
      message: Disabling {item["alias"]}
      title: 'Stale sensor: {item["sensor"]}'
    service: notify.mobile_app_pixel_2
  mode: single
"""

# quick parse of the generated config to catch YAML errors
try:
    yaml.safe_load(config)
    print(config)
except yaml.YAMLError as err:
    print(f"YAML failed to parse: {err}")
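For anyone adapting this, one possible alternative to concatenating strings is to build each automation as a plain Python dict and leave the quoting to a serializer. The sketch below is not what the notebook does; `build_stale_automation` and its `notify_service` parameter are hypothetical names, and the structure simply mirrors the YAML generated above.

```python
from typing import Any, Dict


def build_stale_automation(item: Dict[str, Any], notify_service: str) -> Dict[str, Any]:
    """Build one 'stale sensor' automation as a plain dict."""
    if "automations" in item:
        domain, names, active_state = "automation", item["automations"], "on"
    elif "climates" in item:
        domain, names, active_state = "climate", item["climates"], "heat"
    else:
        raise ValueError(f"No automations or climates defined for {item['id']}")

    entities = [f"{domain}.{name}" for name in names]
    # One state check per target; several targets are wrapped in an "or".
    checks = [
        {"condition": "state", "entity_id": entity, "state": active_state}
        for entity in entities
    ]
    any_active = checks[0] if len(checks) == 1 else {"condition": "or", "conditions": checks}
    stale = {
        "condition": "numeric_state",
        "entity_id": f"sensor.{item['rlc_sensor']}",
        "above": item["timeout"],
    }
    notify = {
        "service": notify_service,
        "data": {
            "message": f"Disabling {item['alias']}",
            "title": f"Stale sensor: {item['sensor']}",
        },
    }
    return {
        "id": item["id"],
        "alias": f"Stale sensor: {item['alias']}",
        "description": "",
        "trigger": [{"platform": "time_pattern", "minutes": 7}],
        "condition": [{"condition": "and", "conditions": [stale, any_active]}],
        "action": [
            {"data": {}, "entity_id": entity, "service": f"{domain}.turn_off"}
            for entity in entities
        ] + [notify],
        "mode": "single",
    }


example = {
    "id": "pip_room_stale",
    "alias": "Pip's room",  # no manual '' escaping needed in a dict
    "sensor": "pip_temperature",
    "rlc_sensor": "rlc_pip_temperature",
    "timeout": 4,
    "automations": ["turn_pip_s_heating_on"],
}
automation = build_stale_automation(example, "notify.mobile_app_pixel_2")
print(automation["alias"])
```

A list of such dicts could then be written out with PyYAML, for example `yaml.safe_dump([automation], sort_keys=False)`, assuming a PyYAML version that supports `sort_keys`. The serializer takes care of quoting, so aliases containing apostrophes need no special handling.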

I decided not to automatically re-enable automations/climates, and to skip the NaN sensor display. The notifications to my mobile phone work well enough.

I’ve been struggling with detecting stale Zigbee sensors integrated via Zigbee2MQTT, and came across this post (thanks shbatm!) which avoids the whole last_changed mess:

- sensor:
    - name: Stale - critical
      state: >
        {% for state in expand('group.sensors_critical') -%}
          {%- if state.attributes.last_seen %}
             {%- if (as_timestamp(now()) - state.attributes.last_seen|as_datetime|as_timestamp) > (4 * 3600) %} {{ state.name }} 
                ({{ relative_time(state.attributes.last_seen|as_datetime) }}),{%- endif -%}
          {%- endif -%}
        {%- endfor %}

The as_datetime filter in HA 2021.7 simplifies things a bit, and I’m using groups to target specific sets of sensors.

Thanks @emelarnz. I hadn’t been using groups so that is a big plus to help me refactor this code :slight_smile:

Sadly it looks like last_seen is an attribute specifically for Zigbee devices, and not available for generic MQTT sensors :frowning:

I’ve found a feature request that would help on this: Retain last state change data of a sensor after reboot