Device Monitoring

A smart house can have dozens of devices (we have ~80 of them). Reliable monitoring matters most for the Wi-Fi devices, which can drop off the network when the Wi-Fi signal is weak.
We use active ping-based monitoring and send a notification when a device gets disconnected.

Ping Sensors

We use static DHCP leases for all devices, so each device's IP address is fixed. To generate the list of ping sensors with the matching IP addresses, we use the following template (which covers the Shelly and Google devices):

{% for device in
  states |
  selectattr('entity_id', 'in', integration_entities('shelly')) |
  map(attribute='entity_id') |
  map('device_id') |
  map('string') |
  select("ne", "None") |
  unique
-%}
{% if device_attr(device, 'model') not in ['Shelly Flood', 'Shelly Motion', 'Shelly H&T', 'Shelly Button1'] -%}
- platform: ping
  name: "{{ device_attr(device, 'name') | title }} Ping"
  host: {{ device_attr(device, 'configuration_url').split('/')[-1] }}
{% endif -%}
{% endfor -%}
{% for sensor in
   states.sensor |
   selectattr('entity_id', 'in', integration_entities('Google Home')) |
   selectattr('entity_id', 'match', '.*_device') |
   map(attribute='entity_id')
-%}
- platform: ping
  name: "{{ state_attr(sensor, 'friendly_name') | title }} Ping"
  host: {{ states(sensor) }}
{% endfor %}

The output of this template was copied into ping_sensors.yaml. Here is the beginning of this file:

- platform: ping
  name: "Dining Room Lights Ping"
  host: 192.168.1.168
- platform: ping
  name: "Guest Room Lights Ping"
  host: 192.168.1.233
- platform: ping
  name: "Attic Venta Ping"
  host: 192.168.1.93

The following line was added to configuration.yaml for creating the ping sensors:
binary_sensor: !include ping_sensors.yaml

Note: A template is not required for generating the list of sensors; the list can be written by hand. In our case it contains ~80 entries, so writing it manually would be error-prone.
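To make the template's two key steps concrete (skipping certain Shelly models and taking the host from the last segment of configuration_url), here is a minimal Python sketch. It is only an illustration, not part of the Home Assistant setup: the device dictionaries and the function names are hypothetical, and it assumes configuration_url has the form http://<ip> with no trailing slash or path.

```python
# Models excluded by the template (copied from the 'not in [...]' list above).
SKIPPED_MODELS = {"Shelly Flood", "Shelly Motion", "Shelly H&T", "Shelly Button1"}


def host_from_url(configuration_url: str) -> str:
    """Mirror .split('/')[-1]: return whatever follows the last '/'.

    Assumption: the URL looks like 'http://192.168.1.168'. A trailing slash
    would yield an empty string, just as it would in the template.
    """
    return configuration_url.split("/")[-1]


def ping_entries(devices):
    """Build ping sensor entries for every device whose model is not skipped."""
    entries = []
    for device in devices:
        if device["model"] in SKIPPED_MODELS:
            continue
        entries.append(
            {
                "platform": "ping",
                # str.title() approximates Jinja's 'title' filter for simple names.
                "name": f"{device['name'].title()} Ping",
                "host": host_from_url(device["configuration_url"]),
            }
        )
    return entries


# Hypothetical sample data for illustration only.
sample_devices = [
    {"name": "dining room lights", "model": "Shelly 1", "configuration_url": "http://192.168.1.168"},
    {"name": "basement flood", "model": "Shelly Flood", "configuration_url": "http://192.168.1.50"},
]
for entry in ping_entries(sample_devices):
    print(entry)
```

Only the first sample device produces an entry; the second is dropped by the model filter.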

Aggregated State

The next step was to create a combined state that shows only the disconnected sensors (if there are any). This is done with the following template binary sensor in configuration.yaml:

template:
  - binary_sensor:
      - name: "Disconnected Devices"
        unique_id: disconnected_devices
        state: >-
          {{
            states.binary_sensor |
            rejectattr('attributes.device_class', 'undefined') |
            selectattr('attributes.device_class', 'eq', 'connectivity') |
            selectattr('entity_id', 'match', '.*_ping') |
            selectattr('state', 'eq', 'off') |
            map(attribute='entity_id') |
            list |
            length > 0
          }}
        attributes:
          long_term: >-
            {{
              states.binary_sensor |
              rejectattr('attributes.device_class', 'undefined') |
              selectattr('attributes.device_class', 'eq', 'connectivity') |
              selectattr('entity_id', 'match', '.*_ping') |
              selectattr('state', 'eq', 'off') |
              selectattr('last_changed', 'lt', now() - timedelta(minutes = 5)) |
              map(attribute='entity_id') |
              map('replace', 'binary_sensor.', '') |
              map('replace', '_ping', '') |
              map('replace', '_', ' ') |
              map('title') |
              sort |
              list
            }}
          ping_sensors: >-
            {{
              states.binary_sensor |
              rejectattr('attributes.device_class', 'undefined') |
              selectattr('attributes.device_class', 'eq', 'connectivity') |
              selectattr('entity_id', 'match', '.*_ping') |
              selectattr('state', 'eq', 'off') |
              map(attribute='entity_id') |
              sort |
              list
            }}

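The filter selectattr('last_changed', 'lt', now() - timedelta(minutes = 5)) in the long_term attribute provides a grace period for temporary disconnections: a sensor is listed there only after it has been off for more than 5 minutes. It works together with the following automation rule, which re-polls the failing ping sensors every 10 seconds for up to 30 iterations (about 5 minutes, matching the grace period) to make sure a device really got disconnected from the network: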

- id: device_monitoring_high_frequency
  alias: device_monitoring_high_frequency
  trigger:
    - platform: state
      entity_id: binary_sensor.disconnected_devices
      to: "on"
  action:
    - repeat:
        while:
          - condition: state
            entity_id: binary_sensor.disconnected_devices
            state: "on"
          - condition: template
            value_template: "{{ repeat.index <= 30 }}"
        sequence:
          - delay: 10
          - variables:
              ping_entities: >-
                {{
                  state_attr('binary_sensor.disconnected_devices', 'ping_sensors') |
                  join(',') 
                }}
          - if: "{{ ping_entities | length > 0 }}"
            then:
              - service: homeassistant.update_entity
                data:
                  entity_id: "{{ ping_entities }}"

Notifications

The last part of the solution is the notification. We also write the same information to a log file for long-term analysis.
The configuration.yaml contains the following:

notify:
  - name: gmail
    platform: smtp
    server: "smtp.gmail.com"
    port: 587
    timeout: 15
    sender: "[email protected]"
    encryption: starttls
    username: "[email protected]"
    password: !secret gmail
    recipient:
      - "[email protected]"
  - name: disconnected_log
    platform: file
    filename: /config/ping/log.txt

(The full email address was masked.)

And automations.yaml contains the following rule:

- id: device_monitoring
  alias: device_monitoring
  mode: queued
  trigger:
    - platform: state
      entity_id: binary_sensor.disconnected_devices
      attribute: long_term
  action:
    - variables:
        current_offline: "{{ state_attr('binary_sensor.disconnected_devices', 'long_term') | list }}"
        previous_offline: "{{ trigger.from_state.attributes.long_term | list }}"
        disconnected: "{{ current_offline | reject('in', previous_offline) | list }}"
        reconnected: "{{ previous_offline | reject('in', current_offline) | list }}"
        disconnected_old: "{{ current_offline | select('in', previous_offline) | list }}"
        message: |-
          {% if disconnected | length > 0 -%}
          Disconnected:
            - {{ disconnected | join('\n  - ') }}
          {% endif -%}
          {% if reconnected | length > 0 -%}
          Re-connected:
            - {{ reconnected | join('\n  - ') }}
          {% endif -%}
          {% if disconnected_old | length > 0 -%}
          Disconnected (old):
            - {{ disconnected_old | join('\n  - ') }}
          {% endif -%}
          {% set timestamp = now().isoformat().split('.')[0].replace('T', ' ') -%}
          [{{ timestamp }}]
    - service: notify.gmail
      data:
        title: "Home Assistant Device Monitoring"
        message: "{{ message }}"
    - service: notify.disconnected_log
      data:
        message: |
          {{ message }}
          --------------------------------------------------------------------------------
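The three variables disconnected, reconnected and disconnected_old are simply the differences and the overlap between the previous and current long_term lists. The following Python sketch reproduces that logic outside Home Assistant; the function names and sample lists are hypothetical, and the timestamp line of the real message is left out for brevity.

```python
def classify(previous_offline, current_offline):
    """Split devices into newly disconnected, reconnected, and still disconnected."""
    disconnected = [d for d in current_offline if d not in previous_offline]
    reconnected = [d for d in previous_offline if d not in current_offline]
    disconnected_old = [d for d in current_offline if d in previous_offline]
    return disconnected, reconnected, disconnected_old


def build_message(previous_offline, current_offline):
    """Assemble the notification body, omitting empty sections as the template does."""
    disconnected, reconnected, disconnected_old = classify(previous_offline, current_offline)
    sections = (
        ("Disconnected:", disconnected),
        ("Re-connected:", reconnected),
        ("Disconnected (old):", disconnected_old),
    )
    lines = []
    for title, names in sections:
        if names:
            lines.append(title)
            lines.extend(f"  - {name}" for name in names)
    return "\n".join(lines)


# Hypothetical example: Attic Venta came back, Dining Room Lights dropped,
# Guest Room Lights is still offline.
previous = ["Attic Venta", "Guest Room Lights"]
current = ["Dining Room Lights", "Guest Room Lights"]
print(build_message(previous, current))
```

Since the automation triggers on any change of the long_term attribute, a single message can report disconnections and reconnections at the same time.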

This solution has been running in our house for several months and has detected Wi-Fi coverage issues. The current version produces no false positives (the initial one certainly did), and we are happy with the confidence it gives us in the health of the devices in our smart home.
