Time based (vs event based) Statistics / Filter

DudeShemesh · April 13, 2018, 10:59am

Hello,

The statistics sensor (and probably the filter sensor too) does not take into account the time spent in a state, but rather treats every discrete state with the same weight. This works great for sensors that change at a somewhat constant rate (e.g. weather), but less so for ones that generate sporadic events (like battery operated hardware sensors). For example, a sensor that was at 255 for a second, but at 0 for hours, would have 127.5 as the mean.
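To make the difference concrete, here is a minimal Python sketch comparing the two ways of averaging. The function names and the list-of-pairs input format are assumptions made for illustration; this is not code from the statistics sensor.

```python
def sample_mean(values):
    """Mean that gives every recorded state the same weight."""
    return sum(values) / len(values)


def time_weighted_mean(changes, end):
    """Mean weighted by how long each state was held.

    `changes` is a list of (timestamp_seconds, value) pairs sorted by time.
    Each value is held until the next change; the last one is held until
    `end`, which must be later than the first timestamp.
    """
    total = 0.0
    next_times = [t for t, _ in changes[1:]] + [end]
    for (start, value), next_time in zip(changes, next_times):
        total += value * (next_time - start)
    return total / (end - changes[0][0])


# 255 for one second, then 0 for two hours.
changes = [(0, 255.0), (1, 0.0)]
print(sample_mean([value for _, value in changes]))  # 127.5
print(time_weighted_mean(changes, end=7201))         # roughly 0.035
```

The sample-based mean treats the one-second spike and the two-hour idle state as equals, while the time-weighted mean is dominated by the state the sensor actually spent its time in.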

Is there a way to do time based stats? Alternatively, maybe a sensor that polls the state of another sensor, (effectively turning it into a sensor with a constant rate) would do the trick?

robmarkcole · April 13, 2018, 12:14pm

Can you clarify what you want to achieve? I have a sensor to detect when there has been no motion in the house for 5 minutes. I implemented that using an input_boolean and an automation:

- id: '1513346519354'
  alias: House_idle
  trigger:
  - entity_id: binary_sensor.motion_at_home
    for:
      minutes: 5
    from: 'on'
    platform: state
    to: 'off'
  action:
  - data:
      entity_id: input_boolean.house_idle
    service: input_boolean.turn_on

Alternatively, if you can write an SQL query that gets the info you need, you could try the SQL sensor.

DudeShemesh · April 13, 2018, 12:33pm

The use case is getting the mean over time from a sensor.
To be honest, I think that most use cases with Statistics(/Filter) would benefit from being weighted by time, rather than samples.

Mariusthvdb · April 13, 2018, 12:41pm

Hi,

While setting up my motion detectors, I’m struggling to understand why I would do that with an input_boolean and an automation rather than a template sensor.

Unless I need to set the boolean manually, like the guest mode in the automation below, the sensor takes no interaction, needs no automation, and is always correct… ?

In the case at hand:

- alias: Nobody home
  id: '1511601478144'
  initial_state: 'on'
  trigger:
    - platform: state
      entity_id: group.family
      from: 'not_home'
      to: 'home'
    - platform: state
      entity_id: group.family
      from: 'home'
      to: 'not_home'
  # condition:
  #   - condition: state
  #     entity_id: input_boolean.guest_mode
  #     state: 'off'
  action:
    - service_template: >-
        {% if trigger.to_state.state == 'home' %}
          script.arrive_home
        {% elif trigger.from_state.state == 'home' and
           is_state('device_tracker.telefoon', 'not_home') and
           is_state('device_tracker.iphone', 'not_home') %}
          script.uit_huis_direct
        {% endif %}

Where would this time based trigger be best placed?

RobDYI · April 13, 2018, 1:03pm

Instead of trying to change the statistics sensor, I changed the data from event based to time based. I did this by adding a small time-derived value (the minute or second) to the number, which doesn’t change the useful data but refreshes it every minute or second. I think the force_update option for an MQTT sensor does a similar thing.
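The configuration behind this trick is not shown in the thread, so the following is only a rough Python sketch of the idea, not the actual setup: a tiny offset derived from the current minute is added to the reading so that the published state differs every minute. The function name `with_time_offset` and the `scale` value are assumptions.

```python
from datetime import datetime


def with_time_offset(value, now, scale=1e-6):
    """Return `value` plus a tiny offset that changes every minute.

    The offset is at most 59 * scale, so `scale` should be chosen small
    enough not to matter for the statistic being computed.
    """
    return value + now.minute * scale


reading = 21.5
a = with_time_offset(reading, datetime(2018, 4, 13, 13, 3))
b = with_time_offset(reading, datetime(2018, 4, 13, 13, 4))
print(a != b)  # True: consecutive minutes give distinct states
```

Because every minute now produces a new state, an event-based mean over these values approximates a mean over evenly spaced samples.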

DudeShemesh · April 13, 2018, 1:24pm

Nice. This is similar to what I was aiming for in the last sentence of the post (about adding a “polling” sensor that sits on top of others).
How is sensor.time defined? Is it the time_date platform?

I still think it might be worth changing the behavior of Statistics (or at least adding an option), as I’m struggling to find a use case for the way it currently works. Time weighted statistics seem more logical, but maybe it’s just me.

DudeShemesh · April 15, 2018, 2:39pm

@RobDYI can you share the definition of sensor.time?

thanks!

RobDYI · April 16, 2018, 2:09pm

Sure, it’s right here.

sensor:
  - platform: time_date
    display_options:
      - 'time'

dgomes · April 16, 2018, 2:44pm

I’ve recently added Time based SMA:

DudeShemesh · April 16, 2018, 3:03pm

Thanks. Looks great, but if I understand correctly, it will only update on state changes. So, for example, if you have a sensor that is on for a time period and then off for days, the SMA would not update, and thus would not reflect the time weights correctly. Am I right?

dgomes · April 16, 2018, 3:15pm

It will reflect everything correctly, the caveat is that it will only calculate when there is a new value.

You can force new values on the original sensor.
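The caveat can be illustrated with a minimal sketch of a time-weighted moving average that is recomputed only when a reading arrives. This is not the filter's actual code; the class name, its interface, and the choice to weight each reading by the time until the next one are assumptions.

```python
from collections import deque


class TimeWeightedSMA:
    """Time-weighted moving average over the last `window` seconds.

    Hypothetical helper: the average is recomputed only in `add`, that is,
    only when a new reading arrives.
    """

    def __init__(self, window):
        self.window = window
        self.readings = deque()  # (timestamp_seconds, value), oldest first
        self.value = None

    def add(self, timestamp, value):
        self.readings.append((timestamp, value))
        start = timestamp - self.window
        # Drop readings that were superseded before the window started,
        # keeping the one that was still in effect at `start`.
        while len(self.readings) > 1 and self.readings[1][0] <= start:
            self.readings.popleft()
        points = list(self.readings)
        total = 0.0
        for (t0, v0), (t1, _) in zip(points, points[1:]):
            total += v0 * (t1 - max(t0, start))
        covered = timestamp - max(points[0][0], start)
        # With a single reading there is no elapsed time to weight by.
        self.value = total / covered if covered > 0 else value
        return self.value


sma = TimeWeightedSMA(window=60)
print(sma.add(0, 10.0))   # 10.0
print(sma.add(30, 20.0))  # 10.0
print(sma.add(60, 20.0))  # 15.0
print(sma.add(120, 0.0))  # 20.0
```

After the last call the source sits at 0, but `sma.value` stays at 20.0 until another reading arrives, which is why new values have to be forced on the original sensor.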

DudeShemesh · April 16, 2018, 3:52pm

If I have a way of forcing updates on the sensor, I can just do so at regular intervals and use a simple mean.
How would you suggest forcing an update?

dgomes · April 17, 2018, 10:06am

Forcing the update depends on the sensor platform … it’s independent from the sensor.filter

DudeShemesh · April 17, 2018, 10:33am

Yes, but that brings us back to square 1. I’m looking for a way to do this generically.

Right now I’m actually thinking of adding a polling_interval option to the Statistics sensor, which will disable state listening and instead work at a regular interval. It’s less clean from a design perspective (because it’s only generic on the source side, but not on the destination side: similar behavior would need to be added to Filter and others), but it avoids having an extra platform, extra devices, and extra fake state updates.

dgomes · April 17, 2018, 12:45pm

It makes no sense to add a polling interval when you know that the state you are polling hasn’t changed in the meantime…

What you really want is an update_interval to the 2nd order sensor (statistics or filter) that will recalculate the statistic value but will not poll anything from the original sensor since this has not changed.

In this sense, something similar to the MQTT sensor can be used, yes.
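A minimal sketch of that idea, assuming a hypothetical class with one method for state changes and one for the timer: the timer path only extends the time the last known value has been held and never reads the source. It computes a mean over the whole lifetime rather than a window, purely to keep the example short.

```python
class IntervalRecalculatedMean:
    """Time-weighted mean that can be refreshed without a new source value."""

    def __init__(self):
        self.start = None        # time of the first state change
        self.last_time = None    # time up to which values are accounted for
        self.last_value = None   # value currently held by the source
        self.weighted_sum = 0.0  # sum of value * seconds held
        self.mean = None

    def _advance(self, now):
        # Account for the time the current value has been held so far.
        self.weighted_sum += self.last_value * (now - self.last_time)
        self.last_time = now
        elapsed = now - self.start
        self.mean = self.weighted_sum / elapsed if elapsed > 0 else self.last_value

    def on_state_change(self, now, value):
        """Called when the source sensor reports a new value."""
        if self.start is None:
            self.start = self.last_time = now
            self.mean = value
        else:
            self._advance(now)
        self.last_value = value

    def on_interval(self, now):
        """Called by a timer (the update_interval); does not read the source."""
        if self.start is not None:
            self._advance(now)


stat = IntervalRecalculatedMean()
stat.on_state_change(0, 255.0)
stat.on_state_change(1, 0.0)
print(stat.mean)        # 255.0: nothing has been accounted for at 0 yet
stat.on_interval(7201)
print(stat.mean)        # roughly 0.035 after two hours at 0
```

Without the `on_interval` call the mean would stay at 255.0 until the source changes again, which is the stale result described earlier in the thread.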

DudeShemesh · April 17, 2018, 1:14pm

I guess update_interval could be a better name than polling_interval. In terms of behavior, I think we’re describing the same thing.

DudeShemesh · April 28, 2018, 10:15am

Wrote a version of Statistics that is time weighted. It uses the built-in scan_interval parameter to determine the frequency of sampling the source sensor. Going to take me a while to write the tests and docs to get this production ready (and maybe it should also be merged with the original Statistics sensor), but in the meantime, here’s the code in case anyone wants to pick it up:

"""
Support for statistics for sensor values.

For more details about this platform, please refer to the documentation at
https://home-assistant.io/components/sensor.statistics/
"""
import logging
import statistics
from datetime import timedelta
from collections import deque

import voluptuous as vol

import homeassistant.helpers.config_validation as cv
from homeassistant.components.sensor import PLATFORM_SCHEMA
from homeassistant.const import (
    CONF_NAME, CONF_ENTITY_ID, STATE_UNKNOWN, ATTR_UNIT_OF_MEASUREMENT)
from homeassistant.helpers.entity import Entity

_LOGGER = logging.getLogger(__name__)

ATTR_AVERAGE_CHANGE = 'average_change'
ATTR_CHANGE = 'change'
ATTR_COUNT = 'count'
ATTR_MAX_VALUE = 'max_value'
ATTR_MIN_VALUE = 'min_value'
ATTR_MEAN = 'mean'
ATTR_MEDIAN = 'median'
ATTR_VARIANCE = 'variance'
ATTR_STANDARD_DEVIATION = 'standard_deviation'
ATTR_SAMPLING_SIZE = 'sampling_size'
ATTR_TOTAL = 'total'
ATTR_SOURCE = 'source'

CONF_SAMPLING_SIZE = 'sampling_size'

DEFAULT_NAME = 'MyStats'
DEFAULT_SIZE = 20

SCAN_INTERVAL = timedelta(seconds=60)

ICON = 'mdi:calculator'

PLATFORM_SCHEMA = PLATFORM_SCHEMA.extend({
    vol.Required(CONF_ENTITY_ID): cv.entity_id,
    vol.Optional(CONF_NAME, default=DEFAULT_NAME): cv.string,
    vol.Optional(CONF_SAMPLING_SIZE, default=DEFAULT_SIZE):
        vol.All(vol.Coerce(int), vol.Range(min=1))
})

# pylint: disable=unused-argument
async def async_setup_platform(hass, config, async_add_devices,
                               discovery_info=None):
    """Set up the interval-sampled statistics sensor."""
    entity_id = config.get(CONF_ENTITY_ID)
    name = config.get(CONF_NAME)
    sampling_size = config.get(CONF_SAMPLING_SIZE)

    async_add_devices(
        [MyStatisticsSensor(hass, entity_id, name, sampling_size)],
        True)


class MyStatisticsSensor(Entity):
    """Statistics over samples of a source entity taken every SCAN_INTERVAL."""

    def __init__(self, hass, entity_id, name, sampling_size):
        self._hass = hass
        self._entity_id = entity_id
        self.is_binary = self._entity_id.split('.')[0] == 'binary_sensor'
        if not self.is_binary:
            self._name = '{} {}'.format(name, ATTR_MEAN)
        else:
            self._name = '{} {}'.format(name, ATTR_COUNT)
        self._sampling_size = sampling_size
        self._unit_of_measurement = None
        self.states = deque(maxlen=self._sampling_size)

        self.median = self.mean = self.variance = self.stdev = 0
        self.min = self.max = self.total = self.count = 0
        self.average_change = self.change = 0

    def _add_state_to_queue(self, new_state):
        """Store the sampled state; non-numeric states are only counted."""
        try:
            self.states.append(float(new_state.state))
        except ValueError:
            # Binary or otherwise non-numeric state: nothing to store.
            pass
        self.count += 1

    @property
    def name(self):
        return self._name

    @property
    def state(self):
        return self.mean if not self.is_binary else self.count

    @property
    def unit_of_measurement(self):
        return self._unit_of_measurement if not self.is_binary else None

    @property
    def device_state_attributes(self):
        if not self.is_binary:
            state = {
                ATTR_MEAN: self.mean,
                ATTR_COUNT: self.count,
                ATTR_MAX_VALUE: self.max,
                ATTR_MEDIAN: self.median,
                ATTR_MIN_VALUE: self.min,
                ATTR_SAMPLING_SIZE: self._sampling_size,
                ATTR_STANDARD_DEVIATION: self.stdev,
                ATTR_TOTAL: self.total,
                ATTR_VARIANCE: self.variance,
                ATTR_CHANGE: self.change,
                ATTR_AVERAGE_CHANGE: self.average_change,
                ATTR_SOURCE: self._entity_id,
            }
            return state

    @property
    def icon(self):
        return ICON

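    async def async_update(self):
        """Sample the source entity and recompute the statistics.

        Called once per SCAN_INTERVAL rather than on state change events.
        """
        new_state = self._hass.states.get(self._entity_id)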
        if new_state is not None:
            self._unit_of_measurement = new_state.attributes.get(
                ATTR_UNIT_OF_MEASUREMENT)

            self._add_state_to_queue(new_state)

        if not self.is_binary:
            try:  # require only one data point
                self.mean = round(statistics.mean(self.states), 2)
                self.median = round(statistics.median(self.states), 2)
            except statistics.StatisticsError as err:
                _LOGGER.error(err)
                self.mean = self.median = STATE_UNKNOWN

            try:  # require at least two data points
                self.stdev = round(statistics.stdev(self.states), 2)
                self.variance = round(statistics.variance(self.states), 2)
            except statistics.StatisticsError as err:
                _LOGGER.error(err)
                self.stdev = self.variance = STATE_UNKNOWN

            if self.states:
                self.count = len(self.states)
                self.total = round(sum(self.states), 2)
                self.min = min(self.states)
                self.max = max(self.states)
                self.change = self.states[-1] - self.states[0]
                self.average_change = self.change
                if len(self.states) > 1:
                    self.average_change /= len(self.states) - 1
            else:
                self.min = self.max = self.total = STATE_UNKNOWN
                self.average_change = self.change = STATE_UNKNOWN

Aap · November 21, 2018, 6:34am

Thanks Dude,
works nicely. As a test I inserted a counter and yes, it counts minutes. Great! And indeed, as said before, statistics over event changes aren’t worth much.
I don’t know much about the internals of Home Assistant, so I’m trying to figure out what the essential changes are. Is it correct that they are:

removing of max_age (but that seems to be a property of the class)
adding of SCAN_INTERVAL

I want to use your sensor for the amount of energy/water used. So I want to measure the following parameters:

the amount used this day
the amount used this week (divided by 7, to get amount/day)
the amount used this year ( divided by the number of days)
Does anyone have a clue how to get the sum of an entity over some fixed period?
Thanks, Stef

DudeShemesh · November 21, 2018, 7:04am

Those are indeed the changes to configuration, but the main change is that the update is now triggered based on the scan interval, rather than on events. It also simplifies some things.

If you change the sampling size to hold a day/week/year you’ll get what you’re after (in the total attribute). However, keep in mind you’ll have to add persistence, as I’m assuming you’re not going to have zero downtime over a year.
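As a rough guide, the sampling size needed for a period follows from the 60 second SCAN_INTERVAL in the code above. The persistence part below is only a sketch under assumptions: the helper names and the choice of a JSON file are hypothetical and not part of the posted sensor.

```python
import json
from collections import deque
from datetime import timedelta

SCAN_INTERVAL = timedelta(seconds=60)


def sampling_size_for(period, scan_interval=SCAN_INTERVAL):
    """Number of samples needed to cover `period` at one per scan interval."""
    return period // scan_interval


def save_samples(samples, path):
    """Write the current samples to a JSON file (hypothetical helper)."""
    with open(path, 'w') as handle:
        json.dump(list(samples), handle)


def load_samples(path, sampling_size):
    """Restore samples from a JSON file, or start empty if there is none."""
    try:
        with open(path) as handle:
            return deque(json.load(handle), maxlen=sampling_size)
    except FileNotFoundError:
        return deque(maxlen=sampling_size)


print(sampling_size_for(timedelta(days=1)))    # 1440
print(sampling_size_for(timedelta(days=7)))    # 10080
print(sampling_size_for(timedelta(days=365)))  # 525600
```

Restoring the samples this way does not fill in the time the system was down, so the window would cover slightly more wall-clock time than intended after a restart.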