Statistics integration: Share your improvement suggestions

ThomDietrich · November 7, 2021, 12:41pm

Hey community,

TL;DR

Please share your experience with the statistics integration and what you would like to see!

Intro

I have recently added some improvements to the Statistics integration and intend to add more. The Statistics integration is great. It computes statistical characteristics of a sensor's values over the last few minutes or hours. That enables you to define automations based on higher-level insights and trends about your home.

Let me give a few random examples:

“When the humidity in the bathroom rises rapidly, activate the ventilation”
“When the humidity in the bathroom hasn’t changed much over the last 5 minutes, disable the ventilation”
“When we have brewed more than 10 coffees in one day, send a notification to ease up on the consumption”
“When the beehive’s weight doesn’t change much over the course of 12 hours, despite good weather, send a warning notification” (yes, this is a real and super valuable use case)
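
As a rough sketch of the first example, something like the following could work on a version that supports the state_characteristic option. The entity IDs, the fan entity, the sampling_size and the threshold of 10 are assumptions made up for illustration, not recommended values.

```yaml
# Sketch only: entity IDs, names and thresholds are made up for illustration.
sensor:
  - platform: statistics
    name: "Bathroom humidity change"
    entity_id: sensor.bathroom_humidity   # hypothetical source sensor
    state_characteristic: change          # newest minus oldest value in the window
    max_age:
      minutes: 5
    sampling_size: 50                     # assumed upper bound on buffered samples

automation:
  - alias: "Bathroom fan on when humidity rises quickly"
    trigger:
      - platform: numeric_state
        entity_id: sensor.bathroom_humidity_change
        above: 10                         # assumed threshold, in percentage points
    action:
      - service: fan.turn_on
        target:
          entity_id: fan.bathroom         # hypothetical fan entity
```

The statistics sensor only describes the last 5 minutes. The decision about what counts as "rapidly" stays in the automation, so the threshold can be tuned without touching the sensor.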

The Problem

The integration is not very usable. (That is my personal opinion, based on years of data science experience and a strong focus on usability.)

Why?

Many of the provided characteristics are core statistics but not directly useful for home automation use cases
The names of the characteristics are very ambiguous; one needs to check the source code to understand what they represent
Configuration with “sampling_size” and “max_age” is rather technical and not straightforward

Don’t get me wrong. The integration is already super useful, but I believe we can do better! I want to contribute to that by improving the configuration and the characteristics provided.

Your Feedback Needed

Let me know which use cases you tend to tackle with a statistics sensor that did not feel as straightforward as you would have liked.
What would make usage more intuitive?
Which statistical characteristics are you missing?
Which use cases cannot currently be covered?

You can find my first collection of thoughts here, but in this thread I want to focus on your input. Let’s see how it goes.

ThomDietrich · September 2, 2022, 12:31pm

Hey @lospinoj,
welcome to Home Assistant! You’ve deleted your question, so I guess you solved it yourself, great!

However, there was an underlying feature request in there that is definitely worth putting on the list.

So far the statistics integration is configured manually, meaning via YAML in text files. Many other integrations are instead set up through the integrations UI, but that route is mostly reserved for hardware devices and the like. A couple of months ago the “Helpers” page was hugely improved: 2022.4: Groups! Groups! Groups! - Home Assistant

The statistics component should be available as a UI configurable helper. I will take note of that! Thanks

Didgeridrew · September 2, 2022, 4:52pm

You have already pointed out one of the biggest stumbling blocks for new users, i.e. sampling_size and max_age. I’ve seen numerous posts here and on other groups where new users did not understand that they need to specify a sample size when using max_age. Many new users don’t know how often their sensors update or even how to estimate it. This is made harder because many sensors don’t push updates at regular intervals, in order to save battery power. I think the assumption made by new users is that, if a max_age is set and a sampling_size isn’t, the sample size should be however many samples it takes to cover the range of max_age.
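
To make the sizing problem concrete, here is a minimal sketch of how one might pick sampling_size by hand today. The entity ID and the update interval of roughly 30 seconds are assumptions for illustration; a real sensor may update far less regularly.

```yaml
# Sketch only: the entity ID and the 30-second update interval are assumptions.
sensor:
  - platform: statistics
    name: "Living room temperature mean 1h"
    entity_id: sensor.living_room_temperature   # hypothetical source sensor
    state_characteristic: mean
    max_age:
      hours: 1
    # About one update every 30 s gives 3600 / 30 = 120 samples per hour.
    # 150 leaves some headroom so the buffer does not cut the window short.
    sampling_size: 150
```

If sampling_size is smaller than the number of updates that arrive within max_age, the statistic silently covers a shorter period than the configuration suggests, which is exactly the surprise described above.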

I’ve also experienced some obviously wrong means. From playing around with SQL sensors I think the issue may be that consecutive repeated/non-unique values aren’t filtered out… but I’m not a dev or data scientist, so take that as the weakly supported hypothesis that it is. I finally gave up on using the statistics integration for quite a few sensors where I wanted a mean value.

The following is a bit more of an FR:

IIRC, the sensor used to just show all the characteristics, instead of requiring a new sensor to be defined for each one. That may have been overkill, but it would be nice to be able to list the characteristics you want… something like:

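  # Proposed syntax for this feature request, not a currently supported config
  - platform: statistics
    name: "Stats AC On 10 Day"
    entity_id: sensor.ac_on
    # characteristic(s) to use as the sensor state
    state_characteristic:
      - mean
    # additional characteristics to expose as attributes
    attributes:
      - median
      - standard_deviation
    max_age:
      hours: 240  # 10 days
    sampling_size: 1000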