Purge statistics max_age based on measurement range

I have a Zigbee humidity sensor. This sensor doesn’t send any updates when the humidity doesn’t change (much).

Currently, when setting a max_age for the statistics sensor, all measurements are purged based on the time of the measurement itself. But in this case, this measurement is valid until the next measurement comes in.

For example, I set max_age to 5 minutes, with the following measurements:

  1. 20% at 15:00
  2. 21% at 15:05
  3. 40% at 15:20

At 15:23, the statistics sensor purges records 1 and 2. But record 2 is still valid: the humidity hasn’t changed since 15:05, so it’s safe to assume the humidity was still 21% just before 15:20.

It would be awesome if an option could be added to the statistics sensor, allowing us to prevent purging measurements that are outside the max_age range, but are still valid (e.g. keep a single measurement that’s outside the max_age range).
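
To make the difference concrete, here is a minimal Python sketch of the two behaviors using the three measurements above. It is only an illustration, not the statistics integration's actual code: the `purge` function, its `keep_last_outside` flag and the calendar date are all made up for this example.

```python
from datetime import datetime, timedelta


def purge(samples, now, max_age, keep_last_outside=False):
    """Return the samples that survive a max_age purge.

    samples: list of (timestamp, value) tuples, sorted oldest first.
    keep_last_outside: hypothetical option that keeps the newest sample
    older than the cutoff, since it was still valid when the window began.
    """
    cutoff = now - max_age
    inside = [s for s in samples if s[0] >= cutoff]
    if keep_last_outside:
        outside = [s for s in samples if s[0] < cutoff]
        if outside:
            inside.insert(0, outside[-1])
    return inside


day = datetime(2022, 1, 1)  # arbitrary date, only the times matter
samples = [
    (day.replace(hour=15, minute=0), 20),
    (day.replace(hour=15, minute=5), 21),
    (day.replace(hour=15, minute=20), 40),
]
now = day.replace(hour=15, minute=23)

# Current behavior: only the 15:20 measurement is left.
current = purge(samples, now, timedelta(minutes=5))
# Requested behavior: the 15:05 measurement is kept as well.
proposed = purge(samples, now, timedelta(minutes=5), keep_last_outside=True)
```

With the flag enabled, record 2 (21% at 15:05) stays available as the value that was in effect at the start of the 5 minute window.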

You should probably use this integration instead:

@tom_l that’s hardly the correct solution. If the statistics sensor has a bug or can be improved, we need to discuss it.


Hey @LouisMT,
I’ve recently worked on a few bigger changes to the statistics integration but didn’t touch the purging algorithm. Could you please open an issue here: Issues · home-assistant/core · GitHub
I promise to have a look.

Your request is a bit complicated to grasp. The statistics component exists to extract statistics for multiple measurements. Instead of hacking some conditional purge-but-dont-purge strategy, maybe you should think about your processing strategy? Seems to me like you would rather increase the measurement rate of your sensor or the max_age setting…

It’s not purging the values. It only selects states between the provided times; it does not select states prior to the provided time. That’s how it works, and changing that behavior will break things for everyone who doesn’t want that (i.e. people with sensors that update on a set period). You should look into updating your sensor on a set period, either by wrapping it in a template sensor or by forcing the updates on the device. If the device is battery powered and you don’t want frequent updates, your only recourse is wrapping it in a template sensor. Use now() to force an update every minute. Otherwise you can use a time trigger to update every 30 seconds if you want.
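
The effect of such a wrapper can be modeled in a few lines of Python. This is a rough sketch of the idea only (repeat the last known value on a fixed period), not a template sensor configuration; `forward_fill` and the example date are hypothetical.

```python
from datetime import datetime, timedelta


def forward_fill(samples, start, end, period):
    """Emit one (timestamp, value) per period, repeating the last known value.

    samples: list of (timestamp, value) tuples, sorted oldest first.
    Periods before the first sample produce no output.
    """
    filled = []
    idx = -1  # index of the newest sample at or before the current tick
    tick = start
    while tick <= end:
        while idx + 1 < len(samples) and samples[idx + 1][0] <= tick:
            idx += 1
        if idx >= 0:
            filled.append((tick, samples[idx][1]))
        tick += period
    return filled


day = datetime(2022, 1, 1)  # arbitrary date, only the times matter
sparse = [
    (day.replace(hour=15, minute=0), 20),
    (day.replace(hour=15, minute=5), 21),
    (day.replace(hour=15, minute=20), 40),
]
filled = forward_fill(
    sparse,
    start=day.replace(hour=15, minute=0),
    end=day.replace(hour=15, minute=23),
    period=timedelta(minutes=1),
)

# What a 5 minute max_age window sees at 15:23 after resampling.
cutoff = day.replace(hour=15, minute=23) - timedelta(minutes=5)
window_values = [value for ts, value in filled if ts >= cutoff]
```

Because the resampled series has a value every minute, the window at 15:23 now contains the old 21% reading (for 15:18 and 15:19) as well as the new 40% reading.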

Edit: Had to read your description again to understand the use case you have (a use case is often easier to grasp). Now I get your point! If I am correct, you would want to use this in combination with e.g. average_linear from here: Statistics - Home Assistant

I think that makes a lot of sense and is imho not even something we would need to implement conditionally.

@petro As I said before, he might want to adapt his setup. However, I think this is certainly relevant and can be implemented to improve time-bound characteristics like average_linear.
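
To show why the sample just outside the window matters for a time-bound characteristic, here is a small Python sketch. It assumes that a linear time-weighted average can be approximated with the trapezoidal rule; the `average_linear` function below is written for this post and is not the integration's implementation. Timestamps are given as minutes after 15:00 and must be strictly increasing.

```python
def average_linear(samples):
    """Time-weighted mean with linear interpolation between samples.

    samples: list of (minutes, value) tuples, sorted, with distinct times.
    Returns None for an empty list and the single value for one sample.
    """
    if not samples:
        return None
    if len(samples) == 1:
        return samples[0][1]
    area = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # Trapezoid between two neighbouring samples.
        area += 0.5 * (v0 + v1) * (t1 - t0)
    return area / (samples[-1][0] - samples[0][0])


# Only the 15:20 reading is inside a 5 minute window at 15:23.
inside_only = average_linear([(20, 40.0)])
# Same window, but with the 15:05 reading kept as a boundary sample.
with_boundary = average_linear([(5, 21.0), (20, 40.0)])
```

With only the in-window reading the result is simply 40, while the boundary sample pulls the average down because the interpolation now covers the time before 15:20 too.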

It’s a very simple solution to Louis’s issue, so yes, actually it is the correct solution.

Please don’t tag me again.

What would you end up doing in the backend for this? Take non-periodic results and force a known period?

EDIT: Maybe you can assume the period based on the configured max samples within the timerange. Or add a min_samples and add in the missing information interpolated from the surrounding states.

Hey everyone, and thanks for your replies.

Unfortunately the ha-average integration doesn’t work for what I want to do. Simply put: I want to know if the humidity is increasing significantly in a 15 minute timespan, or if it is steady (e.g. max 2% difference) for 15 minutes.

So currently I’m using a statistics sensor in distance_absolute mode, with a max age of 15 minutes. However, this doesn’t work, as my sensor doesn’t send updates that often. To me it seems strange that the latest measurement (albeit 20 minutes ago) isn’t included in the sensor.

My idea was to add a max_age_characteristic setting, which can be used to change this behavior. But I think this would add too much complexity in the code, and may confuse users. I think petro’s solution is perfect, so thank you very much for that! :)

Well ThomDietrich is the guy who made all the recent changes to statistics. If he’s willing to make this FR work with the current system and support all the previous functionality, then we should come up with a solution that doesn’t require the work-around I was proposing.

Hey @LouisMT,
I have a very similar use case: turning off the ventilation in the bathroom when the humidity settles. I use change rather than distance_absolute, but that shouldn’t make a difference (in edge cases your choice might in fact perform better).
With the statistics component it’s important to ensure that enough samples are generated within the boundaries you define. Now, I understand your sensor’s behavior and the use case. I did discuss considering the latest sample outside the defined boundaries with another user before, but it’s quite difficult to define the right solution. Remember that the statistics component is pretty clearly used to compute statistics for measurements “within a given time frame”. Suddenly implementing logic that considers values outside the user-defined time frame feels counter-intuitive and must eventually spark confusion.

Coming back to the issue. I don’t see anything wrong with petro’s solution. I would not call it a workaround. You are making an educated assumption about the behavior of your sensor and modeling an enriched version through a template sensor. Alternatively I would aim to replace the sensor used (if that increases accuracy, which I understand it wouldn’t).

One more thought: your timespan is 15 minutes, and because there was no update, the statistics sensor generates a distance_absolute of “unknown”. Based on what you said, that sounds like a clear indication that the humidity didn’t change, hence you can act on that!?
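
In Python terms the idea looks roughly like this. It is a sketch under two assumptions: that distance_absolute is the difference between the largest and smallest value in the window, and that an empty window is what shows up as “unknown”. The function names and the 2% default (taken from Louis’s description) are only for illustration.

```python
def distance_absolute(values):
    """Spread between the largest and smallest value, or None if empty."""
    if not values:
        return None  # stands in for the sensor's "unknown" state
    return max(values) - min(values)


def humidity_is_steady(values, max_difference=2.0):
    """Treat an empty window as "no change reported", i.e. steady."""
    distance = distance_absolute(values)
    if distance is None:
        return True
    return distance <= max_difference


no_updates = humidity_is_steady([])        # sensor stayed silent
small_change = humidity_is_steady([20, 21])
big_change = humidity_is_steady([21, 40])
```

This only holds if the sensor reliably reports every significant change; a sensor that is offline would look steady as well.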

Best!