Calculation of statistics seems wrong (to me)

Looking at the statistics table, it seems like the bucket used for calculating the min/max/mean includes not only all the data points in the relevant 5-minute or 1-hour time period but also the last data point before the start of the bucket – even if it occurs several buckets earlier.

I would have thought that the logic should be something like this (sketched in code after the list):

  1. If there is at least one data point in the time bucket (from the bucket's start time, inclusive, up to but not including its end time), calculate the min, max, and time-weighted average of all data points in the bucket.
  2. If there are no data points in the time bucket, use the min/max/mean from the previous bucket.
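Put in code, the logic I expected would look roughly like this (just a sketch with my own function and variable names, assuming timestamps are plain epoch seconds):

```python
def expected_bucket_stats(points, bucket_start, bucket_end, prev_stats):
    """Hypothetical two-rule logic: points is a list of (timestamp, value)
    tuples sorted by timestamp; timestamps are epoch seconds."""
    in_bucket = [(t, v) for t, v in points if bucket_start <= t < bucket_end]

    if not in_bucket:
        # Rule 2: no readings in this bucket -> reuse the previous bucket's stats
        return prev_stats

    values = [v for _, v in in_bucket]

    # Rule 1: min/max over the readings inside the bucket only
    bucket_min, bucket_max = min(values), max(values)

    # Time-weighted mean: each value is held until the next reading,
    # and the last reading is held until the end of the bucket.
    weighted = 0.0
    for (t, v), (t_next, _) in zip(in_bucket, in_bucket[1:] + [(bucket_end, None)]):
        weighted += v * (t_next - t)
    bucket_mean = weighted / (bucket_end - in_bucket[0][0])

    return {"min": bucket_min, "max": bucket_max, "mean": bucket_mean}
```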

However, it seems like HA is doing the min/max/mean over all data points in the bucket plus the last data point before the start of the bucket – even if that data point occurs several buckets earlier. Perhaps that makes sense for the mean if you want to smooth it, but it doesn't make sense for the min/max.
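For example, the behaviour I'm seeing looks like this (made-up numbers, and this is only my reconstruction of what HA appears to do, not its actual code):

```python
# (timestamp in epoch seconds, value)
last_before = (0, 50.0)                 # last reading, a couple of buckets earlier
in_bucket = [(660, 20.0), (780, 21.0)]  # readings inside the 600-900 bucket

window = [last_before] + in_bucket      # what HA appears to use
print(min(v for _, v in window))        # 20.0
print(max(v for _, v in window))        # 50.0 -> the stale 50.0 becomes the bucket max
```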

Can someone explain the logic here?

I am also trying to understand how the mean is calculated for any given bucket.
I tried several different plausible methodologies, and none of them matched exactly across all the samples I fed in.

Indeed, the existing mean algorithm seems to over-weight the data from before the start of the window. The mean estimators that look least biased to me all come out a bit off from what the HA recorder calculates.
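Two of the candidate calculations I compared against were along these lines (my own naming, not the recorder's code; neither reproduces the recorded mean exactly):

```python
def simple_mean(in_bucket):
    """Candidate A: plain arithmetic mean of the readings inside the bucket."""
    return sum(v for _, v in in_bucket) / len(in_bucket)

def carry_forward_mean(prev_point, in_bucket, bucket_start, bucket_end):
    """Candidate B: time-weighted mean where the last reading before the bucket
    is carried forward to bucket_start and each value is held until the next one."""
    segments = ([(bucket_start, prev_point[1])] if prev_point else []) + in_bucket
    weighted = 0.0
    for (t, v), (t_next, _) in zip(segments, segments[1:] + [(bucket_end, None)]):
        weighted += v * (t_next - t)
    return weighted / (bucket_end - segments[0][0])
```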

Does anyone know how the calculation is done and/or where I can see it documented?