Using Statistics to replace Trend - is this really possible?

kenwiens · April 1, 2024, 10:54pm

Hi, I’m hoping someone more experienced in statistics can suggest a solution for me.

I am doing lake level analysis on a body of water where the level varies significantly (it can switch from dropping to rising and back to dropping several times an hour). I pull data every 10 minutes. I calculate a 24 hour trend, a 48 hour trend and a 7 day trend. I may want to do this for 10+ days as well.

The trend sensor loses all values if you restart HA, reload YAML etc. With an average of 3-4 releases of HA each month, the chance of ever getting anything over a 7 day trend is virtually zero - and given that data collection starts over with every restart, most of the time the sample size will not be sufficient to provide accurate trend data.

Due to the HA restart problem, I see many users are switching to the statistics integration, but I can’t see anything there that will take every data point in the sample, determine the best fit line and give you a trend (effectively the slope of the line). Characteristics like change_second only use the first and last sample - which would be incredibly inaccurate. (Think if the first sample was 10 meters and the next 500 samples were between 1 and 3 meters: the lake could really be rising, but using only the first and last measurements would say it was dropping significantly.)
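To make the difference concrete, here is a minimal Python sketch. It is not Home Assistant configuration and not the code of either integration; it is only an illustration of first/last change versus an ordinary least-squares slope. The function names and the sample data are hypothetical: one reading of 10 m followed by 500 readings rising steadily from 1 m to 3 m, spaced 10 minutes apart.

```python
def first_last_change(values):
    """Change between the first and last sample only; everything in between is ignored."""
    return values[-1] - values[0]


def least_squares_slope(times, values):
    """Slope of the ordinary least-squares line through all (time, value) pairs."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Sum of squared deviations of the timestamps from their mean.
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    # Sum of products of time deviations and value deviations.
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical data: one outlier of 10 m, then 500 readings rising from 1 m to 3 m.
levels_m = [10.0] + [1.0 + 2.0 * i / 499 for i in range(500)]
# One reading every 10 minutes, expressed in hours.
times_h = [i * 10 / 60 for i in range(len(levels_m))]

print(first_last_change(levels_m))             # negative: looks like a big drop
print(least_squares_slope(times_h, levels_m))  # positive: m per hour, a slow rise
```

With this made-up data the first/last change reports a drop, while the fitted slope is positive, because the fit uses every sample and a single outlier has limited weight.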

I thought average_step might do it, but it really isn’t the average of the step sizes (silly me for thinking that from the name).

Has anyone gotten statistics to give them the trend value of the best fit line through the entire sample set? (like the trend integration does)

Thanks for any ideas, thoughts or even completely different ways of approaching this!

Additional information:

I ran the change sensor for 47, 48 and 49 hours - and it shows just how variable (and how useless to me) the “change” attribute is:

47 hours 6mm rise
48 hours 8mm rise
49 hours 2mm rise

I was hoping that these might be close enough that I could average them and get a reasonable representation of the line (for the 48 hours period in this example) - but the level varies so widely for each sample that this isn’t possible.

So I think I have proven (to myself anyway) that I really need a trend line in order to get an accurate representation of the lake level change over time.

Just for the curious - here is what the 48 hours stats look like: