Statistics sensor becomes intermittent

ThisWayToo · July 13, 2023, 12:41am

I’m at a loss to explain why the statistics sensors I set up to track rainfall over fixed look-back periods from the present fail after initially receiving data values reliably. Until a larger number of new values came in, it was working great. Each look-back sensor was updating and then aging out the old data exactly as intended. At the point in time shown here it was behaving exactly as expected.

[Image: July12_2023_rain1232]

Later, as more rain came in, I could see that the rain information was still being sent from the ESP32 ESPHome device, just as before. But for some reason I simply can’t figure out, the statistics sensors stopped updating for a period of time, or would only update occasionally. The sampling_size: parameter is plenty large, since data is received from ESPHome only once per minute. Here is a snippet of the code for 3 of the statistics sensors:

# Generates rain amount per last period of time - sliding window
#  - sensor.rain_gauge via ESP32
  - platform: statistics
    name: 1h Rain Statistics
    entity_id: sensor.rain_gauge
    state_characteristic: total
    sampling_size: 100
    max_age:
      hours: 1

  - platform: statistics
    name: 6h Rain Statistics
    entity_id: sensor.rain_gauge
    state_characteristic: total
    sampling_size: 500
    max_age:
      hours: 6

  - platform: statistics
    name: 12h Rain Statistics
    entity_id: sensor.rain_gauge
    state_characteristic: total
    sampling_size: 1000
    max_age:
      hours: 12

The rest of the entries that look farther back in time do have the sampling_size: and max_age: settings set accordingly.
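
For anyone unfamiliar with those two settings, here is a rough Python sketch of how a look-back total limited by sampling_size and max_age can be thought of. This is a toy model only, not the statistics integration's actual code; the SlidingTotal class and its method names are made up for illustration.

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingTotal:
    """Toy look-back total: keeps at most `sampling_size` samples,
    none older than `max_age`, and sums whatever remains."""

    def __init__(self, sampling_size, max_age):
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.samples = deque(maxlen=sampling_size)
        self.max_age = max_age

    def add(self, when, value):
        self.samples.append((when, value))

    def total(self, now):
        # Age out anything older than the look-back window.
        while self.samples and now - self.samples[0][0] > self.max_age:
            self.samples.popleft()
        return round(sum(value for _, value in self.samples), 2)


start = datetime(2023, 7, 12, 12, 0)
one_hour = SlidingTotal(sampling_size=100, max_age=timedelta(hours=1))

# One reading per minute; the bucket tips once, at minute 5.
for minute in range(31):
    one_hour.add(start + timedelta(minutes=minute), 0.01 if minute == 5 else 0.0)
total_after_30_min = one_hour.total(start + timedelta(minutes=30))

for minute in range(31, 90):
    one_hour.add(start + timedelta(minutes=minute), 0.0)
total_after_89_min = one_hour.total(start + timedelta(minutes=89))
```

In this model the single tip shows up in the 1h total at the 30 minute mark and has aged out by minute 89. With one reading per minute, the buffer of 100 never fills within the hour, which is why sampling_size is not the suspect here.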

It is not a matter of just some of the statistics sensors failing to see the new data, but all of them.

The ESPHome code that is supplying the data to the statistics sensor:

# GPIO contact closure - rain gauge bucket into ESP32 GPIO
  - platform: pulse_counter
    pin:
      number: GPIO33
      mode: INPUT_PULLUP
    unit_of_measurement: 'inches'
    name: "Rain Gauge"
    filters:
      # Each 0.01" (0.254mm) of rain causes one momentary contact closure
      - multiply: 0.01
    accuracy_decimals: 2
    icon: 'mdi:weather-rainy'
    id: rain_gauge
#    internal: true
    count_mode:
      rising_edge: INCREMENT
      falling_edge: DISABLE
    internal_filter: 13us
    update_interval: 60s

The above code is outputting values every minute continuously. When it is not raining those values are 0.00000.

Here is an ESPHome log entry seen when the rain bucket had tipped once:

[14:06:21][D][pulse_counter:174]: 'Rain Gauge': Retrieved counter: 1.00 pulses/min
[14:06:21][D][sensor:094]: 'Rain Gauge': Sending state 0.01000 inches with 2 decimals of accuracy

Now what might be a factor (wondering as I’m writing this) is that the missed statistics sensor updates occurred when the above value was 0.02000 or possibly more. Anyone have specific thoughts on that? But in my manual testing, values larger than 0.01000 seemed to behave fine.

What I can’t get my head wrapped around is how it could be working exactly as expected, yet fail intermittently after a relatively large number of new data samples, specifically ones with values other than 0.00. After all, the 0.00 values are the vast majority over long periods of time.

Other than this issue with new data being missed after a healthy bit of new rain data, I have the rain gauge tracking and display working exactly as I intended.

Also note that I’m not interested in using Utility Meter because of the time boundaries it enforces, i.e. no way to configure it to look back some X time period from the present.

Some additional background that is mostly still current for what I’m attempting to do:

https://community.home-assistant.io/t/diy-zigbee-rain-gauge/255379/359?u=thiswaytoo

ThisWayToo · July 13, 2023, 2:08am

Some follow up info.

The aging out process of the various duration look back windows is progressing exactly as it should be as of now. So other than a bunch of new data values being ignored a few hours ago, things are behaving as they should be.

ThisWayToo · July 13, 2023, 9:40pm

Another follow up.

Just had a brief, pretty intense rain shower. Once again the values from the statistics sensors are off, in this case way off, by a factor of about 2. The actual rain was .42, but the statistics only show .2. Since this was a brief but more intense rain than the long event yesterday, a higher percentage of the data was dropped. So there definitely seems to be a data value / rate thing at play.

Today as it was falling behind I checked the buffer_usage_ratio: and the age_coverage_ratio: values under developer tools. Neither looked out of line.

It still doesn’t make sense to me why the change in data values coming in should cause a problem. They are always just one per minute. During an earlier drizzle today everything worked 100% as it should.

ThisWayToo · July 17, 2023, 2:54pm

After spending some time reviewing this existing bug, I think I have an idea where the root of the problem I’m seeing lies:

https://github.com/home-assistant/core/issues/67627

In addition to the repeated zero value issue that bug discusses, there is a similar behavior where two or more identical values back to back do not get recorded. This comment in that bug describes that:

" There’s still an issue, and that is when a value is repeated. When a sensor has an update but it’s the same value as the previous value, it isn’t stored in the recorder database is this is where weird things happen. Lots of sensors have enough noise and small enough resolution where they don’t see this. But if the sensor has a hard min or max (zero being the common one), or if it has low resolution and/or low noise, this issue pops up."

In my heavier rainfall case I can easily see that there could be back to back values of 0.03 for example. Meanwhile for the slow light rain scenario the 0.01 values are almost always separated by one or more 0.00 values.
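
To make that concrete, here is a small Python sketch of the effect. It is only a model of "a value identical to the previous one is not stored", not Home Assistant code; the recorded() helper and both sample series are invented for illustration.

```python
def recorded(readings):
    """Drop any reading equal to the one immediately before it,
    mimicking 'repeated values are not stored'."""
    kept = []
    previous = None
    for value in readings:
        if value != previous:
            kept.append(value)
        previous = value
    return kept


# One value per minute, in inches (made-up data).
light_rain = [0.00, 0.01, 0.00, 0.00, 0.01, 0.00, 0.01, 0.00]
heavy_rain = [0.00, 0.03, 0.03, 0.03, 0.02, 0.02, 0.00, 0.00]

light_actual = round(sum(light_rain), 2)
light_seen = round(sum(recorded(light_rain)), 2)

heavy_actual = round(sum(heavy_rain), 2)
heavy_seen = round(sum(recorded(heavy_rain)), 2)

print(f"light rain: actual {light_actual}, seen {light_seen}")
print(f"heavy rain: actual {heavy_actual}, seen {heavy_seen}")
```

In the light rain series only repeated zeros are dropped, so the total is unaffected. In the heavy series the back to back 0.03 and 0.02 values collapse into single samples and the total comes up well short, which matches what I saw.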

mekaneck · July 18, 2023, 12:33pm

The recorder database not storing repeated values is a known and intended feature, but in my opinion the statistics sensor doesn’t appropriately account for it and that is where the problem lies.

I’m not knowledgeable enough in Python and the HA codebase to create my own integration or supply a PR to fix the issue, although from a logic and mathematical perspective I understand the cause and solution very well.

The workaround is to create a template sensor for each sensor that you want to calculate statistics from, and force a datapoint to be stored in the database even if values are repeated. The way to do that is to create an attribute that changes as often as you want to store a data point. When the attribute changes, the database has to record that attribute and the sensor state along with it.

I haven’t tested this code but the template would look something like this:

template:
  - sensor:
      - name: "rain gauge duplicate"
        unit_of_measurement: "in"
        state_class: "measurement"
        state: >
            {{ states('sensor.rain_gauge') }}
        attributes:
            minute_counter: "{{ now().minute }}"

ThisWayToo · July 18, 2023, 9:21pm

Thank you, thank you, thank you for the excellent code snippet. It was almost drop in ready.

I had spent most of yesterday trying to add some low level “dither” to the data stream. It could be more accurately described as an exercise in beating my head against the wall. Conceptually, coming up with ideas for how to deal with the statistics sensor’s undesirable behavior for repeating values is fairly easy, though I hadn’t thought of the attribute idea. Getting code and syntax that is completely working and correct is a different story.

Unfortunately I seem to have uncovered another annoying behavior of the statistics sensor. When running that code to add an attribute value, I get double readings of the data from the template sensor. What is now happening is plain to see here:

The template sensor appears to output every 60 seconds, as it receives data from ESPHome every 60 seconds. However, the statistics sensor seems to take an additional reading before the next 60 seconds has elapsed. Not sure why it would be doing that. My belief was that it simply waits until a new value is sent to it, but perhaps that is not actually true and there is some polling at play.

So I am now trying to think of a way to work around this latest roadblock…

One idea that I want to try is to see if shortening the update interval of the ESPHome sensor to 50 or 55 seconds will do the trick. But that would require modifying the ‘now’ time attribute to be something other than once per minute.

Another possibility is to figure out how to toggle the template sensor back to 0.00 after a few seconds, well before the next value comes in from the ESPHome sensor. Since I’m using state_characteristic: total in the statistics sensors, those extraneous zero values wouldn’t hurt anything.

Right now I’m waiting to see how the aging out process is working, and what the long term zero rain amount values (now with attributes) look like in the statistics sensor fed from the template sensor.

mekaneck · July 19, 2023, 1:45am

The template sensor will update itself when any of its sources are updated. Which in your case will be at the top of every minute per HA’s own clock (due to now() being a source, unrelated to the fact that you’re using the .minute property) and then also every time the rain sensor is updated, which according to you is once every minute (but unlikely to be aligned with the top of every minute according to HA’s clock). So the sensor will be updated twice a minute, but not necessarily every 30 seconds. For example it could be 10 seconds between updates, then 50 seconds for the next update, then 10 seconds, etc.

That said, only updates that are different will be recorded. The update at the top of the minute will always be different, but the next update will only be different if the rain gauge has a new value.
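
Here is a rough Python model of that behavior, in case it helps to see it. It is a simplification, not HA's real logic: the template_records() function is invented for illustration, the gauge is assumed to report 10 seconds after the top of each minute, and an update is assumed to be stored only when the (state, minute_counter) pair differs from the last stored pair.

```python
def template_records(gauge_values, offset_s=10):
    """Return the (time, state) samples stored for the template sensor.

    Assumptions: the clock re-renders the template at the top of each
    minute, the gauge reports `offset_s` seconds later, and a sample is
    stored only if (state, minute attribute) changed.
    """
    events = []
    for k, value in enumerate(gauge_values):
        events.append((60 * k, "clock", None))               # top of minute k
        events.append((60 * k + offset_s, "gauge", value))   # gauge report
    events.sort(key=lambda event: event[0])

    state = 0.0   # assume the gauge was idle before the first event
    last = None
    records = []
    for t, kind, value in events:
        if kind == "gauge":
            state = value
        current = (state, t // 60)  # (sensor state, minute_counter)
        if current != last:
            records.append((t, state))
            last = current
    return records


# A single bucket tip reported during minute 1.
gauge = [0.0, 0.01, 0.0, 0.0]
records = template_records(gauge)

actual_total = round(sum(gauge), 2)
seen_total = round(sum(state for _, state in records), 2)
print(records)
print(f"actual {actual_total}, seen by a 'total' statistic {seen_total}")
```

In this model the single 0.01 tip is stored once when the gauge reports it and once more at the next top of the minute, because the attribute changed while the state was still 0.01. That would explain the double readings.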

The statistics sensor should update when the source sensor gets a new value added to the database, and also when a source sensor data point ages out of the max_age window.

If you are confident that your rain gauge will provide an update on a consistent time interval, then you don’t need to enforce that time interval in your template sensor attribute and you can remove the reference to now(). Instead, you could do something simple like set the attribute to 1 if it is currently 0, otherwise set to 0. This would just give you an alternating 1 and 0 on every update and therefore force the sensor to record the data whenever an update is sent.

mekaneck · July 19, 2023, 2:02am

I’m thinking something like this, which has an attribute which alternates between -1 and +1. Since this only relies on the sensor.rain_gauge it will only update whenever that sensor updates.

template:
  - sensor:
      - name: "rain gauge duplicate"
        unit_of_measurement: "in"
        state_class: "measurement"
        state: >
            {{ states('sensor.rain_gauge') }}
        attributes:
            opposite: "{{ -1 * (this.attributes.opposite | default(1)) }}"

mekaneck · July 19, 2023, 3:03am

One more option, if the previous one doesn’t work. I can’t confirm that the previous one will update on repeated values, so it might not work as desired.

This option will ignore updates from the source sensor and only update on a fixed time interval:


template:
    - trigger:
        - platform: time_pattern
          minutes: "/1"
      sensor:
        - name: "rain gauge duplicate"
          unit_of_measurement: "in"
          state_class: "measurement"
          state: >
            {{ states('sensor.rain_gauge') }}
          attributes:
            opposite: "{{ -1 * (this.attributes.opposite | default(1)) }}"

ThisWayToo · July 20, 2023, 8:55pm

Got back to looking at this today after taking a needed day off from it; I was also pretty busy otherwise yesterday.

Based on my testing, the last option with the trigger achieves what is needed, at the cost of adding one additional minute of latency from bucket tip to it being reported. I can live with that, since one of the main goals I have for the rain gauge is to be able to grab the tablet while still in bed in the morning to see how much rain there may have been overnight while sleeping. So up-to-the-minute accuracy isn’t that important.

The previous one without the trigger, as you suspected, didn’t handle repeated values correctly.

So now I will wait and see how it works in a real period of on and off rain. Given the current forecast that may take a while.

Thanks again, the guidance has been a huge help.

millskyle · June 7, 2024, 2:41pm

For future people’s reference, you can also have it update on bucket tip by adding a state trigger, e.g.

template:
  - trigger:
      - platform: time_pattern
        minutes: "/1" #every minute
      - platform: state
        entity_id:
          - sensor.rain_gauge 
 ...

mekaneck · June 7, 2024, 9:21pm

Well, not really. That’s what all the fuss is about.

Yes, your example is a valid configuration that will trigger at the top of every minute and upon a state change of the rain gauge sensor.

But that doesn’t achieve the desired result.

The presumption here is we have a rain sensor which outputs, once per minute, a number representing mm of rainfall accumulated during that minute. And our problem is that HA ignores repeated values, so if it measures 2.47mm of rain for two consecutive minutes, the second report will be ignored by HA and whatever entity is summing them won’t be triggered to perform the sum. The solution is to trigger once per minute, and take whatever the current value of the rain sensor is, and add it to the sum. This works but adds up to 59.99999 seconds of latency to the value.

If we add an additional trigger as you suggest, we now get two values to sum within a one minute period. So that would double the result that we expect.
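
A quick Python sketch of the difference, under simplifying assumptions: every trigger stores a sample (the flipping attribute forces that), the gauge reports 10 seconds after the top of each minute, and the state trigger fires only when the gauge value actually changes. The sampled_total() function and the data are made up for illustration.

```python
def sampled_total(gauge_values, with_state_trigger, offset_s=10):
    """Sum the samples a trigger-based sensor would hand to a 'total'.

    A time trigger samples the current gauge value at the top of every
    minute. If `with_state_trigger` is set, a changed gauge value is
    also sampled the moment it arrives.
    """
    events = []
    for k, value in enumerate(gauge_values):
        events.append((60 * k + offset_s, "gauge", value))  # gauge report
        events.append((60 * (k + 1), "clock", None))        # next top of minute
    events.sort(key=lambda event: event[0])

    state = 0.0   # assume the gauge was idle before the first event
    total = 0.0
    for _, kind, value in events:
        if kind == "gauge":
            changed = value != state
            state = value
            if with_state_trigger and changed:
                total += state   # extra sample from the state trigger
        else:
            total += state       # sample from the time_pattern trigger
    return round(total, 2)


gauge = [0.0, 0.01, 0.0, 0.02, 0.0]   # two separate bursts of rain
actual = round(sum(gauge), 2)
time_only = sampled_total(gauge, with_state_trigger=False)
time_and_state = sampled_total(gauge, with_state_trigger=True)
print(f"actual {actual}, time only {time_only}, time + state {time_and_state}")
```

With the time trigger alone, each per-minute value is sampled exactly once. With the extra state trigger, each new nonzero value is sampled on arrival and again at the top of the next minute.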

If you really wanted to solve the latency problem, you would have to trigger upon a state change, and then start a 1-minute timer, and trigger every one minute afterwards. However then we have a race condition where we can still get a double-count if 1 second expires but the rain sensor delivers a result in 1.0001 seconds. And no matter the delay (or offset) that is used, if the devices are not using synchronized clocks, that race condition “collision” is unavoidable.

All that being said, there was recently a last_reported property added to every entity, and I haven’t played with it yet but it should be possible to trigger off this and avoid all this nonsense entirely.