Integral helper doesn't look right

I’ve recently added a Riemann sum integral helper to convert my solar wattage into Wh measurements. Once it gets going for the day it seems to work fine, but the initial watts from the panel cause a huge jump in energy. Can anyone explain what’s happening here?

Today was really dark, so the first power from the panel was only 83 W, but the energy sensor jumped up by 708 Wh. After that, it correctly increases by about 4 Wh every 3 minutes.

The wattage measurement comes from the Sunny Boy integration, and the helper sensor is a trapezoid Riemann sum integral.

This has been consistent behavior since I created the helper. Here’s a 24-hour window showing the entire previous day. That day also starts with 83 W from the panels and an immediate jump of 359 Wh.

Thanks in advance for the help!

Use method: left instead of the default trapezoidal.
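
For reference, a minimal sketch of the helper in YAML (sensor.solar_power is a placeholder; use your own power entity):

```yaml
sensor:
  - platform: integration
    source: sensor.solar_power  # placeholder for your panel's power sensor
    name: Solar Energy
    method: left  # default is trapezoidal
    round: 2
```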

Thanks @tom_l, left seems to give a much better result.

Looks like this is even mentioned in the docs:

In case you expect that your source sensor will provide several subsequent values that are equal, you should opt for the left method to get accurate readings.

Seems like the trapezoidal method shouldn’t be the default if it fails so dramatically when readings repeat.

Yes indeed.

I don’t entirely disagree, as I bet upwards of 99% of the integral helpers HA folks have set up are for exactly this kind of scenario. HOWEVER, from a more general mathematics/engineering perspective, I think trapezoidal is generally more useful and the left/right variations are more for edge cases (at least that’s my recollection from Calculus 2, but it’s been a few years :grimacing: ). At least in my case, we mainly used the trapezoidal method, and the others were only discussed as alternatives for special circumstances (which this happens to be).

Trapezoidal is perfect for sampling changing values on a periodic basis. But most sensors send a new value when a change happens, and then nothing until the next significant change is detected. Even worse, HA ignores duplicates to limit database storage. If there are no intermediate samples while the value isn’t changing, trapezoidal goes haywire and left is your friend. Perfect for HA.
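
To make the failure mode concrete, here’s a minimal Python sketch (not HA’s actual code) with made-up timestamps chosen to roughly match the numbers in this thread:

```python
# Compare left vs trapezoidal Riemann sums on change-based samples.
# Times in hours, power in W: one 0 W sample at dusk, 83 W at first
# light ~17 h later, then 85 W three minutes after that.
samples = [(0.0, 0.0), (17.0, 83.0), (17.05, 85.0)]

def left_sum(pts):
    # left Riemann: hold the previous value until the next sample arrives
    return sum(v0 * (t1 - t0) for (t0, v0), (t1, _) in zip(pts, pts[1:]))

def trapezoid_sum(pts):
    # trapezoidal Riemann: linearly interpolate between adjacent samples
    return sum((v0 + v1) / 2 * (t1 - t0)
               for (t0, v0), (t1, v1) in zip(pts, pts[1:]))

print(f"left:        {left_sum(samples):.0f} Wh")       # ~4 Wh
print(f"trapezoidal: {trapezoid_sum(samples):.0f} Wh")  # ~710 Wh
```

The trapezoidal result is almost entirely the dusk-to-dawn ramp: (0 + 83) / 2 × 17 h ≈ 706 Wh, which is exactly the kind of jump described above.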

See my explanation here:

Thanks @Edwin_D for the link to that explanation – it explains why trapezoidal is failing even though (as @brooksben11 is saying) it seems like trapezoidal should be more accurate.

To me it seems like HA’s implementation of trapezoidal is faulty – it should assume a consistent time period of calculations and repeated values should net zero additional value.

But anyway, left seems to be working pretty well, so my problem is fixed at least. I’m sure this has been posted before and will be posted again until the default method is changed, or the trapezoidal method is reimplemented.

The docs state that repeated values are a problem for trapezoidal, but that is a half-truth. If there were frequent repeated values (as happens in time-sampled data), there would be no problem with trapezoidal.

The problem is that repeated values are omitted by many sensors and HA itself. It is long periods of constant value without new samples that are the culprit.

The trapezoidal implementation itself is fine; it is just not suited for the situation. The minute HA started inventing intermediate measurements to “fix” trapezoidal Riemann, it would effectively turn trapezoidal Riemann into left Riemann.

Left Riemann is used for: if there are no new samples, assume the value has not changed since the last update (HA: sends an update as soon as something changes, and only if it changes). Trapezoidal is used for: assume the value changed gradually since the last measurement (time-based, often slow, sampling).

The most common use for a Riemann sum is converting W to kWh. Consider a 10 W lamp. It either uses 0 W or 10 W, so the power graph is a block graph; the changes are reflected pretty much instantly, and there are only a few log records to work with: off, on, off. It makes no sense that trapezoidal would be more accurate here anyway, as it would measure a pyramid with its top at the moment the light is turned on. And if there are many power fluctuations, both methods do quite well; the difference would be negligible. In the graph below, for a light turned on a couple of times, blue would be the left Riemann interpretation and red the trapezoidal:
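
(To put rough numbers on that pyramid, with assumed timestamps: if the last sample was 0 W at midnight and the lamp reports 10 W when switched on at 08:00, trapezoidal credits (0 + 10) / 2 × 8 h = 40 Wh for a period in which the lamp was actually off, while left correctly adds 0 Wh and only starts counting from 08:00.)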

Thanks again @Edwin_D for the very helpful reply.

Agreed.

Yes, but only when the invented values corrupt the actual curve. I would propose that whenever a real value is read, the invention timer is reset to avoid corrupting the curve.

Consider the solar power charts in my initial post. While the sun is up, trapezoidal and left Riemann sums will give different answers, and I believe trapezoidal would be more accurate because it tries to account for the smoothly changing power between the sample points. The only point at which it should change trapezoidal into left is at the start of the solar day, my point of confusion above.

I know the sample rate of my solar power, so if I set the “invention timer” to be longer than that, I would only get invented values while the panels were sitting at 0 and not sending updates.

Agreed with your last paragraph. Left makes a lot of sense for “binary” or fixed-value power consumption. But for variable power sources or consumers (like solar or a modern inverter heat pump), a “left-start trapezoidal Riemann sum” would be awesome.

The type of Riemann sum describes exactly how to estimate (or invent) values that aren’t present in the data. My point was that trapezoid cannot be “fixed”, because that would stop it from being trapezoid. Trapezoid is the data-invention method, if you want to call it that. It works this way by definition.

The only way trapezoid will work right on Home Assistant data is when you have extra (real) data points in the database to prove the value hasn’t changed. There’s no other way. If these data points are not there, then trapezoid is designed to interpolate with the last known datapoint.

For solar power, trapezoid will work reasonably well because the first value is close to 0. Solar power inverter integrations are also close to polling behavior. But the fact remains that, as far as I know, the recorder has recently been changed to ignore the extra data points. It will hurt for solar data too.

I am also convinced that left Riemann won’t be far off on solar data. Sure, sometimes it will underestimate a bit (on the rise), but then it will overestimate a bit (on the fall). Because the graph mostly follows the typical bell curve, the deviation will be very small. Both methods will be off a bit depending on the polling frequency. If the solar inverter reports kWh (it should), use that instead; it will be way more accurate.

But because there is a very long period of 0, the first watts coming in may be counted for half of all the hours it was dark if there is only one 0 in the database. And most inverters have a cutoff: they won’t start or stop at a fraction of a watt, but at some multiple. That is a far more significant error than the perceived gain in accuracy of trapezoid on solar data that fluctuates during the day.

If you have integral sensors for both methods, you can see how far off trapezoid is overnight by the peak at the start of solar production. You can also compare against a left sum if you subtract that error peak. Then you’ll see how much the two differ during actual production over a day.
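
A sketch of such a side-by-side setup, again with a placeholder source entity:

```yaml
sensor:
  - platform: integration
    source: sensor.solar_power  # placeholder
    name: Solar Energy Left
    method: left
    round: 2
  - platform: integration
    source: sensor.solar_power  # placeholder
    name: Solar Energy Trapezoidal
    method: trapezoidal
    round: 2
```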

I used the seemingly binary light behavior because it illustrates best how the errors occur when the value does not change. The effect will be smaller on fluctuating data; I mentioned that.

There’s only one way I know of that people use to get more data points into the recorder, and that is to introduce very slight fluctuations in the data using a timer. That won’t make the data more accurate, though. It could help for the derivative sensor, which is no longer usable for sensors with long constant periods, for the same reason I described here. There is a bug report for that, because the derivative is at times not 0 while the value is constant. That illustrates that ignoring data points is a real problem.
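
For completeness, a sketch of that timer trick using a trigger-based template sensor (sensor.solar_power is a placeholder, and as said, this adds data points rather than accuracy):

```yaml
template:
  - trigger:
      - platform: time_pattern
        minutes: "/5"  # re-publish every 5 minutes
    sensor:
      - name: "Solar Power Resampled"
        unit_of_measurement: "W"
        device_class: power
        state: >
          {# add a negligible alternating offset so the recorder sees a new value #}
          {{ states('sensor.solar_power') | float(0) + (now().minute % 2) * 0.001 }}
```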