Clean up broken result of Riemann sum

I am collecting the current solar production as reported by our installation through a RESTful integration. That gives me power in Watt. I defined a helper which uses the Riemann sum, I believe with the left method (I cannot check, it is not shown in the UI?!), to calculate energy in Wh.
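
For reference, the left method weights each interval between two samples with the reading at the interval’s start. A minimal sketch of the idea (not the integration’s actual code):

# Minimal sketch of a left Riemann sum: each interval between two
# samples contributes (interval length) * (power at interval start).
def left_riemann_sum_wh(samples):
    """samples: list of (time_in_seconds, power_in_watts) tuples."""
    energy_wh = 0.0
    for (t0, p0), (t1, _p1) in zip(samples, samples[1:]):
        energy_wh += p0 * (t1 - t0) / 3600.0  # W * h = Wh
    return energy_wh

# Two samples 3 hours apart: the left sum uses the older reading.
print(left_riemann_sum_wh([(0, 2195.0), (3 * 3600, 2450.0)]))  # 6585.0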

Today, I had to turn off the computer on which Home Assistant runs for three hours. Once I turned it on again, the energy value jumped by more than 17 MWh. Here’s a screenshot of the two values:

The input value (power) has no data while offline; the graph shows that as a horizontal line. There’s a small jump at the right edge, from the old 2195 W at around 11:10 to the new 2450 W at 12:55.

This broken energy value ruins our energy dashboard. :neutral_face: It says we earned over 1000 EUR today. If only that were true!
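
The two numbers are at least consistent with each other, assuming a feed-in rate of roughly 0.06 EUR/kWh (my assumption for illustration, the actual rate isn’t stated above):

# Hypothetical sanity check: the bogus jump of ~17.5 MWh times an
# assumed feed-in rate of 0.06 EUR/kWh lands near the 1000 EUR shown.
bogus_jump_wh = 17497728
rate_eur_per_kwh = 0.06  # assumed, not taken from the dashboard
print(bogus_jump_wh / 1000 * rate_eur_per_kwh)  # ~1050 EUR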

Is there a way I can clean this up? Was it caused by some bug?

Just click settings for the integral helper and then the integral sensor options to see how it’s set up, as well as to change it.

And what you’re experiencing sounds an awful lot like you’re not using the left method.

The only option it shows is “Accuracy”, the number of digits. I’m still on Home Assistant 2023.10.3. Perhaps a UI to show this was added later? Time to update…

I know that in the past I mistakenly used the default method, then learned that it’s not recommended for noisy data and edited config files manually to fix the configuration. I can double-check for this one.

It’s configured to use the “left” method. From .storage/core.config_entries:

      {
        "entry_id": "5cca47c5c8e8900e52e11d52ee2c65e5",
        "version": 1,
        "domain": "integration",
        "title": "Solar Energy",
        "data": {},
        "options": {
          "name": "Solar Energy",
          "source": "sensor.solar_power",
          "method": "left",
          "round": 2.0,
          "unit_prefix": "none",
          "unit_time": "h"
        },
        "pref_disable_new_entities": false,
        "pref_disable_polling": false,
        "source": "user",
        "unique_id": null,
        "disabled_by": null
      },

I’d start with updating; I know there were a number of integral issues that got resolved in the last year.

Updated. Now the UI shows the method and allows editing it. But as established before, it’s already set to “left”.

Is it possible to trigger recalculation of the old values? The big jump is still there.

Or was it recalculated and gave the same result?

Several of my other Riemann sum helpers have the same problem, perhaps all those where the last known value was non-zero.

No, but you can delete the erroneous statistics.

Go to Developer Tools → Statistics. Click the ramp icon at the far right of your sensor’s row. Navigate to the time the error occurred and either delete the bad value or set it to the average of the values before and after it.

This problem with the Riemann sum hasn’t been fixed in Home Assistant 2024.9.1. I checked by copying my installation’s data at 10:00 this morning and bringing it up elsewhere 10 hours later. Same effect: there’s a huge jump in the calculated sensor.

@dgomes: I’ll file a proper bug report on GitHub; this thread is where I described it initially.

Integral integration: big jump in calculated data when starting HomeAssistant after a long (hours) delay · Issue #126006 · home-assistant/core · GitHub

It’s in the nature of the integration to work like this: it always calculates based on the last valid state. This addresses short periods of time (during restarts), which are very common.

Long periods of time are not considered (they are uncommon), and from the integration’s point of view the two cases are indistinguishable.
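
Roughly, the update step works like this (a simplified sketch, not the actual integration code):

# Simplified sketch: on every new state, the time since the last state
# change is multiplied by the last valid value, no matter how long the
# gap was. A quick restart and a long outage look identical here.
last_power_w = 2195.0  # last valid state before the gap
last_change_s = 0.0    # time of the last state change

def on_new_state(now_s, power_w):
    global last_power_w, last_change_s
    energy_wh = last_power_w * (now_s - last_change_s) / 3600.0
    last_power_w, last_change_s = power_w, now_s
    return energy_wh

print(on_new_state(3 * 3600, 2450.0))  # a 3 h gap adds 6585.0 Wh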

In my case, the last value was 2195 W, roughly 3 hours before the integration started again. I would understand if it had added 3 h × 2195 W = 6585 Wh to the old energy value, but instead it added 17497728 Wh. The math just doesn’t work out.

What time delta does the integration use as the multiplier? The time since the last sample?

The time the last state change was received.

It just occurred to me… during the downtime, did you restore a previous backup? The last state change is not kept in the recorder; it’s kept in a special file in .storage.
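
That file is .storage/core.restore_state. Something like this dumps the relevant values (a hypothetical snippet, assuming the usual layout of that file):

# Hypothetical inspection snippet: print each sensor's restored state
# and the integral integration's last_valid_state from .storage.
import json

with open(".storage/core.restore_state") as f:
    data = json.load(f)

for entry in data["data"]:
    state = entry["state"]
    extra = entry.get("extra_data") or {}
    if "last_valid_state" in extra:
        print(state["entity_id"], state["state"], extra["last_valid_state"])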

Let’s continue discussing in the issue. We can report back here on the forum once there is a conclusion.

I’ve not been able to reproduce the problem of big jumps in the integration.

What I can share now is how my wife and I fixed the broken data. The following Python script needs to be run inside the “config” directory with the home-assistant_v2.db, ideally while Home Assistant is stopped (it rewrites .storage/core.restore_state, which a running instance would overwrite). It has to be edited to suit the local configuration (list of sensors) and the time of the incident.

Fixing the data wasn’t a big priority, so when we finally started to address this in earnest, the broken data had made it all the way into the long-term statistics. All data in the short-term statistics and the current states was from after the jump.

Therefore the script determines the undesirable delta based on the long-term statistics. For more recent incidents where the long-term statistics have not been updated yet, further changes are needed. I am not sure whether this case exists; I don’t have such a database to test with. Perhaps long-term statistics get created immediately and will always contain the delta?
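
To check which case applies, a quick count per table shows whether data after the incident has already reached the long-term statistics (the sensor name and timestamp below are examples):

# Hypothetical check: count rows after the incident in the long-term
# and short-term statistics tables for one sensor.
import datetime
import sqlite3

T = datetime.datetime.fromisoformat("2024-09-14T11:30+00:00").timestamp()
conn = sqlite3.connect("home-assistant_v2.db")
cursor = conn.cursor()
for table in ("statistics", "statistics_short_term"):
    cursor.execute(
        f"SELECT COUNT(*) FROM {table} s"
        " JOIN statistics_meta m ON m.id = s.metadata_id"
        " WHERE m.statistic_id = ? AND s.start_ts >= ?",
        ("sensor.solar_energy", T),
    )
    print(table, cursor.fetchone()[0])
conn.close()

With that said, here is the script: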

#!/usr/bin/env python3

import datetime
import json
import sqlite3
import sys

# This is the list of integration sensors with bad data.
sensors = [
    "sensor.energy_from_grid",
    "sensor.energy_to_grid",
    "sensor.solar_energy",
    "sensor.battery_charging_energy",
    "sensor.battery_discharging_energy",
    "sensor.other_energy",
    "sensor.house_consumption_energy",
    "sensor.svenja_energy",
    "sensor.merle_energy",
    "sensor.bad_energy",
    "sensor.dach_energy",
    "sensor.schlafzimmer_energy",
    "sensor.luftraum_energy",
    "sensor.sat_energy",
    "sensor.technikraum_energy",
    "sensor.batterie_energieverbrauch",
    # "sensor.wohnzimmer_verbrauch",
    # "sensor.dunstabzug_verbrauch",
    # "sensor.garten_verbrauch"
]

# The date and time when Home Assistant was offline, in UTC. Data prior to
# this is assumed to be correct and data after it will get corrected. The
# explicit UTC offset matters: a naive datetime would be interpreted as
# local time by timestamp().
T = datetime.datetime.fromisoformat("2024-09-14T11:30+00:00").timestamp()
print(f"Checking for changes around {T}.")

# Only sensors whose change exceeds this threshold in percent get updated.
# It does not matter whether they were increased or decreased.
threshold_percent = 5

# This script must be run in the "config" directory which contains the SQLite
# database.
conn = sqlite3.connect('home-assistant_v2.db')
cursor = conn.cursor()
def fetchone(msg):
    result = cursor.fetchone()
    if result:
        return result[0]
    print(msg)
    sys.exit(1)

# SQL queries are composed using string formatting because that allows
# replacing e.g. the table name. SQL placeholders are only supported
# where values are expected.

# What was the last correct value, based on "statistics" or "statistics_short_term"?
query_Wpre = '''
      SELECT state
         FROM %(table)s
         WHERE metadata_id = %(id)d AND start_ts < %(ts)f AND state IS NOT NULL
         ORDER BY created_ts DESC
         LIMIT 1;
'''

# Same for first incorrect value.
query_Wpost = '''
      SELECT state
      FROM %(table)s
      WHERE metadata_id = %(id)d AND start_ts >= %(ts)f AND state IS NOT NULL
      ORDER BY created_ts ASC
      LIMIT 1;
'''

# Which metadata_id corresponds to a sensor ID?
#
# state and statistics tables use different metadata_ids
# and both have their own _meta table with the mapping.
query_id = {
    "states": '''
       SELECT metadata_id
          FROM states_meta
          WHERE entity_id = '%(sensor)s'
          LIMIT 1;
''',
    "statistics": '''
       SELECT id
          FROM statistics_meta
          WHERE statistic_id = '%(sensor)s'
          LIMIT 1;
''',
}

# Query for the scale factor.
query_unit = '''
   SELECT statistic_id, unit_of_measurement
       FROM statistics_meta;
'''

# Update all tables by subtracting a delta.
query_update_stats = '''
   UPDATE %(table)s
      SET %(column)s = %(column)s - %(delta)f
      WHERE metadata_id = %(id)d AND created_ts >= %(ts)f AND state IS NOT NULL;
'''
query_update = {
    "states": {
        "state": '''
   UPDATE states
      SET state = CAST((CAST(state AS REAL) - %(scale)d * %(delta)f) AS TEXT)
      WHERE metadata_id = %(id)d AND last_updated_ts >= %(ts)f AND state IS NOT NULL AND state NOT IN ('unavailable', 'unknown');
''',
    },

    "statistics": {
        "state": query_update_stats,
        "sum": query_update_stats,
    },

    "statistics_short_term": {
        "state": query_update_stats,
        "sum": query_update_stats,
    },
}

deltas = {}
for sensor in sensors:
    print()
    ids = {}
    for table in ("states", "statistics"):
        cursor.execute(query_id[table] % {"sensor": sensor})
        ids[table] = fetchone(f"{table}: no ID found for {sensor}")
    print(f"{sensor} IDs: {ids}")
    # Same in both tables.
    ids["statistics_short_term"] = ids["statistics"]

    # Some statistics are recorded as Wh, others as kWh. In both cases,
    # states are in Wh. Not sure why. The implication is that a delta
    # determined below based on statistics must get scaled when substracting
    # it from states.
    #
    # The "unit_of_measurement" in statistics_meta apparently can be used
    # to determine that scale factor.
    cursor.execute(query_unit)
    scale = {}
    for statistic_id, unit in cursor.fetchall():
        factor = 1
        if unit and unit.startswith("k"):
            factor = 1000
        elif unit and unit.startswith("M"):
            factor = 1000000
        scale[statistic_id] = factor

    # This code assumes that the time is so far in the past that the jump
    # has already been recorded in the long-term statistics and can no
    # longer be found in the short-term statistics or the states.
    #
    # This needs to be updated to use the other tables when the jump
    # occurred more recently.
    if True:
        table = "statistics"
        cursor.execute(query_Wpre % {"table": table, "id": ids[table], "ts": T})
        pre = fetchone(f"{sensor} has no old value.")
        cursor.execute(query_Wpost % {"table": table, "id": ids[table], "ts": T})
        post = fetchone(f"{sensor} has no new value.")

    delta = post - pre
    # Guard against a zero baseline to avoid division by zero; treat it
    # as a large change so the sensor still gets updated.
    percent = int(abs(delta) * 100 / pre) if pre else 100
    print(f"{sensor} delta: {post} - {pre} = {delta} ({percent}%)")

    if percent < threshold_percent:
        print(f"Not updating {sensor}.")
        continue

    print(f"Updating {sensor}.")
    deltas[sensor] = scale[sensor] * delta
    for table, columns in query_update.items():
        for column, query in columns.items():
            query = query % {"table": table, "column": column, "id": ids[table], "ts": T, "scale": scale[sensor], "delta": delta}
            print(query)
            cursor.execute(query)

# Commit only after all updates succeeded.
conn.commit()
conn.close()

# Update the integration state.
restore_state_file = ".storage/core.restore_state"
with open(restore_state_file) as infile:
    data = json.load(infile)

# Each entry holds the restored state plus the integration's extra data;
# all three stored copies of the value must be shifted by the same delta.
for entry in data["data"]:
    delta = deltas.get(entry["state"]["entity_id"], None)
    if delta is not None:
        state = entry["state"]["state"]
        if state != "unknown":
            entry["state"]["state"] = str(float(state) - delta)
        entry["extra_data"]["native_value"]["decimal_str"] = str(float(entry["extra_data"]["native_value"]["decimal_str"]) - delta)
        entry["extra_data"]["last_valid_state"] = str(float(entry["extra_data"]["last_valid_state"]) - delta)

with open(restore_state_file, "w") as outfile:
    json.dump(data, outfile)

# sqlitebrowser (DB Browser for SQLite) or any other SQLite client can be
# used to verify changes to the data:
#
# - select DATETIME(start_ts, 'unixepoch') AS time , * from statistics where metadata_id = 113 AND start_ts > unixepoch("2024-09-14T06:00")
# - select DATETIME(start_ts, 'unixepoch') AS time , * from statistics_short_term where metadata_id = 113 AND start_ts > unixepoch("2024-09-14T06:00")
# - select DATETIME(last_updated_ts, 'unixepoch') AS time , * from states where metadata_id = 75 AND last_updated_ts > unixepoch("2024-09-14T06:00")
#
# Note that DATETIME(..., 'unixepoch') prints UTC; add the 'localtime'
# modifier to convert to local time.