Merge statistics of old entity to new entity

theneweinstein · October 15, 2023, 3:22pm

Since version 2023.4 home assistant keeps its history when you rename an entity (see link). As far as I know it is not possible to merge old entities from before that time. The script I made worked for that, but requires some understanding of SQL. I’m neither a SQL programmer, just a lot of trail and error (with a backup of the database of course) did the trick in the end.

fversteegen · October 15, 2023, 5:15pm

Looks great! Is this something i can execute in phpmyadmin?

ndrwha331 · October 16, 2023, 2:50pm

No, it is a python script that needs to be executed from HA shell (python merge.py when in config directory). The complete script is as below. Like other said, this should be a last resort solution and preceded with a backup of your data. I needed it once to merge entities, each with a different part of history in it.

mapping = {
    'sensor.source_entity1': 'sensor.dest_entity1',
    'sensor.source_entity2: 'sensor.dest_entity2',
}

import sqlite3
import pandas
#pandas.options.plotting.backend = "plotly"

# Read sqlite query results into a pandas DataFrame
con = sqlite3.connect("home-assistant_v2.db")

def merge(source_sensor, target_sensor, table='statistics'):
    print(f"source: {source_sensor}, target: {target_sensor}")
    # read data from target sensor
    target_sensor_id=pandas.read_sql_query(f"""select id FROM statistics_meta 
                                               where statistic_id like '{target_sensor}';""", con).loc[0,'id']
    target_df=pandas.read_sql_query(
            f"select * FROM {table} where metadata_id = '{target_sensor_id}';", con
        )
    print(f"length of existing statistics for target sensor: {len(target_df)}")

    min_dt = target_df["start"].min()

    # read data from source sensor
    source_sensor_id=pandas.read_sql_query(f"""select id FROM statistics_meta 
                                                where statistic_id like '{source_sensor}';""", con).loc[0,'id']
    source_df=pandas.read_sql_query(
            f"select * FROM {table} where metadata_id = '{source_sensor_id}';", con
        )
    print(f"length of existing statistics for source sensor: {len(source_df)}")


    # concat new historical statistics with updated old statistics
    #df = pandas.concat([source_df, target_df], ignore_index=True).sort_values("start")

    # set metadata_id to the one of the target sensor
    #df["metadata_id"]=target_sensor_id

    stmnt2 = f"""UPDATE {table}
                SET metadata_id = {target_sensor_id}
                WHERE metadata_id = {source_sensor_id};"""

    stmnt = f"""UPDATE {table}
                 SET metadata_id = {target_sensor_id}
                 WHERE metadata_id = {source_sensor_id} AND start_ts NOT IN
                     (SELECT s.start_ts FROM
                         (SELECT start_ts FROM {table}
                             WHERE metadata_id = {source_sensor_id}) s
                         INNER JOIN
                         (SELECT start_ts FROM {table}
                             WHERE metadata_id = {target_sensor_id}) t ON s.start_ts = t.start_ts);
             """

    stmnt3 = f"""SELECT * FROM {table}
                 WHERE metadata_id = {source_sensor_id} AND start_ts NOT IN
                     (SELECT s.start_ts FROM
                         (SELECT start_ts FROM {table}
                             WHERE metadata_id = {source_sensor_id}) s
                         INNER JOIN
                         (SELECT start_ts FROM {table}
                             WHERE metadata_id = {target_sensor_id}) t ON s.start_ts = t.start_ts);
             """

    cur = con.cursor()    
    cur.execute(stmnt)
    con.commit()

    #res=pandas.read_sql_query(stmnt2, con)
    #print(res)


def merge2(source_sensor, target_sensor):
    for table in ("statistics", "statistics_short_term"):
        print(table)                              
        merge(source_sensor, target_sensor, table)

for source_sensor, target_sensor in mapping.items():
    merge2(source_sensor, target_sensor)

smaknomercy · October 28, 2023, 10:22am

Hi! Any updates on this? I tried this script, it successfully read the data from the old sensor, but didn’t write it to the new sensor. Any ideas?

upd: it didn’t read old sensor statistics, but read statistics_short_term

flash3d · November 9, 2023, 4:06pm

Is it possible to give me a step by step instruction if i just want to replace one defect power switch for a new one and keep the history? (so basically move the energy history to the new switch).

I still read conflicting posts:

Before 2023.4:

Delete old device and entities.
Add new device and give the new entities the same name as the old entities.

After 2023.4 (history is retained with renaming entities)

Rename old entities (and device).
Add new device and the rename entities? But to what name? Here I don’t get it anymore.

Found this: 2023.4: Custom template macros, and many more new entity dialogs! - #621 by e-raser?

molelightn · January 3, 2024, 7:39am

same question for me

flash3d · January 3, 2024, 7:54am

Look here: https://community.home-assistant.io/t/import-old-energy-readings-for-use-in-energy-dashboard/341406/75?u=flash3d

molelightn · January 3, 2024, 8:08am

oh god this does the trick. didn’t expect to delete the old entitiy first and get it working by simply renaming the new to the old name. Thank you very much for your hint.

BoGnY · February 5, 2024, 8:48am

This is the python script adapted from @ndrwha331 to update states instead statistics:

mapping = {
    'sensor.source_entity1': 'sensor.dest_entity1',
    'sensor.source_entity2': 'sensor.dest_entity2',
}

import sqlite3
import pandas
#pandas.options.plotting.backend = "plotly"

# Read sqlite query results into a pandas DataFrame
con = sqlite3.connect("home-assistant_v2.db")

def merge(source_sensor, target_sensor):
    print(f"source: {source_sensor}, target: {target_sensor}")
    # read data from target sensor
    target_sensor_id=pandas.read_sql_query(f"""select metadata_id FROM states_meta where entity_id like '{target_sensor}';""", con).loc[0, 'metadata_id']
    target_df=pandas.read_sql_query(f"select * FROM states where metadata_id = '{target_sensor_id}';", con)
    print(f"length of existing states for target sensor: {len(target_df)}")

    # read data from source sensor
    source_sensor_id=pandas.read_sql_query(f"""select metadata_id FROM states_meta where entity_id like '{source_sensor}';""", con).loc[0, 'metadata_id']
    source_df=pandas.read_sql_query(f"select * FROM states where metadata_id = '{source_sensor_id}';", con)
    print(f"length of existing states for source sensor: {len(source_df)}")

    stmnt2 = f"""UPDATE states
                SET metadata_id = {target_sensor_id}
                WHERE metadata_id = {source_sensor_id};"""

    cur = con.cursor()
    cur.execute(stmnt2)
    con.commit()

for source_sensor, target_sensor in mapping.items():
    merge(source_sensor, target_sensor)

I’ve tested it, but make a BACKUP for security!!!

Dutchess_Nicole · March 16, 2024, 9:07am

Does this script take into account possible differences in unit_of_measurement? For instance I have two sensors I want to merge, but one of them registered energy in kWh and the other in Wh, so I’d need to account for that by multiplying the first series by 1000 so that the unit sizes match up.

DG-Lt · April 3, 2024, 6:29pm

not sure if this integration is already mentioned her. might be usefull to someone on this thread

I used it myself succesfully

Focus · May 3, 2024, 7:52am

I can confirm that in 2024.4.4 it certainly doesn’t retain history. I’ve renamed a few devices (after which it prompts to rename the identies, which I did). History is now gone.

petro · May 3, 2024, 1:17pm

Refresh the page.

Focus · May 4, 2024, 7:19am

No, it’s gone.

Just tested with another entity. If I change the name, old history disappears.
But when I change the name back to the old name, the history reappears.

So renaming an entity effectively causes the history to be ‘lost from view’.

bobbyverlaan · August 30, 2024, 2:13pm

I’m up to date with v2024.8.3 and followed the following steps:

delete old entity: “entity_name”
rename new entity “entity_name_2” to old entity name “entity_name”

What I see happening:

historical data of the old entity “entity_name” becomes available again
historical data of the new device “entity_name_2” becomes unavailable
When I switched back to the old entity name “entity_name_2”, the historical data that was captured under that entity name pops up again.

I think what effectively happens: device starts using the old entity name and records sensor data on that entity name from here on. The data of the old and new entity don’t get merged. This still leaves a gap of the historical data of the new device before it gets renamed.

Would love to get 1 historical perspective on recorded sensor data. To get to that merging the historical data of the two entities into one entity makes sense. If I understand correctly, the Python script provided in this thread will do just that. However, I fear I might corrupt the data model Home Assistant expects now or in the future.

Does anyone know if there any plans to create official support for this and make merging a replacement device into a replaced device a breeze? Maybe even through the UI?

petro · August 30, 2024, 3:01pm

Do not rename the new entity. History follows the entity when you rename it. You delete old entity, integrate new entity with old entities name. That is the only way. You cannot have both on the system at the same time, if you do, the migration will not occur.

There are no plans as far as I can tell. For now, you’ll have to wait for a volunteer to pick it up.

bobbyverlaan · September 19, 2024, 9:57pm

If I understand you correctly. If I’ve have been running “entity_name” for 3 months, device broke down. Then added the replacement which accidentally was named “entity_name_2” because “entity_name” was already taken. Ran under the new “entity_name_2” for another 3 months. Only then to realize the history data is now on 2 entities.

In the scenario above there is no way to merge the data. Correct?

petro · September 20, 2024, 1:59am

You can attempt to modify the tables in the database. That’s the only option that exists currently.

DevMartinator · September 26, 2024, 11:02am

I had the same issue and solved it.
You can try to install SQLlite Web addon.
Then this SQL command, generated by ChatGPT did the job for me:

INSERT INTO statistics (id, created_ts, metadata_id, start, start_ts, mean, min, max, last_reset, last_reset_ts, state, sum)
SELECT NULL, 
       created_ts, 
       (SELECT id FROM statistics_meta WHERE statistic_id = 'sensor.new'),
       start, 
       start_ts, 
       mean, 
       min, 
       max, 
       last_reset, 
       last_reset_ts, 
       state, 
       sum
FROM statistics
WHERE metadata_id = (SELECT id FROM statistics_meta WHERE statistic_id = 'sensor.old')
AND start_ts NOT IN 
(SELECT start_ts FROM statistics WHERE metadata_id = (SELECT id FROM statistics_meta WHERE statistic_id = 'sensor.new'));

Maybe there is a better way…

splunker · September 28, 2024, 1:09pm

Thanks very much for sharing this, it was very helpful!
FWIW, this is my version, updating old entity ids instead of delete/appending.

Use at your own risk of course, Backup your DB first.
Error handling is kind of cruddy.
I run this interactively via the pyscript kernel.

import pandas as pd
import sqlite3
import pandas as pd

con = sqlite3.connect("home-assistant_v2.db")

def update_id(old_sensor_id, new_sensor_id, table):
    assert table in ["statistics", "statistics_short_term", "states"]

    log.warning(f"Updating {old_sensor_id} to {new_sensor_id} in {table}")

    try:

        if table == "states":
            # States
            ts_field = "last_updated_ts"

            old_meta_id = pd.read_sql_query(f"""SELECT metadata_id 
                                                FROM states_meta
                                                WHERE entity_id = '{old_sensor_id}';""", con).loc[0,"metadata_id"]
            
            new_meta_id = pd.read_sql_query(f"""SELECT metadata_id 
                                                FROM states_meta
                                                WHERE entity_id = '{new_sensor_id}';""", con).loc[0,"metadata_id"]
        else:
            # Statistics
            ts_field = "start_ts"

            old_meta_id = pd.read_sql_query(f"""SELECT id 
                                                FROM statistics_meta
                                                WHERE statistic_id = '{old_sensor_id}';""", con).loc[0,"id"]   
            
            new_meta_id = pd.read_sql_query(f"""SELECT id 
                                                FROM statistics_meta
                                                WHERE statistic_id = '{new_sensor_id}';""", con).loc[0,"id"]

        new_ts_min = pd.read_sql_query(f"""SELECT MIN({ts_field}) as ts_min 
                                        FROM {table} 
                                        WHERE metadata_id = '{new_meta_id}';""", con).loc[0,"ts_min"]
            
        stmnt = f"""UPDATE {table} 
            SET metadata_id = {new_meta_id}
            WHERE metadata_id = {old_meta_id}
            AND {ts_field} < {new_ts_min}
            """
        
        cur = con.cursor()
        cur.execute(stmnt)
        con.commit()    

    except Exception as e:
        log.warning(e)
        pass

# mapping of the sensors, statistics of the left ones will be merged into the right ones
mapping={
    "sensor.solarman_ac_temperature": "sensor.solarman_temperature",
    "sensor.solarman_current_l1": "sensor.solarman_internal_l1_current",
    "sensor.solarman_current_l2": "sensor.solarman_internal_l2_current",
    "sensor.solarman_inverter_l1_power": "sensor.solarman_internal_l1_power",
    "sensor.solarman_inverter_l2_power": "sensor.solarman_internal_l2_power",
    "sensor.solarman_grid_current_l1": "sensor.solarman_grid_l1_current",
    "sensor.solarman_grid_current_l2": "sensor.solarman_grid_l2_current",
    "sensor.solarman_grid_voltage_l1": "sensor.solarman_grid_l1_voltage",
    "sensor.solarman_grid_voltage_l2": "sensor.solarman_grid_l2_voltage",
    "sensor.solarman_total_grid_production": "sensor.solarman_total_energy",
    }

for old_sensor_id, new_sensor_id in mapping.items():
    for table in ["statistics", "statistics_short_term", "states"]:  
        update_id(old_sensor_id, new_sensor_id, table)

Thanks again