Merge statistics of old entity to new entity

Hi,

It would be nice if we could merge entities in the statistics database. When you rename an entity or replace a sensor, for instance, it would be nice if the data from the old sensor could be merged into the new sensor. In my energy dashboard I now have two sensors for my solar panel production since I started using a different integration to monitor my panels; it would be great if I could merge them into one.

Looks like you get your wish next month:

1 Like

I have created a script that merges 2 existing entities: hass_utils/merge_sensor_statistics.ipynb at 51403be6a1f180c02c6e66e1bdc75e493192006e · janmolemans/hass_utils · GitHub

2 Likes

Thanks! I needed to make some adjustments for my use case, but now I finally have all my old entities merged into the new ones.

Hello all,

I have the same problem. I have measured the energy with different meters (Shelly Plug S, Shelly 1PM, Shelly Plus 1PM, Solarman Integration).

Now I only use the Shelly Plus 1PM. For the energy history, the statistics of the old sensors are still there, but they are otherwise dead. In the Energy Dashboard settings they show up as "Entity not defined".

Is there an option to merge the old values? The script didn't work for me and broke the DB (a backup helped :wink: ).

1 Like

I had the same issue that the script broke my database. In the end I rewrote part of the script so that it updates the database instead of completely overriding it (merge_statistics · GitHub).
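In essence, the rewritten part boils down to a single UPDATE per statistics table that reassigns the old sensor's rows to the new sensor's metadata ID. A minimal sketch of that idea, not the exact code from the gist (the database URL and the two IDs are placeholders; look the IDs up in the statistics_meta table first):

    import sqlalchemy

    # Placeholders: adjust the URL to your recorder database and look up
    # the metadata_id of both sensors in the statistics_meta table.
    engine = sqlalchemy.create_engine("sqlite:////config/home-assistant_v2.db")
    OLD_ID, NEW_ID = 17, 42

    with engine.begin() as con:
        for table in ("statistics", "statistics_short_term"):
            # Reassign the old sensor's rows to the new sensor's metadata_id.
            con.execute(
                sqlalchemy.text(
                    f"UPDATE {table} SET metadata_id = :new_id "
                    "WHERE metadata_id = :old_id"
                ),
                {"new_id": NEW_ID, "old_id": OLD_ID},
            )

This is exactly the step that can hit the UNIQUE constraint described below when both sensors have a row for the same start_ts.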

It typically still gives an error, since the timestamp of the last entry of the old sensor is the same as the timestamp of the first entry of the new sensor, but this doesn't break the database. The error message tells you which timestamp is duplicated. In the end I had to manually remove one of these duplicates before I could merge (I used the phpMyAdmin addon for this).
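If you would rather clean up those duplicates with a query instead of phpMyAdmin, a sketch like this (same placeholder IDs as above; it deletes the old sensor's copy of every overlapping row, so take a backup first):

    import sqlalchemy

    engine = sqlalchemy.create_engine("sqlite:////config/home-assistant_v2.db")
    OLD_ID, NEW_ID = 17, 42  # placeholder metadata_ids from statistics_meta

    with engine.begin() as con:
        # Drop the old sensor's copy of every start_ts the new sensor
        # already has, so the UPDATE no longer collides. Repeat for
        # statistics_short_term if needed.
        con.execute(
            sqlalchemy.text(
                "DELETE FROM statistics WHERE metadata_id = :old "
                "AND start_ts IN (SELECT start_ts FROM statistics "
                "WHERE metadata_id = :new)"
            ),
            {"old": OLD_ID, "new": NEW_ID},
        )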

You will also have to correct the statistics after merging (Developer Tools → Statistics → adjust sum); otherwise you get a large negative value at the crossing between the old and the new sensor.

Hi @theneweinstein

Thanks for your script.

However, the script crashes in the merge function, and I'm getting the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File /usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py:1410, in Connection.execute(self, statement, parameters, execution_options)
   1409 try:
-> 1410     meth = statement._execute_on_connection
   1411 except AttributeError as err:

AttributeError: 'str' object has no attribute '_execute_on_connection'

The above exception was the direct cause of the following exception:

ObjectNotExecutableError                  Traceback (most recent call last)
Cell In[5], line 5
      3 for table in ("statistics", "statistics_short_term"):    
      4     print(table)
----> 5     merge(source_sensor, target_sensor, table)

Cell In[4], line 5, in merge(source_sensor, target_sensor, table)
      2 print(f"source: {source_sensor}, target: {target_sensor}")
      4 # read data from target sensor
----> 5 target_sensor_id=pandas.read_sql_query(f"""select id FROM statistics_meta 
      6                                             where statistic_id like '{target_sensor}';""", con).loc[0,'id']
      7 target_df=pandas.read_sql_query(
      8         f"select * FROM {table} where metadata_id = '{target_sensor_id}';", con
      9     )
     10 print(f"length of existing statistics for target sensor: {len(target_df)}")

File /usr/local/lib/python3.9/dist-packages/pandas/io/sql.py:397, in read_sql_query(sql, con, index_col, coerce_float, params, parse_dates, chunksize, dtype)
    339 """
    340 Read SQL query into a DataFrame.
    341 
   (...)
    394 parameter will be converted to UTC.
    395 """
    396 pandas_sql = pandasSQL_builder(con)
--> 397 return pandas_sql.read_query(
    398     sql,
    399     index_col=index_col,
    400     params=params,
    401     coerce_float=coerce_float,
    402     parse_dates=parse_dates,
    403     chunksize=chunksize,
    404     dtype=dtype,
    405 )

File /usr/local/lib/python3.9/dist-packages/pandas/io/sql.py:1560, in SQLDatabase.read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize, dtype)
   1512 """
   1513 Read SQL query into a DataFrame.
   1514 
   (...)
   1556 
   1557 """
   1558 args = _convert_params(sql, params)
-> 1560 result = self.execute(*args)
   1561 columns = result.keys()
   1563 if chunksize is not None:

File /usr/local/lib/python3.9/dist-packages/pandas/io/sql.py:1405, in SQLDatabase.execute(self, *args, **kwargs)
   1403 def execute(self, *args, **kwargs):
   1404     """Simple passthrough to SQLAlchemy connectable"""
-> 1405     return self.connectable.execution_options().execute(*args, **kwargs)

File /usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py:1412, in Connection.execute(self, statement, parameters, execution_options)
   1410     meth = statement._execute_on_connection
   1411 except AttributeError as err:
-> 1412     raise exc.ObjectNotExecutableError(statement) from err
   1413 else:
   1414     return meth(
   1415         self,
   1416         distilled_parameters,
   1417         execution_options or NO_OPTIONS,
   1418     )

ObjectNotExecutableError: Not an executable object: "select id FROM statistics_meta \n                                                where statistic_id like 'sensor.xx_52_total_lifetime_energy_output';"

I tried to debug the code but could not find anything strange… Can you maybe point me in the right direction? (print(con) gives a positive result → it seems connected to the database.)

Thanks in advance,

1 Like

It looks like the same problem I recently had. Apparently the pandas package does not work nicely with SQLAlchemy 2. I solved it by forcing an older version of SQLAlchemy to be installed in the init_commands of the jupyterlab addon (pip install "SQLAlchemy<2").
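To catch this before the merge cells run, a small guard at the top of the notebook helps (just a sketch that checks the installed version):

    import sqlalchemy

    # pandas.read_sql_query with a plain SQL string only works with the
    # SQLAlchemy 1.x API; under 2.x it raises the ObjectNotExecutableError
    # shown above.
    if int(sqlalchemy.__version__.split(".")[0]) >= 2:
        raise RuntimeError(
            f"SQLAlchemy {sqlalchemy.__version__} found; "
            'run pip install "SQLAlchemy<2" and restart the kernel.'
        )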

Since the latest HA release, 2023.4, changing an entity ID keeps the history data. This should help when moving to a new entity: rename your old entity ID to the new name, remove the old integration, and install the new integration.

Yes, I was very happy to read that. Next time it will be a lot easier to keep my history.

Can you explain a little more, please?
I have this sensor (sensor.powcasath_energy_total) that isn't updated anymore, but I would like to keep its stats in the energy dashboard, and now I use a Shelly (sensor.shellyem_channel_1_energy).

How can I merge them?
If I rename "sensor.powcasath_energy_total" to "sensor.shellyem_channel_1_energy", will it merge them?

Interested in this too

I don't think that merging entities is possible with the approach described above. That approach can be used to continue an old statistic with a new device / name.

Hmm, so now what? How do you merge entities? Is there still no official way?

1 Like

Merge is indeed maybe the wrong word; the script was meant to append the entries of the old entity_id to the new entity_id. However, with newer versions of Home Assistant this is not needed anymore, since Home Assistant now handles renaming entities in the statistics database. I also don't know whether the script still works, since the database format was updated recently.

Renaming is indeed an option, but how do you rename an old entity to an already existing entity?

1 Like

As mentioned, I don't think it is possible to merge two statistics; with the standard Home Assistant tools you can only continue one. Merging is only possible via the database itself, as mentioned above, but if you don't know what you are doing, I would not recommend it.

2 Likes

Well yes, not recommended for the faint of heart lol.
It just sucks having several unused sensors in the energy dashboard (because we upgrade/change devices), and I don't want to lose that information, which is one of the reasons we use Home Assistant: keeping track of data.

1 Like

@theneweinstein, I'm getting this error:

sqlite3.IntegrityError: UNIQUE constraint failed: statistics.metadata_id, statistics.start_ts

It makes sense not to duplicate start_ts, but how do you implement that?

It's a bit hairy (my first nested SQL statement), but it works for me.

    stmnt = f"""UPDATE {table}                                                                                 
                 SET metadata_id = {target_sensor_id}                                                          
                 WHERE metadata_id = {source_sensor_id} AND start_ts NOT IN                                    
                     (SELECT s.start_ts FROM                                                                   
                         (SELECT start_ts FROM {table}                                                         
                             WHERE metadata_id = {source_sensor_id}) s                                         
                         INNER JOIN                                                                            
                         (SELECT start_ts FROM {table}                                                         
                             WHERE metadata_id = {target_sensor_id}) t ON s.start_ts = t.start_ts);            
             """                                                                                   

The idea is to check for the duplicated start_ts rows and skip them in the UPDATE.
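For what it's worth, the same guard can also be written with a correlated NOT EXISTS (just a sketch using the same placeholders as above; it should behave identically):

    stmnt = f"""UPDATE {table}
                SET metadata_id = {target_sensor_id}
                WHERE metadata_id = {source_sensor_id}
                  AND NOT EXISTS
                      (SELECT 1 FROM {table} t
                       WHERE t.metadata_id = {target_sensor_id}
                         AND t.start_ts = {table}.start_ts);
            """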