OK, I finally had some success, though a few residual problems remain.
If anyone is interested, here is my solution:
I wrote a short Python script that I ran inside a Jupyter Notebook; it uses the MariaDB connector for Python (pip install mariadb).
The script reads the sensor IDs from the source table and the IDs from the target table.
It then selects all rows from the source statistics table whose sensors are present in both tables and copies them into the target table, translating the metadata IDs along the way.
```python
# Module imports
import mariadb
import sys

# Connect to MariaDB platform 1 - source database.
# Verify which port your server uses (3307 in my setup; change the
# port number in the script if yours runs on the default 3306).
try:
    conn = mariadb.connect(
        user="user",
        password="password",
        host="<sql server ip-address>",
        port=3306,
        database="homeassistant_db"
    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    sys.exit(1)

# Two cursors on the source connection: one for the meta-data,
# one for the statistics rows
cur_s = conn.cursor()
cur_s2 = conn.cursor()

# Connect to MariaDB platform 2 - target database
# (verify that add-on port 3306 is enabled in the config on Home Assistant)
try:
    conn2 = mariadb.connect(
        user="username",
        password="password",
        host="<server-ip>",
        port=3306,
        database="homeassistant_db"
    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    sys.exit(1)

# Cursor on the target connection
cur_t = conn2.cursor()

# Read the meta-data IDs from the target table, keyed by statistic_id
t_id_dict = {}
cur_t.execute("SELECT * FROM statistics_meta")
for (t_id, t_statistic_id, t_source, t_unit_of_measurement,
     t_has_mean, t_has_sum, t_name) in cur_t:
    t_id_dict[t_statistic_id] = [t_id, t_statistic_id, t_source,
                                 t_unit_of_measurement, t_has_mean,
                                 t_has_sum, t_name]

# Migrate: copy the statistics rows of every sensor that exists in both
# tables, translating the source metadata_id to the target metadata_id
count = 0
cur_s.execute("SELECT * FROM statistics_meta")
for (s_id, s_statistic_id, s_source, s_unit_of_measurement,
     s_has_mean, s_has_sum, s_name) in cur_s:
    if s_statistic_id not in t_id_dict:
        continue
    # Parameterized query instead of string concatenation
    cur_s2.execute("SELECT * FROM statistics WHERE metadata_id = %s",
                   (s_id,))
    for (q_id, q_created, q_start, q_mean, q_min, q_max,
         q_last_reset, q_state, q_sum, q_metadata_id) in cur_s2:
        query = """INSERT IGNORE INTO statistics
                   (created, start, mean, min, max, last_reset,
                    state, sum, metadata_id)
                   VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"""
        record = (q_created, q_start, q_mean, q_min, q_max,
                  q_last_reset, q_state, q_sum,
                  t_id_dict[s_statistic_id][0])
        cur_t.execute(query, record)
    # Commit once per migrated sensor
    conn2.commit()
    count = count + 1

print(count)
```
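The heart of the script is the dictionary lookup that translates a source `metadata_id` to the matching target `metadata_id` via the shared `statistic_id`. A minimal, self-contained sketch of that matching logic (with made-up sensor names and IDs, no database needed):

```python
# Hypothetical (id, statistic_id) pairs, as they would come from
# "SELECT id, statistic_id FROM statistics_meta" on each database
source_meta = [(1, "sensor.energy_total"), (2, "sensor.water_total"),
               (3, "sensor.gas_total")]
target_meta = [(7, "sensor.energy_total"), (9, "sensor.gas_total")]

# Key the target rows by statistic_id, as t_id_dict does in the script
t_id_dict = {statistic_id: t_id for (t_id, statistic_id) in target_meta}

# Translate each source metadata_id to the matching target metadata_id;
# sensors missing from the target are simply skipped
id_map = {s_id: t_id_dict[statistic_id]
          for (s_id, statistic_id) in source_meta
          if statistic_id in t_id_dict}

print(id_map)  # {1: 7, 3: 9} - sensor.water_total has no target entry
```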
It looks as if all the data is available now.
However, one problem remains:
when I look at the energy card, I see one entry from September and one from May that show huge energy outliers, but I don’t know how to delete those entries from the database.
If I look at the values of the sensors themselves, there doesn’t seem to be any outlier in them.
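What I have considered so far (untested, so please correct me): after looking up the sensor’s meta-data ID, the rows for the affected time window could be removed with a parameterized DELETE against the statistics table. The `metadata_id` and timestamps below are placeholders; I would run it on a backup first:

```python
# Sketch: remove the statistics rows of one sensor in a given time window.
# Look up the id first, e.g. with
# "SELECT id FROM statistics_meta WHERE statistic_id = 'sensor.energy_total'".

def build_delete(metadata_id, start_from, start_to):
    """Return a parameterized DELETE statement and its parameter tuple."""
    query = ("DELETE FROM statistics "
             "WHERE metadata_id = %s AND start BETWEEN %s AND %s")
    return query, (metadata_id, start_from, start_to)

# Placeholder values - metadata_id 7 and the window are assumptions
query, params = build_delete(7, "2021-09-01 00:00:00", "2021-09-02 00:00:00")
print(query)

# To actually run it against the target database:
# cur_t.execute(query, params)
# conn2.commit()
```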