Import old energy readings for use in energy dashboard

I made a WTH entry; I suggest everyone votes to get this on the radar.

With 86 votes already, I have good hopes this will be picked up for the Month of "What the heck?!" :smiley:

The only issue is that this one is in the wrong forum. Can one of the admins move this to WTH? @frenck @petro?


This is a feature request, not a WTH. It will not be moved.


Hey there,

do you know what the problem is in my case?
I imported SQL data into MariaDB. The data looks similar to the original from Home Assistant, but I can't figure out why the external data is shown as a cumulative chart, while data measured and calculated by Home Assistant shows the actual energy for that time of day.
I just imported the external data with the corresponding dates for start and created, and then updated the metadata_id of those entries…
Home Assistant changes the cumulative sum of new entries to a lower value as soon as it gets additional data.

I finally got it working once I realised that the sum column is not the total cumulative sum; it only accumulates the delta values from the states column.

So now I've managed to import all Fronius energy data from the last 3 years into the statistics table via SQL inserts.
I just couldn't find the relevant information in the documentation.
What a challenge…
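In code terms, the relationship works roughly like this (a toy sketch of the same delta logic my import loop below uses; the numbers are made up):

# Sketch: how the statistics "sum" column relates to the "state" column.
# "sum" accumulates only the deltas between consecutive states; on a meter
# reset (negative delta) the delta is skipped and counting continues.
def build_sums(states, start_sum=0.0):
    sums, prev_state, total = [], states[0], start_sum
    for state in states:
        delta = state - prev_state
        if delta >= 0:
            total += delta
        prev_state = state
        sums.append(total)
    return sums

# Meter readings with a reset: states 100, 105, 2, 8 -> sums 0, 5, 5, 11
print(build_sums([100, 105, 2, 8]))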

Hi, hello!

I accidentally lost all my data after updating my MariaDB Docker image.
I'm currently looking into options for adding historical data for my solar panels and power consumption.

Reading through this thread, I'm not sure how you did this, Rober Reiter.
I have a DBeaver connection to the database, but I'm not sure yet how to insert, for example, the historical data per day/month for the sensors.

Could you give an example?

Here is the essence of the Python code I used to import the values. I'm storing my data in MariaDB, but a similar procedure may be possible with other database types.
Note that I haven't yet managed to also calculate the cost and compensation values; I'm not sure yet what the simplest way to get those would be.
My data from Fronius is organized as shown below: I see the amount of energy created or used within 5-minute intervals.

#Advice: stop the Home Assistant core before you actually change the SQL data.
#Otherwise you may get some strange entries from current values.
#I also set my purge time to 1 day, so that the short_term table holds a minimal number of entries.

# Connect to MariaDB - target database (verify that the add-on port 3306 is enabled in the Home Assistant config)
# Module Imports
import mariadb
import sys
from datetime import datetime, timedelta
import pandas as pd


try:
    conn2 = mariadb.connect(
        user="<DBuser>",
        password="<DBpassword>",
        host="<serverip-address>",
        port=3306,
        database="homeassistant_db"

    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    sys.exit(1)

# Get a cursor for SQL queries
cur_t = conn2.cursor()



#Move the current entries in the Home Assistant database to another metadata_id, so no entries are lost if they're needed later.
#Don't forget to do this for all relevant sensors/metadata_ids that you want to manipulate.
query="""Update statistics SET metadata_id = %s WHERE metadata_id = %s"""
metadata_id_curr=107
metadata_id_new=88107
record=(metadata_id_new, metadata_id_curr)
cur_t.execute(query,record)
conn2.commit()

#same for short_term table
query="""Update statistics_short_term SET metadata_id = %s WHERE metadata_id = %s"""
metadata_id_curr=107
metadata_id_new=88107
record=(metadata_id_new, metadata_id_curr)
cur_t.execute(query,record)
conn2.commit()




df_set=pd.read_excel(r"<pathToData>\2020_2022_PV_Daten.xlsx", parse_dates=[0],  index_col=0,skiprows=[1] )

#Resample the 5-minute measurement values to hourly samples by summing them up.
#The sensors for consumed and produced energy are transformed into a cumulative sum, to simulate the values of a meter.
#The inverter data is transformed into values with a daily reset.

energy_real_consumed_fronius_meter=pd.DataFrame()
df=df_set.resample('1H')['Energie vom Netz bezogen'].sum()
energy_real_consumed_fronius_meter['states']=df.cumsum()

energy_real_produced_fronius_meter=pd.DataFrame()
df=df_set.resample('1H')['Energie ins Netz eingespeist'].sum()
energy_real_produced_fronius_meter['states']=df.cumsum()

energy_day_fronius_inverter=pd.DataFrame()
df=df_set.resample('1H')['Energie | Symo 6.0-3-M (1)'].sum()
energy_day_fronius_inverter['states']=df.groupby(df.index.date).cumsum()



#The offset simulates a meter that started counting at a certain value - e.g. mine was at 1940.26141 a few years ago.
#The current state is then just the offset plus the cumulative sum of produced or consumed energy.
offset=1940.26141

prevState=offset
prevSum=0

#counter for the number of values that have been imported
count=0
#metadata_id of the corresponding sensor in the statistics_meta table
q_metadata_id=105
query="""INSERT INTO statistics (created,start,state,sum,metadata_id) VALUES (%s,%s, %s, %s,%s)"""

for i in energy_real_produced_fronius_meter.index:
    #The created datetime is set to the start datetime + 1 h (plus one second)
    #The start datetime is shifted by -2 h relative to the source timestamps, due to the timezone difference. I think Home Assistant internally always uses GMT+0 in the database
    #q_state is the total_increasing value of the meter
    q_created=(pd.to_datetime(i, unit='us')+timedelta(hours=-1,seconds=+1 )).strftime('%Y-%m-%d %H:%M:%S')
    q_start=(pd.to_datetime(i, unit='us')+timedelta(hours=-2)).strftime('%Y-%m-%d %H:%M:%S')
    q_state=offset+energy_real_produced_fronius_meter['states'][i]/1000
    
    #calculate delta of state to previous state for total increasing sum
    if q_state-prevState>=0:
        q_sum=prevSum+q_state-prevState
        prevState=q_state
        prevSum=q_sum
    else:
        q_sum=prevSum
        prevState=q_state
    
    
    record=(q_created,q_start,q_state,q_sum,q_metadata_id)
    #fire query
    cur_t.execute(query,record)
    conn2.commit()
    count=count+1
print(str(count)+ " datasets inserted" )


#Offset for the second meter
offset=3818.66521
prevState=offset
prevSum=0
#...similar code for the second total increasing meter...



#for the daily reset values
deltaState=0
prevState=0
prevSum=0
count=0
q_metadata_id=96
query="""INSERT INTO statistics (created,start,state,sum,metadata_id) VALUES (%s,%s, %s, %s,%s)"""
for i in energy_day_fronius_inverter.index:
    q_created=(pd.to_datetime(i, unit='us')+timedelta(hours=-1,seconds=+1 )).strftime('%Y-%m-%d %H:%M:%S')
    q_start=(pd.to_datetime(i, unit='us')+timedelta(hours=-2)).strftime('%Y-%m-%d %H:%M:%S')
    #note that the state has no offset here; the first entry starts at zero
    q_state=round(energy_day_fronius_inverter['states'][i]/1000,3)
    
    
    if q_state-prevState>=0:
        q_sum=prevSum+q_state-prevState
        prevState=q_state
        prevSum=q_sum
    else:
        q_sum=prevSum
        prevState=q_state
        
    
    record=(q_created,q_start,q_state,q_sum,q_metadata_id)
    cur_t.execute(query,record)
    conn2.commit()
    count=count+1
print(str(count)+ " datasets inserted" )


#Finally, do some stuff with the short-term statistics that may be needed.
#You can manually change the sum values using phpMyAdmin before you change the needed values back (no more than 12 entries per sensor, if you set the purge time to 1).
#Or you may decide to forget them forever - then you can safely ignore this and lose 1 day of data.
query="""Update statistics_short_term SET metadata_id = %s WHERE metadata_id = %s"""
metadata_id_curr=88107
metadata_id_new=107
record=(metadata_id_new, metadata_id_curr)
cur_t.execute(query,record)
conn2.commit()
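
To sanity-check the import afterwards, something like this can help (a sketch reusing the conn2 connection and cur_t cursor from above; metadata_id 105 is from my setup):

# Quick check: the last imported rows should match the expected meter states.
cur_t.execute(
    "SELECT start, state, `sum` FROM statistics "
    "WHERE metadata_id = %s ORDER BY start DESC LIMIT 3",
    (105,),
)
for q_start, q_state, q_sum in cur_t:
    print(q_start, q_state, q_sum)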

This is awesome!

I'll try it first with my solar panel's historical data. I only have daily statistics, not 5-minute values.

I'll see what I can do to adjust the provided code.
Thank you!

I only have the data because I subscribed to the free 14-day trial of the professional account on Fronius Solarweb - they have all the data from my inverter. I hope I'll never run into a similar situation again; otherwise I would also only get the daily reports.

Hi, my MariaDB database got stuck a couple of months ago when upgrading to version 2022.5. The schema update crashed the DB and I couldn't get it to work, so I started over with a new DB installation.

I was later able to restore my old DB, tried the update to the new schema again, and this time it succeeded. Now I have two databases: one with energy data from 04-2021 to 05-2022, and a new one with data from 05-2022 until now. I would like to merge this data; the energy data alone is enough.

Do you have an idea to make this merge happen?

Thanks a lot for the help.

Greetings from NL!

PS. +1 for importing historic data!!

This FR should be higher up.

I had several issues with the Energy records. On two different occasions during HA updates I lost consumption data: first 3 months, and later almost another 6 months.

My electricity distributor in Spain lets you download a yearly CSV file with an hourly consumption breakdown, and it would be great if I could import it into HA.

On the other hand, there's the need to ensure that no data is lost during updates (I've now changed to MariaDB). This is something that needs to be addressed properly, since we need a reliable system to compare data and analyze trends.

I'm finally at the point of importing the data. However, I'm getting a constraint error when updating the existing values.

Update statistics SET metadata_id = '8876' WHERE metadata_id = '76'

SQL Error [1452] [23000]: (conn=96) Cannot add or update a child row: a foreign key constraint fails (home_assistant.statistics, CONSTRAINT statistics_ibfk_1 FOREIGN KEY (metadata_id) REFERENCES statistics_meta (id) ON DELETE CASCADE)

Yes, I could remove this constraint, apply the updates and inserts, and then add it back. What are other people's experiences with this?

If you are updating existing entries, you have to set the "IGNORE" modifier to keep the import process from stopping.
Just type "IGNORE" between "UPDATE" and "statistics".

https://dev.mysql.com/doc/refman/8.0/en/update.html

  • With the IGNORE modifier, the update statement does not abort even if errors occur during the update. Rows for which duplicate-key conflicts occur on a unique key value are not updated. Rows updated to values that would cause data conversion errors are updated to the closest valid values instead. For more information, see The Effect of IGNORE on Statement Execution.
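
Applied to the statement from the post above, the suggestion would look like this (a sketch reusing the cur_t cursor and conn2 connection from the import script):

# Same metadata_id swap, but with the IGNORE modifier, so rows that fail
# are skipped with a warning instead of aborting the whole statement.
query = "UPDATE IGNORE statistics SET metadata_id = %s WHERE metadata_id = %s"
cur_t.execute(query, (8876, 76))
conn2.commit()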

Do I understand correctly that with async_insert_statistics I have to know the cumulative value and can only use hourly values?
The issue I have is that I get the energy consumption in Wh for 15-minute intervals, but not the cumulative meter reading for those timestamps. So the only thing I can do is read the last statistics entry and its sum up to that point, and then add my values on top of it? And if I miss a value in between, there is no way to resynchronize?


If you have 15-minute samples, you can resample them using Python, e.g. in a Jupyter Notebook.

Like so:

energy_real_consumed_fronius_meter=pd.DataFrame()
df=df_set.resample('1H')['Energie vom Netz bezogen'].sum()
energy_real_consumed_fronius_meter['states']=df.cumsum()

That is an implementation detail :smiley: So, am I right that I need hourly, cumulative data for the statistics? I.e., there is no way to somehow import only a single value (= the energy consumption) for a time period into the statistics?

Or in an applied example: If I get the following data:

2022-11-22 00:00 --> 200Wh
2022-11-22 01:00 --> 150Wh
2022-11-22 02:00 --> 300Wh

I first need to get the last sum from the statistics (let's say it was 1234 Wh on 2022-11-21 23:00) to insert the following statistics (in pseudocode):

StatisticMetaData(has_sum=True, has_mean=False, unit_of_measurement=Wh...)
StatisticData(start=2022-11-22 00:00, sum=1434, state=200)
StatisticData(start=2022-11-22 01:00, sum=1584, state=150)
StatisticData(start=2022-11-22 02:00, sum=1884, state=300)

correct?
Or can I say has_sum=False and only insert the state?
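
For reference, here is roughly how I would express those rows against the recorder's import function (a hedged sketch: I'm assuming async_import_statistics from the ~2022-era Home Assistant source, and sensor.grid_consumption is a made-up statistic_id):

from datetime import datetime, timezone
# Assumed module paths; verify against your Home Assistant version.
from homeassistant.components.recorder.models import StatisticData, StatisticMetaData
from homeassistant.components.recorder.statistics import async_import_statistics

metadata = StatisticMetaData(
    has_mean=False,
    has_sum=True,
    name="Grid consumption",
    source="recorder",
    statistic_id="sensor.grid_consumption",  # made-up entity id
    unit_of_measurement="Wh",
)
stats = [
    StatisticData(start=datetime(2022, 11, 22, 0, 0, tzinfo=timezone.utc), state=200, sum=1434),
    StatisticData(start=datetime(2022, 11, 22, 1, 0, tzinfo=timezone.utc), state=150, sum=1584),
    StatisticData(start=datetime(2022, 11, 22, 2, 0, tzinfo=timezone.utc), state=300, sum=1884),
]
# hass is the running HomeAssistant instance; call this on the event loop.
async_import_statistics(hass, metadata, stats)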

ā€œhas_sumā€ is either True, if you look at an increasing sensor like a energy meter.
Or it is False, if you focus on a measurement sensor. Then ā€œhas_meanā€ is usually True.

You're right that you need hourly datasets, though I never tried whether the system would accept 15-minute data as well. The hourly measures are used to save disk space and probably to improve performance for the sum values.

And yes, you're right that you first need the last sum. Otherwise you'll get some kind of spike at that entry.

However, you can also change single entries directly in Home Assistant:
Developer Tools → Statistics → Modify sum
I don't know if this is the correct translation, so here is a screenshot:


Okay, perfect, I got that working!
The only thing I don't understand yet is how to set the initial value properly.
I.e., I can query the cumulative value, for example, of last week - say it is 500 kWh. Then I can query all the consumption values from that point on. But how do I set that properly? If I set the first sum to 500 kWh plus the consumption of the first hour after that, I get a huge spike in the statistics.
So, is there a way to make the statistics start at 500 kWh without creating a huge spike?

I think 500 kWh IS a huge spike.
Either you start your counting value, let's say in June, with 500 kWh, which books the power consumed from January to June on one single day in the statistics - then you see all the consumed power for the whole year. Or you start counting at zero - then you are missing the 500 kWh from the previous period.

If you start counting with a new total_increasing sensor, it will start with a cumulative sum of zero and the value of the increasing counter, e.g. 1234.
This shows no spike as long as all following values fit between the counter and the cumulative sum.

And if you're looking at the last week, you probably have to compare what is set in the statistics_short_term table. This is added to your current value; that's the reason why I proposed setting the purge time to 1 day.

Okay, I see. So I have to set the value of the sensor itself, then import the statistics, and then never touch the sensor's value again? Because otherwise I would overwrite the sum I set in the statistics, right?