Custom Integration to import long term statistics from a file like csv or tsv

So overall I successfully imported using a TSV file in this format. My only mistake: I used the wrong sensor name.

|statistic_id|unit|start|sum|
|sensor.stromzaehler_verbrauch|kWh|11.12.2023 10:00|2540.0|

However, there is still one behaviour that I could not solve. When I import data, the very next hour starts with a negative value of roughly minus the counter reading:

Historic data is accurate, and so is the data following that hour, but I do not know how to solve that spike yet.

In one of my trials I had a similar issue. It seems it is related to the ‘sum’ column.

@klaj mentions this in his readme

Initially I just left 0 in all rows of the ‘sum’ column, then tried copying the ‘state’ value into ‘sum’. In both cases I had similar spikes at the junction of imported data and newer data.

I then did a calculation:

  • First I looked up in the database what the first original ‘state’ value is in the HA ‘statistics’ table (I used phpMyAdmin; I am running MariaDB).
  • I then used a Google Sheets formula to subtract that first state from each imported ‘state’ value. Take a look at the pic.
  • It resulted in an increasing, negative ‘sum’.
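The calculation above can be sketched in Python. The first original state and the sample readings below are made up for illustration; ‘sum’ here is each imported state minus the first state already present in HA, which yields negative values that increase toward zero at the junction:

```python
# Hypothetical sketch of the 'sum' calculation: subtract the first
# original 'state' value found in HA's statistics table from every
# imported 'state'. All numbers here are invented examples.

FIRST_ORIGINAL_STATE = 2650.0  # looked up in the HA 'statistics' table

imported_states = [2540.0, 2541.2, 2543.7]  # older meter readings to import

# Each row's 'sum' becomes state - first_original_state: negative,
# and increasingly so the further back in time the row lies.
sums = [round(state - FIRST_ORIGINAL_STATE, 3) for state in imported_states]
print(sums)  # → [-110.0, -108.8, -106.3]
```

This way the last imported ‘sum’ lines up just below the first original ‘sum’, which is what avoids the spike at the junction.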

By the way, I also used Google Sheets to reformat the date from the one HA exported (using ‘Replace all’ to cut off the seconds and unnecessary characters).

Oh, and one more thing: the statistic_id is different from the one that I exported from HA. In other words, I truly merged two different entities’ histories :grinning:

That fixed it.

Next job will be an export of all the data to InfluxDB. Current data is written by the sensor. :slight_smile:

Hello Klaus, all,

I finally got to try your integration. It took me a while to set up a second Home Assistant, with a copy of my database, to safely mess with the measurements.
Thanks for this integration!

I have old data from my Domoticz, going back to 2017. The energy readings especially would be nice to transfer, but also some temperature measurements.

I set up a LibreOffice Calc sheet to transform the data into the right format, and for the electricity measurements it worked. But with the gas consumption measurement, which as far as I can tell I prepared the same way, the integration gives the following error:
(I couldn’t copy the line, so I made a screenshot.)

From what I gather, it complains about the statistic_id being invalid. But I never use a colon.

What I am a bit worried and unsure about is the unit. It is cubic metres, and I copied the string from Home Assistant, but the superscript 3 could cause some trouble.
These are the first 2 lines from the file (badly copied, admittedly):
|statistic_id|unit|start|state|sum|
|sensor.dsmr_consumption_gas_delivered |m³ |03.12.2017 00:00|31.782|-14420.305|

Any idea what goes wrong here?

Regards,

Bert

Does your sensor.dsmr_consumption_gas_delivered entity exist in your system? That might be your issue, as this integration will only import to existing entities; it doesn’t create them.
(If you don’t already have the sensor you want to import to in your system, just create it using a template that never updates, and you can import against that.)

Hi Gav_in,

Barring a typo, yes, the entity exists. It is part of the Digital Smart Meter Reader (DSMR) integration. That is the strange thing. And the electricity history import went without a hitch, so I thought I had it figured out.

Again, I’m a bit nervous about the m³ going right. Typing kWh is easy, but with the superscript “3” in m³, you never know with file encodings. But then the error indicates something else, of course.
In short, I’m at a loss.

Bert

Can you please try whether it works without the space between …delivered and the ‘|’?

If the content of the file above is not correct (I saw in your screenshot that you are using a tab as the delimiter, while above it is a pipe), and you do not have a stray space, please attach the real file (the first two lines are enough, if the error occurs with those two rows alone).
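Stray whitespace like this is easy to miss by eye. A small stdlib sketch along these lines (the sample data and the tab delimiter are assumptions; adjust for your export) flags every field with leading or trailing spaces:

```python
# Sanity check for stray spaces in a delimited import file.
# The sample content below is an invented example.
import csv
import io

def find_stray_spaces(lines, delimiter="\t"):
    """Yield (row, column, value) for each field with leading/trailing whitespace."""
    for row_no, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        for col_no, field in enumerate(row, start=1):
            if field != field.strip():
                yield row_no, col_no, field

sample = "statistic_id\tunit\nsensor.dsmr_consumption_gas_delivered \tm³\n"
for row_no, col_no, field in find_stray_spaces(io.StringIO(sample)):
    print(f"row {row_no}, column {col_no}: {field!r}")
# → row 2, column 1: 'sensor.dsmr_consumption_gas_delivered '
```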

Klaus

Hi Klaus,

Thanks for the catch. There was indeed a space character after “delivered”.
When I removed that, the import went OK. Thanks again for the great integration!
I still hope it can be made more user-friendly, e.g. if Home Assistant gets a facility to recalculate the sum value, so we don’t have to struggle with that.
Also, is the unit column really needed? Or can you get that from the entity and fill it in during the import?

Anyways, still very happy and importing some more data from domoticz.

Kind regards,

Bert

Good to hear that it is working now :slight_smile: . The sum column is on my list already; I have added “take the unit from the existing entity”. Let’s see if/when I have time … in the meantime, here is the list of possible improvements I have collected so far:

  • Fix bug with seconds in timestamp (only support minutes, and add tests for some “crazy” timestamps)
  • Calculate sum automatically, based on the oldest existing value
  • Allow different dateTime-formats
  • Take the unit from existing entity
  • Export data

@all: Feel free to implement, I’d be happy to review and merge :wink:

Hi Klaus,

Yes, I finished migrating my data. Your integration was very nicely timed for me. And apart from the typo with the space on my side, it went smoothly. The integration worked flawlessly.

In the process I created an libreoffice calc sheet to support the migration of my Domoticz data to Home Assistant. Maybe this can help someone that is trying the same thing. GitHub - brjhaverkamp/Domoticz2HomeAssistant: Migrate data from Domoticz to Home Assistant
It is my first ever GitHub repository, so bear with me :slight_smile:

Feedback is welcome, and I will try to include suggested improvements. Although this type of thing is something you use only once, so I will probably never use it myself again. But it can be a starting point for someone.

@Klaus, thanks again for your work. From my point of view, an addition to the wish list would be a utility to upload your entire Domoticz DB, select the Domoticz device, select the Home Assistant entity, add some other data, and migrate fully automatically. But I know this is a very elaborate request, so take it with the appropriate amount of grains of salt.

Kind regards,

Bert

I’d like to ask for some clarification regarding the timestamp (start column) before I do my first import.

I exported data for the entire month of May for an existing sensor in Home Assistant, with the objective of importing that into a newly created template sensor (also in HA).

The timestamp, as seen in the exported data, looks like this in the first couple of rows:

entity_id,state,last_changed
sensor.sensor_name,7043.864,2024-04-30T22:00:00.000Z
sensor.sensor_name,7043.877,2024-04-30T23:00:00.000Z
sensor.sensor_name,7043.89,2024-05-01T00:00:00.000Z
sensor.sensor_name,7043.904,2024-05-01T01:00:00.000Z
...

Technically these timestamps are correct: they’re noted in UTC, and since I live in The Netherlands (UTC+1, currently in Daylight Saving Time), the first ‘human’ hour in May starts on April 30 at 22:00, because we currently live 2 hours ahead of UTC.

I can easily turn that timestamp as Home Assistant exports it into the format required for this integration, using some search & replace work and a formula in a spreadsheet.
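That conversion can also be scripted. Here is a minimal Python sketch using only the standard library, assuming the integration indeed expects local HA server times (Europe/Amsterdam here is my own timezone, an assumption on my part):

```python
# Convert HA's exported UTC timestamps (e.g. "2024-04-30T22:00:00.000Z")
# into the local "DD.MM.YYYY HH:MM" format the integration uses.
# The target timezone is an assumption; adjust to your HA server.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_local_start(utc_stamp, tz="Europe/Amsterdam"):
    dt = datetime.strptime(utc_stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    dt = dt.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(tz))
    return dt.strftime("%d.%m.%Y %H:%M")

print(to_local_start("2024-04-30T22:00:00.000Z"))  # → 01.05.2024 00:00
```

Unlike a fixed spreadsheet offset, `astimezone` also handles the DST switchover correctly for rows on either side of it.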

However, this sentence in the readme on Github is a little confusing to me:

Keep in mind that the times are local times of the HA server.

Does the integration automatically take care of the time zone offset of 2 hours (in my case) by specifying the timezone identifier in the import screen?

Or do I first have to alter the source data in the csv file so that the timestamp of the very first row show as 01.05.2024 00:00 rather than 30.04.2024 22:00?

My understanding is that it’s the latter route you need to go…

The line you quoted is, I think, telling you that the service here uses your local HA time to import, and any manipulation of the database time (UTC) is done by the service/HA rather than by you…
(Just to say that when I imported my data I was in the UTC timezone, so I didn’t need to make any adjustments or considerations here, and might be off in my thinking.)

One of the nice things about this service is that a reimport of the same data overwrites the existing database entries, so you can fairly easily correct/adjust if you need to.

Did you try it, did it work?

To be honest, I am not sure anymore what I meant with “Keep in mind that the times are local times of the HA server.” You have to specify the timezone anyhow when importing the data, so if you specify there the timezone where your HA server is running, and the times are in local time of this timezone, everything should be fine.

I haven’t gotten around to trying it yet, I’m planning to update my Home Assistant version this weekend, and to take advantage of that reboot to also get this integration installed. That way I can reduce how often I need to restart it, since that also interrupts any running automations and sensor processing.

Once I know how the timestamps react to timezone settings, I’ll report back.

Just to be sure, when the instructions are to add ‘import_statistics’ to the configuration.yaml file, do you literally mean to simply add a new line somewhere (perhaps near the top) of the file, that looks like:

import_statistics:

with nothing before, after, or under it?

Yeah, that’s it… drop import_statistics: into the configuration file, reboot, and you’re good to go.

Thanks for that hint, after the reboot the service appeared in the developer tools, and no errors are present in the system log.

Now I’m going to test the importing process and how the time zone values in the file and the time zone setting in the service work. Updates on that to follow…

I’ve been successfully importing long term statistics from energy sockets that I’m about to ‘retire’ into a newly created template sensor that I’ve created beforehand.

The new template sensor records the energy consumption of a new energy socket, but I wanted to retain a long term historical overview as well for what’s essentially a metric of the same device.

So being able to export the data for the old energy socket and then importing it into the new template sensor was exactly what I was looking for.

As far as the time zones are concerned, I decided to take the UTC values from the exported data and turn them into new timestamps.

Here is the first row of the exported CSV data in its original format:

entity_id,state,last_changed	
sensor.sensor_name,5578.458,2023-12-31T23:00:00.000Z	

I uploaded it into Google Drive, which turns it into a spreadsheet with three columns.

Because I’m in UTC+1, the timestamp 2023-12-31T23:00:00.000Z is actually the very first hour of 2024.

I added 2 new columns between columns A and B, for ‘unit’ and ‘start’. Next I added another new column between D and E, for ‘sum’. That also means the timestamp column ends up in column F.

In column F, replace the “T” with a space in all timestamps using Search & Replace, and remove the “Z” (replace it with nothing). The timestamp now looks like “2023-12-31 23:00:00.000”.

Now change the headings to

statistic_id unit start state sum last_changed

In column A (statistic_id), I did a search & replace to show the name of the new template sensor.

In column B (unit) I pasted the value “kWh” in all rows because this happens to be an energy sensor. Adjust accordingly for other data types.

In cell C2 (for start) I pasted this formula:

=TEXT(DATEVALUE(F2)+TIMEVALUE(F2)+(0.04166666666),"DD.MM.YYYY HH:MM")

This adds 1 hour (1/24 of a day, hence the 0.04166…) to the value of the timestamp in column F and formats it as the integration expects. I then copied the formula to all rows below.

In E2 (for sum) I simply entered =D2 and then copied that to all the rows underneath.

Column F can be left in place; the integration will simply ignore it.

Now it looks like:

statistic_id  unit start state sum last_changed
sensor.template_sensor_name kWh 01.01.2024 00:00 5578.458 5578.458 2023-12-31 23:00:00

Next I use File → Download → CSV to export the spreadsheet to a csv file that can be used with the integration.

I’ve got the Google Sheet specified as English (UK) so the column separator turns out as a , (comma) and the decimal separator as a . (period).
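For anyone who prefers a script over a spreadsheet, the steps above can be sketched in Python with the standard library. The sensor name and unit are the examples from this walkthrough, and like the sheet formula it applies a fixed +1 hour offset (so, unlike a real timezone conversion, it ignores DST changes within the data):

```python
# Replicate the spreadsheet workflow: read HA's exported CSV
# (entity_id,state,last_changed) and write the columns the
# integration expects. Names and the fixed offset are assumptions
# taken from the walkthrough above.
import csv
import io
from datetime import datetime, timedelta

NEW_STATISTIC_ID = "sensor.template_sensor_name"
UNIT = "kWh"
OFFSET = timedelta(hours=1)  # my UTC+1 case; adjust for yours

def convert(src, dst):
    reader = csv.DictReader(src)  # expects entity_id,state,last_changed
    writer = csv.writer(dst)
    writer.writerow(["statistic_id", "unit", "start", "state", "sum"])
    for row in reader:
        utc = datetime.strptime(row["last_changed"], "%Y-%m-%dT%H:%M:%S.%fZ")
        start = (utc + OFFSET).strftime("%d.%m.%Y %H:%M")
        # like the sheet: 'sum' simply mirrors 'state'
        writer.writerow([NEW_STATISTIC_ID, UNIT, start, row["state"], row["state"]])

src = io.StringIO("entity_id,state,last_changed\n"
                  "sensor.sensor_name,5578.458,2023-12-31T23:00:00.000Z\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())
```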

Hopefully this is useful for others who are trying to do something similar.


I have a follow up question, after successfully importing long term statistics into a bunch of regular template sensors.

I also have a couple of sensors created by the history_stats platform. They typically record the time a device has been running per day. I can export them using Home Assistant, and the CSV file looks just like the one for a regular template sensor. Next, I prepared the data with the columns needed for this integration.

Then, when I go to import it using the integration, something odd happens:

  • I have the History section of Home Assistant open in another tab, and a little while after I’ve clicked the “Call service” button, I see the data appearing in the chart. So it all seems to work.
  • But when I refresh the chart a few minutes later, it is empty again.

For background information, I started with importing data for October 2023, the history_stats platform is currently updating today’s uptime data for June 2024. This is the way I’ve coded the history_stats for this particular metric in configuration.yaml:

  - platform: history_stats
    name: "Heatpump Running Today Time"
    unique_id: heatpump_running_today_time
    entity_id: binary_sensor.heatpump_running_now
    state: "on"
    type: time
    start: "{{ today_at() }}"
    end: "{{ now() }}"

I checked the system logs, there’s nothing noted that could have anything to do with this issue. I’ve also had a look at developer tools → statistics for this particular sensor, it doesn’t indicate any issues there either.

So my question is:

  • in theory, is this integration capable of importing data for a sensor that originates from history_stats at all?

or

  • is there perhaps something related to how history stats work, causing Home Assistant itself to wipe out the (historical) data for October 2023 that was just imported, at its next refresh (which it does automatically every minute)?

I do not know enough about history_stats sensors, I never used them.

Home Assistant keeps short-term statistical data for 10 days, and long-term historical data forever, afaik. The integration creates long-term historical data.

Are your original data from history_stats available for longer than 10 days?

In the end, the integration just calls HA-methods to import the data:

            if metadata["source"] == "recorder":
                if check_entity_exists(hass, metadata["statistic_id"]):
                    async_import_statistics(hass, metadata, statistics)
            else:
                async_add_external_statistics(hass, metadata, statistics)

This is working if you see the values. If they are not visible anymore afterwards, then they are deleted by HA itself.

So the question is rather: is it possible to call async_import_statistics / async_add_external_statistics for history_stats sensors in such a way that the data is not deleted afterwards? I do not have enough HA knowledge to answer this.