Custom Integration to import long term statistics from a file like csv or tsv

After a lot of research, I managed to implement an integration to import long term statistics from a file like csv or tsv (I did not find any existing integration doing this).

This custom integration offers a service with some parameters, the most important one being the file name containing the historical data.

In this way, you can easily import historical statistic data into a running HA instance, without the need to be familiar with SQL.

You can find it here: GitHub - klausj1/homeassistant-statistics: This HA integration allows to import long term statistics from a file like csv or tsv.

Have fun, and please let me know if it works for you!

BTW, it would be also great to get a kind of review of the source code - its quite small :blush:

21 Likes

Cool!!

Am I correct that there is no protection against double imports, currently?
So, if you call the service twice on the same file, youā€™ll have duplicate entries in your statistics?

Or is it somehow avoided with the import failing due to duplicate keys?

If you import a second time, you just overwrite the values which are already in the database. That does not do any harm, and it does not change any result. So you can also use this import to correct values in the database as well (OK, export is maybe missing).

I am not using SQL-statements, but the methods async_add_external_statistics / async_import_statistics of the recorder, and they can handle this.

BTW, I just realized that the container-based HASS has a different path to the config dir than the HASS OS on the Raspi, This is fixed now with version v0.1.2.

Nice work! Judging by the code it should also work for MariaDB installations. Is that correct?

Thanks!
As I am not writing directly to sqlite, but using the recorder-API, that should work for MariaDB as well. But I cannot test this, I only used it with sqlite.

Great idea, Iā€™ve been looking for a solution like this for a while - thanks!

Unfortunately, I canā€™t get it to work. I donā€™t see the service and when I check the configuration, I get this error message:
Configuration warnings
Integration error: import_statistics - Requirements for import_statistics not found: [ā€˜pandas==2.2.0ā€™]."

I tried to install pandas, but I didnā€™t know how it worked with HAOS.

Thats strange ā€¦

Just to be sure: Did you restart hacore, or your device, after installting my integration?

Which version of HA are you using? I state in my requirements that I need at least 2023.8.0, which was maybe a bad idea, as I tested only with 2023.12.0.

You should not need to install pandas yourself. It is defined as a requirement in the manifest.json of the integration, and as far as I understood, this should be installed automatically.

I developed the integration on windows in a container, and installed it via HACS on my raspi, and I did not need to install pandas myself, it just worked.

I would suggest, restart hacore and check the logs.

Does anybody know where the required python modules are installed? The developer doc states it is ā€œinto the deps subdirectory of the Home Assistant configuration directoryā€, but this directory is empty on my raspi, although I have some custom integration with requirements, including mine.

Core: 224.2.2
Supervisor 2024.01.1
I have HAOS in a VM running on a Proxmox cluster. I installed your integration via HACS as a ā€œcustom repositoryā€, made the entry in the ā€œconfiguration.yamlā€ and then restarted HA. Since then Iā€™ve been getting this error message when I click on ā€œCheck configurationā€.

When I restart I get the same error message:

Home Assistant Core

Setup failed for custom integration 'import_statistics': Requirements for import_statistics not found: ['pandas==2.2.0']. 10:48:09 ā€“ (FEHLER) setup.py

I have now removed the integration and reinstalled it. HA now writes relevant information to the log.

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
2024-02-19 12:23:17.708 ERROR (SyncWorker_62) [homeassistant.util.package] Unable to install package pandas==2.2.0: ERROR: Cannot install pandas==2.2.0 because these package versions have conflicting dependencies.

So it looks like there is a dependency problem. I donā€™t know how to fix this yet, but hopefully the link in the log will help.

According to the ā€œDependency Resolution - pip documentation v24.1.dev0ā€ it may be that the dependency in your package is set too strictly. Iā€™m not sure whether the integration requires 2.2.0, which would be the current version, or whether I have 2.2.0 on the system and the integration requires an older version.
Maybe you can shed some light on this?

Update:
As a test, I now have the entry "ā€œrequirementsā€: [ ā€œpandas==2.2.0ā€ ],
deleted from the ā€œmanifest.jsonā€ (without knowing what I am doing :wink:) and now the integration starts and I can also use the import service. I donā€™t know yet whether it works.

Thanks, that helped a lot.

I updated to HA 2024.2.2, and with this update I had the same problem as well. It seems HA 2024.2.2 requires some older version of pandas, and I insisted on 2.2.0. Bad luck ā€¦

I changed the requirements to ā€œpandas>=2.0.0ā€, this solved the problem. Yes, I have set the dependency too strictly.

I also realized that HACS does not show the versions of my integration correctly, but just the commit-hash (which is for sure my fault). I will have a look into this as well.

Please tell us here if the integration is working - thanks!

Iā€™m thrilled - thanks for this very useful integration!

I have now managed it, after the integration made me aware of many errors (mostly 6 instead of 5 values), I was now able to import a TSV file with over 5000 temperature values.

For 5 days Iā€™ve been sitting in front of the computer for countless hours trying to get the most important data that I extracted from a defective sqlite DB into the new Postgresql. I tried a lot of tools, to do this directly in the DB. With your integration this is now very easy.
I can also confirm that it works with Postgresql or Sqlite.

Thanks again!

In the file homeassistant-statistics/assets/state_sum.tsv at d774af291f251948243fce6470a33483142c4495 Ā· klausj1/homeassistant-statistics Ā· GitHub , what is ā€œsumā€?

Great to ā€œhearā€ :blush:, thanks for the feedback

1 Like

This is for counter-like sensors, if the value is reset. See e.g. New sensor state class: total_increasing | Home Assistant Developer Docs. If state is enough for you, you should be able to just set sum to 0. Or to the same value as state

Thanks

In my case I donā€™t have the state in the CSV file, only the sum. I can of course calculate the state based on the sum, but there will be rounding errors. But nothing I can do with that.

It also looks like I donā€™t have the start time, but the end time. So I have to subtract one hour from the time.

I saw in the documentation that the long term statistics store the last state within an hour. So when the CSV file contain the time stamp 13:00 and a state value, is this for the hour between 12:00 and 13:00 or between 13:00 and 14:00?

Interesting problem, but probably not directly related to the integration: As a test, I have imported the pool temperature since July 2023, which can then be accessed in the interface after 2 minutes. When I look again after 5 minutes, all the data is gone. Could be a problem with Postgresql, unfortunately there is nothing in the logging.

I managed to import several years with energy usage data. I was struggling a little bit to get the transition between the imported data and the previous logged data to align properly with the sum value. Had to look up the value in the database and use excel to calculate new historical values.

My mistake! The fact that the data was deleted again was caused by incorrect file format.
Iā€™ve already imported a few things and the integration works perfectly.
Thanks again for the great work Klaus!

This is cool! I am going to try this for my solar data. Will report back here!