After a lot of research, I managed to implement an integration to import long-term statistics from a file like CSV or TSV (I did not find any existing integration doing this).
This custom integration offers a service with some parameters, the most important one being the file name containing the historical data.
In this way, you can easily import historical statistic data into a running HA instance, without the need to be familiar with SQL.
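For illustration, calling such a service from a small script could look roughly like this (a sketch only: the service name "import_from_file" and the "filename" field are placeholders here, please check the README for the real names):

```python
# Rough sketch: calling the (assumed) import service via the Home Assistant REST API.
# The service name "import_from_file" and the "filename" field are placeholders for
# illustration only - the integration's documentation has the real names.
import requests

resp = requests.post(
    "http://homeassistant.local:8123/api/services/import_statistics/import_from_file",
    headers={"Authorization": "Bearer YOUR_LONG_LIVED_ACCESS_TOKEN"},
    json={"filename": "old_statistics.tsv"},  # file placed in the HA config directory (assumed)
    timeout=30,
)
resp.raise_for_status()
```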
Am I correct that there is no protection against double imports, currently?
So, if you call the service twice on the same file, you'll have duplicate entries in your statistics?
Or is it somehow avoided with the import failing due to duplicate keys?
If you import a second time, you just overwrite the values which are already in the database. That does not do any harm, and it does not change any result. So you can also use this import to correct values in the database as well (OK, export is maybe missing).
I am not using SQL statements, but the methods async_add_external_statistics / async_import_statistics of the recorder, and they can handle this.
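To give an idea of what these calls look like, here is a heavily simplified sketch (not my actual code; the entity id, unit and values are made up):

```python
# Simplified sketch of importing statistics for an existing sensor via the recorder API.
# Entity id, unit and values are made up; the integration builds these dicts from the
# rows of the CSV/TSV file.
from datetime import datetime, timezone

from homeassistant.components.recorder.statistics import async_import_statistics

def import_example(hass):
    metadata = {
        "source": "recorder",                        # internal statistics use the recorder as source
        "statistic_id": "sensor.pool_temperature",   # assumed entity id
        "name": None,
        "unit_of_measurement": "°C",
        "has_mean": True,
        "has_sum": False,
    }
    statistics = [
        {
            "start": datetime(2023, 7, 1, 12, 0, tzinfo=timezone.utc),  # start of the hour
            "min": 21.0,
            "max": 23.5,
            "mean": 22.3,
        },
        # ... one dict per hour
    ]
    # Rows with the same statistic_id and start time are simply overwritten,
    # which is why importing the same file twice does no harm.
    async_import_statistics(hass, metadata, statistics)
```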
BTW, I just realized that the container-based HASS has a different path to the config dir than HASS OS on the Raspi. This is fixed now with version v0.1.2.
Thanks!
As I am not writing directly to sqlite, but using the recorder-API, that should work for MariaDB as well. But I cannot test this, I only used it with sqlite.
Great idea, I've been looking for a solution like this for a while - thanks!
Unfortunately, I can't get it to work. I don't see the service and when I check the configuration, I get this error message:
Configuration warnings
Integration error: import_statistics - Requirements for import_statistics not found: ['pandas==2.2.0']
I tried to install pandas, but I didn't know how to do that with HAOS.
Just to be sure: did you restart HA Core, or your device, after installing my integration?
Which version of HA are you using? I state in my requirements that I need at least 2023.8.0, which was maybe a bad idea, as I tested only with 2023.12.0.
You should not need to install pandas yourself. It is defined as a requirement in the manifest.json of the integration, and as far as I understood, this should be installed automatically.
I developed the integration on Windows in a container and installed it via HACS on my Raspi, and I did not need to install pandas myself, it just worked.
I would suggest restarting HA Core and checking the logs.
Does anybody know where the required Python modules are installed? The developer doc states it is "into the deps subdirectory of the Home Assistant configuration directory", but this directory is empty on my Raspi, although I have some custom integrations with requirements, including mine.
Core: 2024.2.2
Supervisor 2024.01.1
I have HAOS in a VM running on a Proxmox cluster. I installed your integration via HACS as a "custom repository", made the entry in the "configuration.yaml" and then restarted HA. Since then I've been getting this error message when I click on "Check configuration".
Home Assistant Core
Setup failed for custom integration 'import_statistics': Requirements for import_statistics not found: ['pandas==2.2.0']. 10:48:09 - (ERROR) setup.py
I have now removed the integration and reinstalled it. HA now writes relevant information to the log.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
2024-02-19 12:23:17.708 ERROR (SyncWorker_62) [homeassistant.util.package] Unable to install package pandas==2.2.0: ERROR: Cannot install pandas==2.2.0 because these package versions have conflicting dependencies.
So it looks like there is a dependency problem. I don't know how to fix this yet, but hopefully the link in the log will help.
According to the "Dependency Resolution - pip documentation v24.1.dev0" it may be that the dependency in your package is set too strictly. I'm not sure whether the integration requires 2.2.0, which would be the current version, or whether I have 2.2.0 on the system and the integration requires an older version.
Maybe you can shed some light on this?
Update:
As a test, I deleted the entry "requirements": ["pandas==2.2.0"] from the manifest.json (without really knowing what I was doing), and now the integration starts and I can also use the import service. I don't know yet whether it works.
I updated to HA 2024.2.2, and with this update I had the same problem as well. It seems HA 2024.2.2 requires some older version of pandas, and I insisted on 2.2.0. Bad luck ...
I changed the requirement to "pandas>=2.0.0", which solved the problem. Yes, I had set the dependency too strictly.
I also realized that HACS does not show the versions of my integration correctly, but just the commit-hash (which is for sure my fault). I will have a look into this as well.
Please tell us here if the integration is working - thanks!
I'm thrilled - thanks for this very useful integration!
I have now managed it: after the integration made me aware of many errors in my file (mostly 6 values instead of 5 per row), I was able to import a TSV file with over 5000 temperature values.
For 5 days I've been sitting in front of the computer for countless hours, trying to get the most important data that I extracted from a defective SQLite DB into the new PostgreSQL. I tried a lot of tools to do this directly in the DB. With your integration this is now very easy.
I can also confirm that it works with PostgreSQL as well as SQLite.
In my case I don't have the state in the CSV file, only the sum. I can of course calculate the state based on the sum, but there will be rounding errors. There is nothing I can do about that, though.
It also looks like I don't have the start time, but the end time, so I have to subtract one hour from each timestamp.
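One way to do both adjustments, sketched with pandas (the column names and the initial meter reading are just placeholders, not my real file layout):

```python
# Sketch: shift end-of-hour timestamps to start-of-hour and approximate the state
# from the cumulative sum. Column names ("time", "sum") and the initial meter
# reading are placeholders.
import pandas as pd

df = pd.read_csv("history.tsv", sep="\t")
df["start"] = pd.to_datetime(df["time"]) - pd.Timedelta(hours=1)  # end time -> start time
initial_meter_reading = 12345.678                                 # assumed meter value where sum == 0
df["state"] = df["sum"] + initial_meter_reading                   # rounding errors remain
df[["start", "state", "sum"]].to_csv("history_import.tsv", sep="\t", index=False)
```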
I saw in the documentation that the long-term statistics store the last state within an hour. So when the CSV file contains the timestamp 13:00 and a state value, is this for the hour between 12:00 and 13:00 or between 13:00 and 14:00?
Interesting problem, but probably not directly related to the integration: as a test, I imported the pool temperature since July 2023, and the data shows up in the interface after about 2 minutes. When I look again after 5 minutes, all the data is gone. Could be a problem with PostgreSQL; unfortunately there is nothing in the logs.
I managed to import several years of energy usage data. I was struggling a little bit to get the transition between the imported data and the previously logged data to align properly with the sum value. I had to look up the value in the database and use Excel to calculate new historical sum values.
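Roughly what that alignment step looks like if done with pandas instead of Excel (the looked-up value and the column names are placeholders):

```python
# Sketch: shift the historical sums so the imported data lines up with the sum value
# already recorded in the database. The looked-up value and column names are placeholders.
import pandas as pd

first_recorded_sum = 1500.0                       # sum of the first statistics row already in HA
df = pd.read_csv("energy_history.tsv", sep="\t")
offset = first_recorded_sum - df["sum"].iloc[-1]  # make the newest imported sum meet the existing data
df["sum"] = df["sum"] + offset
df.to_csv("energy_history_aligned.tsv", sep="\t", index=False)
```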
My mistake! The fact that the data was deleted again was caused by an incorrect file format.
I've already imported a few things and the integration works perfectly.
Thanks again for the great work Klaus!