State Data to InfluxDB

Hello fellow automation geeks!

I needed a solution for my very large Home Assistant database of 27 GB (growing by the minute).
After a quick search I decided to go with InfluxDB and Grafana.
Unfortunately, Home Assistant doesn’t have a built-in way to migrate historical data to InfluxDB. Since I couldn’t find any recent discussions or solutions for this, I teamed up with ChatGPT to create a Python script that copies Home Assistant historical data from an SQLite database to an InfluxDB bucket.
Tested with Home Assistant Core 2024.10.1 and InfluxDB v2.7.10.
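The overall copy can be sketched roughly like this. The function names, the post-2023 schema join, and the choice to emit InfluxDB line protocol directly (rather than going through a client library) are assumptions of this sketch, not necessarily how the actual script does it:

```python
import sqlite3


def row_to_line_protocol(entity_id, state, last_updated_ts):
    """Convert one HA `states` row into an InfluxDB line-protocol string.

    Non-numeric states are skipped (returns None) in this sketch.
    """
    try:
        value = float(state)
    except (TypeError, ValueError):
        return None
    domain, entity_short = entity_id.split(".", 1)
    # InfluxDB line protocol expects nanosecond epoch timestamps.
    ts_ns = int(float(last_updated_ts) * 1_000_000_000)
    return f"state,domain={domain},entity_id={entity_short} value={value} {ts_ns}"


def read_states(db_path):
    """Yield (entity_id, state, last_updated_ts) rows from the HA database.

    Uses the post-2023 HA schema, where `states` stores a metadata_id
    that joins to `states_meta` for the entity_id.
    """
    conn = sqlite3.connect(db_path)
    try:
        yield from conn.execute(
            "SELECT m.entity_id, s.state, s.last_updated_ts "
            "FROM states s JOIN states_meta m ON s.metadata_id = m.metadata_id "
            "WHERE s.state NOT IN ('unknown', 'unavailable')"
        )
    finally:
        conn.close()
```

The resulting lines can be POSTed in batches to InfluxDB's `/api/v2/write` endpoint.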

I run HA, InfluxDB, and Grafana in separate Docker containers.

You can check it out on GitHub:

My next step is to create a nice Grafana Dashboard.
Comments or hints are welcome!

For setting up InfluxDB and Grafana with HA, I used this Reddit post for reference:
https://www.reddit.com/r/homeassistant/comments/oje65v/home_assitant_with_grafana_and_influxdb_2x_basic/


Hello eldigo,

Put this string into your search engine…

site:home-assistant.io migrate old data to influx

It’s amazing what a small search will do.

Like I said, there’s no recent solution that works with the new HA database changes from early 2023.

If you have the schema from that search, can you do something? Tie it together with the process in other posts, maybe?
Long- and short-term statistics | Home Assistant Data Science Portal.

Not sure what you’re pointing out, but my script already works with the new database schema.
It inserts values into InfluxDB nicely.

I’m just mentioning it on the forum so people can find it :slightly_smiling_face:

Edit: note that I had a 600-day retention period in HA, which is the real reason my database is that large.

Does this script extract only short-term data, or also long-term historical data?

No, only short-term.
I haven’t figured out how to get all the attributes from the short-term states table into the long-term statistics table.

If those don’t match up, it’s almost pointless to load the long-term stats, as they won’t “connect” to the live data inserted by HA.

Hi Eldigo,

I’ve been using InfluxDB for more than a year now. Could I use your script to import only the data that isn’t in InfluxDB yet? Does your script check the existing data in InfluxDB, or does it duplicate it?

Thx in advance.

The script looks at the earliest date in InfluxDB and only gets data from HA earlier than that, so there’s no duplicate data.

Note that it only gets data from the states table in HA, not the long-term statistics tables.
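The de-duplication step described above can be sketched as two pieces: a Flux query that finds the earliest point already in InfluxDB, and a SQLite query that only selects HA rows older than that cutoff. The bucket name, query, and helper name are assumptions for illustration, not the actual script:

```python
from datetime import datetime, timezone

# Flux query returning the single earliest point in the bucket
# (run it with influxdb_client's query_api; bucket name is an assumption).
EARLIEST_FLUX = '''
from(bucket: "homeassistant")
  |> range(start: 0)
  |> group()
  |> sort(columns: ["_time"])
  |> first()
'''


def states_before_cutoff_sql(cutoff):
    """Build the SQLite query selecting HA states strictly older than `cutoff`.

    `cutoff` is a timezone-aware datetime taken from the earliest InfluxDB
    point; HA stores `last_updated_ts` as a Unix epoch in seconds.
    """
    return (
        "SELECT m.entity_id, s.state, s.last_updated_ts "
        "FROM states s JOIN states_meta m ON s.metadata_id = m.metadata_id "
        f"WHERE s.last_updated_ts < {cutoff.timestamp()}"
    )
```

Everything returned by that SQL is guaranteed to predate what InfluxDB already holds, so re-running the migration cannot create duplicates.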

Thanks @eldigo for your work. It works like a charm with core-2024.12.1 and an SQLite database of 16542.29 MiB, and of course with InfluxDB 2.x.

Best regards, Handfest

Thank you for your work and for sharing it, @eldigo.
I installed and configured InfluxDB and Home Assistant and ran your script to migrate all data from SQLite to InfluxDB.

I have set a source of HomeAssistant for the InfluxDB integration instead of HA.

After the migration I have two sets of sensor data for outside temperature, one with source HA, another with source HomeAssistant.
(No problem with that; I’ll correct it later and restart.)

But I have detected an overlap between the two graphs in Grafana.

Do you have an idea how to correct it when I rerun the script?
(green = source HA)

The overlap starts at 15:24 and ends at 18:32, so it does not seem to be the UTC-to-local time difference, which is 2 h.

But when I look at the temperature at the start and find a point where the migrated data could append, that offset does seem to be 2 h. I think I have a problem with the time setting somewhere.
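For what it’s worth, a 2 h shift is exactly what you get when a naive local timestamp is written to InfluxDB as if it were UTC. A quick stdlib-only illustration (Europe/Berlin and the sample timestamp are my own assumptions):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ts = 1718000000.0  # an arbitrary epoch timestamp in June 2024 (CEST in Berlin)

# What fromtimestamp() gives without a tz argument: local wall-clock time
# (reproduced here explicitly with Europe/Berlin so the result is portable).
naive_local = datetime.fromtimestamp(ts, tz=ZoneInfo("Europe/Berlin")).replace(tzinfo=None)

# The same instant rendered in UTC, which is what InfluxDB expects.
utc = datetime.fromtimestamp(ts, tz=timezone.utc).replace(tzinfo=None)

# Writing the naive local wall time as if it were UTC shifts every point
# forward by the UTC offset, i.e. 2 hours during CEST.
offset_hours = (naive_local - utc).total_seconds() / 3600
print(offset_hours)  # 2.0
```

During winter (CET, UTC+1) the same bug would produce a 1 h shift instead.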

@eldigo
I had a conversation with Gemini about my finding, and Gemini came up with a code modification to your script which I have adopted and tested.
It seems to remedy my problem.
The timestamp from the SQLite database is converted to UTC before inserting into InfluxDB.
Shall I attach the modified code for inspection and adoption?

Hi @juergenjw , do you have a GitHub repo with the changes?

Hi, I think I have it here: https://github.com/jurviz/ha-sqllite-2-influxdb.git

Here is some code that might be missing:

        # Required imports at the top of the script:
        #   from datetime import datetime
        #   import pytz
        #   from influxdb_client import Point
        if unit_of_measurement == '':
            unit_of_measurement = 'count'
        try:
            # Convert timestamp from Unix epoch to a naive datetime (local time)
            local_dt = datetime.fromtimestamp(float(last_updated_ts))

            # Make the datetime object timezone-aware for your local timezone
            local_tz = pytz.timezone('Europe/Berlin')  # use the correct timezone for your HA
            local_dt_aware = local_tz.localize(local_dt)

            # Convert to UTC before writing to InfluxDB
            utc_dt = local_dt_aware.astimezone(pytz.utc)

            # Create an InfluxDB point with tags and the UTC timestamp
            point = Point(unit_of_measurement).tag("source", "HA").tag("domain", domain)
            point.tag("entity_id", entity_id_short).tag("friendly_name", friendly_name).time(utc_dt)