Inspired by this thread https://community.home-assistant.io/t/influxdb-1-8-setup-one-continuous-query-for-whole-database/292242/4 , where I found the idea really good, but it did not work for me.
EDIT:
The author of the above thread replied and said it is working for them. I cannot double-check this since I don’t use InfluxDB anymore. To avoid all the hassle with cronjobs or anything else, you might want to try that approach first. If it works for you, it is the cleanest way and the way it should be done: just 2 continuous queries, and Influx does everything for you. If it had worked for me, I would have done it this way too, but I could not find my mistake.
If it also doesn’t work for you, read on.
The problem seems to be that the continuous queries mentioned there cannot have a WHERE time clause, which makes them useless for this case (see InfluxQL Continuous Queries | InfluxDB OSS v1 Documentation, from my research).
So I decided to find another way. Maybe it helps someone who has the same goal.
I want to do the following: data in InfluxDB older than 6 months but newer than 2 years should be compressed to one entry per 5 minutes. Data older than 2 years is compressed to one entry per 15 minutes. Data younger than 6 months stays as the raw data you save in your InfluxDB.
BEFORE DOING ANYTHING IN YOUR DATABASE - MAKE A BACKUP. I’M NOT RESPONSIBLE FOR ANY DATA LOSS.
Thread for Backing up the Database:
https://community.home-assistant.io/t/howto-migrate-influxdb-data-to-another-server-on-hassio/164405
So let’s start. The database I use in InfluxDB is homeassistant. I am running Home Assistant Supervised on a NUC on Debian Bullseye; this tutorial is written for that kind of installation, but it should be nearly the same on a Raspberry Pi. In some places you have to substitute your own username and password.
I assume you already have a database and want to convert it as described above.
Login to your Linux Machine:
ssh your_ip_of_your_machine
Login to your InfluxDB Container:
docker exec -it addon_a0d7b954_influxdb influx -precision rfc3339 -username 'influxdb_username' -password 'influxdb_password' -database 'homeassistant'
In the influx shell that opens, create two retention policies (104w2d is roughly 2 years):
CREATE RETENTION POLICY "y2_5m" ON "homeassistant" DURATION 104w2d REPLICATION 1
CREATE RETENTION POLICY "inf_15m" ON "homeassistant" DURATION INF REPLICATION 1
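As a quick sanity check, you can list the policies afterwards and confirm both exist (the exact column output may differ slightly between InfluxDB 1.x versions):

```sql
-- Still inside the influx shell opened above
SHOW RETENTION POLICIES ON "homeassistant"
-- Should list "autogen", "y2_5m" (duration 17520h0m0s, i.e. 104w2d)
-- and "inf_15m" (duration 0s, meaning infinite)
```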
Now copy and group the data from your database:
This copies all database entries between 6 and 24 months old into the retention policy “y2_5m”, grouped to one entry per 5 minutes. Data older than 24 months is grouped by 15 minutes into the policy “inf_15m”.
SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "homeassistant"."y2_5m".:MEASUREMENT FROM "homeassistant"."autogen"./.*/ WHERE time < now() -26w and time > now() -104w GROUP BY time(5m),*
SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "homeassistant"."inf_15m".:MEASUREMENT FROM "homeassistant"."autogen"./.*/ WHERE time < now() -104w GROUP BY time(15m),*
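Before moving on, it is worth checking that the copies actually landed. A hedged example (measurement names and counts are your own, this just counts aggregated points in the new policies):

```sql
-- Still inside the influx shell
SELECT count(value) FROM "homeassistant"."y2_5m"./.*/ WHERE time < now() - 26w
SELECT count(value) FROM "homeassistant"."inf_15m"./.*/ WHERE time < now() - 104w
```

If these return no rows, the INTO queries above did not copy anything and you should re-check them before trimming any raw data.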
Since the continuous query didn’t work, at least for me, I am doing this with a shell script that runs periodically.
Create the script:
nano influx_query.sh
Put the following content into it:
Every time it runs, it copies data aged between 6 months and 6 months plus 6 hours into the retention policy “y2_5m”, again grouped by 5 minutes, and data aged between 24 months and 24 months plus 6 hours into the policy “inf_15m”, grouped by 15 minutes. Note that the second query reads from “y2_5m”, not from “autogen”: once you later trim autogen to 6 months, the 5-minute data in “y2_5m” is the only copy left at the 24-month boundary.
#!/bin/sh
# Re-aggregate the 6 h slice crossing the 6-month boundary into y2_5m (5-minute means)
docker exec -t addon_a0d7b954_influxdb influx -execute 'SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "homeassistant"."y2_5m".:MEASUREMENT FROM "homeassistant"."autogen"./.*/ WHERE time < now() -26w and time > now() -26w6h GROUP BY time(5m),*' -precision rfc3339 -username 'your_username' -password 'your_password' -database 'homeassistant'
# Re-aggregate the 6 h slice crossing the 24-month boundary from y2_5m into inf_15m (15-minute means)
docker exec -t addon_a0d7b954_influxdb influx -execute 'SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "homeassistant"."inf_15m".:MEASUREMENT FROM "homeassistant"."y2_5m"./.*/ WHERE time < now() -104w and time > now() - 104w6h GROUP BY time(15m),*' -precision rfc3339 -username 'your_username' -password 'your_password' -database 'homeassistant'
To execute this once per hour, first make the script executable with chmod +x influx_query.sh, then set up a cronjob:
run:
crontab -e
and in the last line insert:
0 * * * * /home/yourusername/influx_query.sh >/dev/null 2>&1
If you want a logfile, replace ‘>/dev/null 2>&1’ with ‘>> /home/yourusername/log_influx_query.txt 2>&1’.
If you don’t have access to the host, Eric wrote an addendum on how to set this up without crontab (see post 14 in this thread).
When you have finished, you should have a retention policy “y2_5m” holding the data from 6 to 24 months with an aggregation of 5 minutes, and a policy “inf_15m” holding everything older than 24 months (kept forever) grouped by 15 minutes. The raw data stays in your autogen policy, unaltered for now.
This example works for me and hopefully for you. The time windows overlap a little so that no data is lost if the machine is down for a short time.
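The overlap can be reasoned about like this: the script runs once per hour, but each run re-selects a 6-hour slice at the age boundary, so every point is aggregated several times and short outages are harmless. A minimal sketch of that arithmetic (the 6 h window and 1 h interval are the values used above):

```shell
#!/bin/sh
# Overlap maths for the hourly cronjob (values match the queries above).
window_hours=6     # each run covers a 6 h slice at the age boundary
interval_hours=1   # cron fires once per hour
# Downtime the setup tolerates before points slip through un-aggregated:
slack=$((window_hours - interval_hours))
echo "tolerated downtime: ${slack}h"
```

In other words, the machine can be offline for up to about 5 hours before any data misses its aggregation window; a larger window buys more slack at the cost of redundant re-aggregation.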
Once you have double-checked that everything is working, you can set the retention policy for autogen to 6 months and 1 day or something like this, so the raw data is deleted after that period.
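Trimming autogen is a single statement in the influx shell. A sketch, assuming the “6 months and 1 day” from above (26w1d); keep DEFAULT so Home Assistant continues writing into autogen. Be aware that data older than the new duration will be dropped at the next retention enforcement, so only run this after verifying the aggregated copies:

```sql
ALTER RETENTION POLICY "autogen" ON "homeassistant" DURATION 26w1d REPLICATION 1 DEFAULT
```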