InfluxDB - Setup to Compress Data older than 6 Months / 2 Years

erkr · May 2, 2023, 6:45pm

Thanks for looking deeper into it. Glad you can confirm my issue. I didn’t have time to investigate further myself. I will report back if I get any further.

dingausmwald · May 9, 2023, 5:39pm

Oh dear, all that hassle. I am the guy from the initial post, and I can confirm that it works. In my thread I see you tried a lot of additional stuff, like starting your continuous queries with “2y_5m-inf_15min” CREATE CONTINUOUS QUERY and so on. You name your query databases “2y_5m-inf_15min”, but later you refer to them as “FROM homeassistant."2y_5m"./.*/ WHERE time < now()”.

Give it another try and take a look at the logs/error messages the InfluxDB query gives back.

At the end it should look like this

dMopp · August 7, 2023, 11:02pm

Why not use

CREATE CONTINUOUS QUERY "y2_5m" ON "homeassistant" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "homeassistant"."y2_5m".:MEASUREMENT FROM "homeassistant"."autogen"./.*/ WHERE time < now() -26w and time > now() -26w6h GROUP BY time(5m),* END
CREATE CONTINUOUS QUERY "inf_15m" ON "homeassistant" BEGIN SELECT mean(value) AS value,mean(crit) AS crit,mean(warn) AS warn INTO "homeassistant"."inf_15m".:MEASUREMENT FROM "homeassistant"."autogen"./.*/ WHERE time < now() -104w GROUP BY time(15m),* END

? No crontab required anymore, afaik.

madface · August 8, 2023, 4:51pm

@dingausmwald
I do not work with InfluxDB anymore (I am using VictoriaMetrics), so I cannot double-check this. I updated my first post so that others use your method (and only fall back to the cronjob way if it fails for them like it did for me). I know it is the cleanest way, and my original intention was to use a CQ.

@dMopp
Give it a try; if this works for you, there is no better choice. Unfortunately it didn’t work for me, so I made this workaround.
I gave up when I read that the time clause is ignored in a continuous query (InfluxQL Continuous Queries | InfluxDB OSS 1.8 Documentation).
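
To illustrate why the `now() -26w` condition has no effect: according to the InfluxDB 1.8 documentation, user-specified time ranges in a CQ's WHERE clause are ignored, and a basic CQ covers the range from now() minus the GROUP BY time() interval up to now(), aligned to preset time boundaries. A rough sketch of that window calculation (the function name `basic_cq_window` is made up for illustration, it is not part of InfluxDB):

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def basic_cq_window(now: datetime, interval: timedelta) -> Tuple[datetime, datetime]:
    """Return the (start, end) range the latest run of a basic CQ would cover.

    Assumption: runs are aligned to interval boundaries counted from the
    Unix epoch, and each run covers exactly one GROUP BY time() interval.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Floor "now" to the most recent interval boundary.
    completed_intervals = (now - epoch) // interval
    end = epoch + completed_intervals * interval
    return end - interval, end


start, end = basic_cq_window(
    datetime(2023, 8, 8, 16, 51, tzinfo=timezone.utc),
    timedelta(minutes=5),
)
print(start.isoformat(), end.isoformat())
```

With GROUP BY time(5m) the window is always the most recent five minutes, no matter what the WHERE clause says, so a CQ on its own only ever sees fresh data, not data that is 26 weeks old.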

juskalalie · September 1, 2023, 9:33am

Hey @madface,
I tried these CQs and can confirm that the WHERE clause is not taken into account. I took a look at VictoriaMetrics, but it seems that the downsampling function is not available (only in the Enterprise version), so I am curious to know what benefits you get from VictoriaMetrics compared to InfluxDB.
I really think downsampling is a must for long-term storage, not only regarding DB size, but also for “cleaning up” the data after a while…
Thanks!

madface · September 2, 2023, 12:49pm

I read a lot about VM before I switched. It is supposed to have a smaller resource footprint and to be faster, and I am always willing to try something new. That was my motivation to switch.
I agree about the downsampling. My hope is that the team behind VM decides at some point that this feature should also be in the community edition, or that there is a way to do it manually from time to time.
But the database footprint of VM is really small. Right now I have 337 time series stored (I use an include-only strategy for long-term storage and keep only what I think could be useful), and my DB grows by ca. 3MB a month:

So in 10 years it would be about 400MB, which is nothing I would worry about, and who knows what the next hot thing to use will be by then.

(FYI, I switched around the beginning of 2023, and the import from InfluxDB was about 20MB.)
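
The estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the growth stays linear at the 3MB a month quoted above and starts from the 20MB import, and the function name is just for illustration.

```python
MONTHLY_GROWTH_MB = 3   # "ca. 3MB a month" from the post
INITIAL_IMPORT_MB = 20  # size of the data imported from InfluxDB


def projected_size_mb(years: int,
                      monthly_growth_mb: float = MONTHLY_GROWTH_MB,
                      initial_mb: float = INITIAL_IMPORT_MB) -> float:
    """Linear projection: starting size plus constant monthly growth."""
    return initial_mb + monthly_growth_mb * 12 * years


print(projected_size_mb(10))  # 380
```

That lands at 380MB, in line with the "about 400MB" figure.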

juskalalie · September 4, 2023, 7:41am

Thanks a lot for your answer! Can I ask how you migrated your data from InfluxDB into VM?

madface · September 4, 2023, 3:36pm

There is a migration tool from VictoriaMetrics called vmctl. You can download it from their GitHub page.
If you have access to the host system (in my case I have a supervised install on Debian), you can migrate all the data with a single command (assuming the InfluxDB and VM add-ons are both installed on the machine where you run the command):

./vmctl-prod influx --influx-user homeassistant --influx-password mysupersecretpw --influx-database homeassistant

If you run it from a different machine (for example, you run the add-ons on HAOS and do not have access to the host system), you have to add the flags with the addresses of the databases.
When done, you simply have to change the settings for the influx integration in your configuration.yaml to the new address (the add-on documentation describes what it has to look like).

For a quick comparison of the database size in InfluxDB and VM with your own data, you could migrate the data and check how large it is in VM (I think nothing would change in your setup when doing this).
I use a shell command to show the size of the VM data:

/usr/bin/du -s /usr/share/hassio/share/victoria-metrics-data | cut -f -1 > /usr/share/hassio/homeassistant/victoriametrics_db_size

Again, I think you need access to the host system, or at least the SSH add-on with protected mode deactivated, and you have to check what the paths to the VM data are on your system. You should find them in the share folder of Home Assistant.
And as always, please make A BACKUP before doing anything with databases.
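
If you would rather not rely on du, the same check can be sketched in Python. This is an assumption-laden sketch rather than the method used above: the function name is made up, the path is the one from the shell command and may differ on your system, and it sums file sizes in bytes, which is not exactly the same number as the block usage that du reports.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Path taken from the shell command above; adjust it to your installation.
VM_DATA_DIR = "/usr/share/hassio/share/victoria-metrics-data"
print(f"{dir_size_bytes(VM_DATA_DIR) / 1024 / 1024:.1f} MiB")
```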

juskalalie · September 7, 2023, 10:49am

Thanks a lot again for your help - I did try to import my data into VM but I must be missing something…

I don’t want to hijack this thread any longer as it’s about influxDB and not VM, so I posted my issue here:

pilot1981 · October 28, 2023, 8:25pm

Where can I set up the retention policies? Can I create a new InfluxDB dashboard and insert them there?

madface · October 29, 2023, 10:07am

I did this via the command line, logged in to the InfluxDB container. As far as I remember, you can also do this with the web interface of InfluxDB; both ways should do the same thing.
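
For reference, here is a small sketch that prints the InfluxQL statements you would then paste into the influx shell. The policy names are the ones from the continuous queries earlier in the thread; the durations (104w for the two-year policy, INF for the unlimited one) and the helper name are assumptions for illustration, so check them against the first post before using them.

```python
def create_retention_policy(name: str, database: str, duration: str,
                            replication: int = 1, default: bool = False) -> str:
    """Build an InfluxQL (InfluxDB 1.x) CREATE RETENTION POLICY statement."""
    stmt = (f'CREATE RETENTION POLICY "{name}" ON "{database}" '
            f'DURATION {duration} REPLICATION {replication}')
    if default:
        stmt += " DEFAULT"
    return stmt


# Assumed durations: two years of 5-minute data, unlimited 15-minute data.
for name, duration in (("y2_5m", "104w"), ("inf_15m", "INF")):
    print(create_retention_policy(name, "homeassistant", duration))
```

Creating a retention policy this way only defines where data can be written; it does not move or downsample existing data by itself.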

sender · November 11, 2024, 10:51am

I have a DB of 45GB (it was 50GB before I deleted a lot of entities with InfluxDBStudio). Now I want to set the retention as discussed in this topic.

Would/could it be as simple as going to my InfluxDB add-on in HAOS and clicking here on "add retention policy":

And adding the 2 lines of your retention policy?
Would that work instantly?

VanMak · November 14, 2024, 4:08am

My InfluxDB is currently at 15GB, I created the two retention policies and ran the subsequent commands to copy older data over. The second command didn’t do anything because my db is less than two years old. The first command ran for a long time, and gave the following output:

name: result
time                 written
----                 -------
1970-01-01T00:00:00Z 15167463

But the consumed disk space remains unchanged:

Is that 1970 time stamp a problem? Am I missing something else?
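
On the 1970 question: the InfluxDB 1.x documentation shows SELECT ... INTO results as a `written` count at time 0, so the timestamp is most likely just Unix epoch zero used as a placeholder, not a timestamp from your data. A quick sketch to confirm that the printed value is epoch zero (the helper name is made up):

```python
from datetime import datetime, timezone


def is_epoch_zero(rfc3339: str) -> bool:
    """Return True if an RFC3339 UTC timestamp is the Unix epoch (time 0)."""
    parsed = datetime.strptime(rfc3339, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp() == 0


print(is_epoch_zero("1970-01-01T00:00:00Z"))  # True
```

As for the disk space, a SELECT ... INTO copies points into the target retention policy and leaves the source data where it is, so the size would probably only shrink once the old data in the source retention policy is removed.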

sender · November 14, 2024, 6:35am

Can someone please explain how to apply the retention policies? In influxdb addon…

Tried to ask in influxdb forum:
Running DB - cleanup - InfluxDB 1 - InfluxData Community Forums