InfluxDB is BIG... how to manage it?

My InfluxDB has gone big…

I have a sensor that says 181.4 MB, but that is ridiculously wrong :slight_smile:
This is the sensor configuration:

  - platform: influxdb
    host: a0d7b954-influxdb
    port: 8086
    username: !secret influxdb_user
    password: !secret influxdb_pass
    scan_interval: 3600
    queries:
      - name: InfluxDB DB size
        unit_of_measurement: MB
        value_template: '{{ (value | float / 1024 /1024) | round(1) }}'
        group_function: mean
        measurement: '"monitor"."shard"'
        database: _internal
        where: 'time > now() - 1m AND "database"=''home_assistant'''
        field: diskBytes

I created a single backup of the InfluxDB add-on, and that is around 8 GB.

I wonder how I can clean up the DB without breaking stuff. The retention policy is now:
[image: screenshot of the current retention policy]

I do not want to lose any data.

Is there a simple way to:

  1. Determine which entities haven’t updated for over ~X months and thus can be safely deleted from the database?
  2. Determine which entities are not present in Home Assistant anymore? (Related to 1.)

Any other thoughts on cleaning up?

THX!

Pfft. :face_with_hand_over_mouth:

I explicitly include sensors I’m actually interested in graphing long term data for and my InfluxDB is 2.6GB and growing. I estimate it will top out at about 3 to 4GB as 2 years data retention will be up in another few months but I’ve been adding sensors as I add devices and come up with more things to graph.

This topic details my search for a reliable InfluxDB size sensor, and the hoops I had to jump through to achieve it https://community.home-assistant.io/t/unreliable-influxdb-size-sensor/226871

Hi Tom, thanks for responding. I do not care about the sensor anymore ATM… since it’s too complicated to get (for me).

I hope I get some info on cleaning up my DB…

Only include sensors you are interested in graphing long term data for.

Thanks, but that is not a solution for my current situation…

Data in a time-series database like InfluxDB is not stored per “device” or entity (it doesn’t work like a relational DB), so you would spend endless time finding the data you want to delete. It is stored more like timestamp/measurement/ID, where the ID is assigned by InfluxDB whenever it sees a new “source” (source = HA entity name). For example, when you rename an entity in HA, InfluxDB adds it as a new source, gives it a new ID, and starts over from scratch.

So, basically, if you want to delete from InfluxDB, you have to define the time span you want to delete from, add the key fields, etc.

Retention means “delete when too old”: if you had some devices last year, their data would be gone by now (though not in your case, as you have no retention). The universe is said to be infinite, and so is database size (if you have enough disk space). Retention policies and “selection policies” are the means to keep a database in shape.

What you really want is a relational DB (e.g. MariaDB), where you can easily delete by entity/device. You can use that as a source for Grafana, but you lose the performance of the time-series DB. Selection/retention is your best choice to keep the database to its minimum, unless you want to sit and hand-pick data every 6 months or so (in a relational DB).

I get the logic, thanks for that. But I need the actual steps for how to do it.

https://docs.influxdata.com/influxdb/cloud/write-data/delete-data/

@boheme61 thanks for your time, but I can find that myself as well. I need some real guidance, not links to documentation. Although I appreciate your replies, I think I can do without them for now, to keep this post open for others.

ok, good luck

The documentation is the official guide.

People doing this: https://community.home-assistant.io/t/influxdb-removing-or-deleting-data/292637 are just following that documentation.

Guide and guidance are different words and have different meanings :yum:.

I have been pointed to documentation and guides so many times. Finding the manual for a Boeing 747 does not mean I can operate or repair it. But after watching e.g. a YouTube video, I think I could do some level of maintenance. See what I mean? If I understood how to do it based on the guides, I would not be asking for help here…

Did you get a solution? I am facing the same problems…

I started with these 2 tutorials:

https://dummylabs.com/post/2019-01-13-influxdb-part1/
https://dummylabs.com/posts/2019-05-28-influxdb-part2/

(Found them somewhere here in a post)

Cleaning up your database is not a quick job. I looked in every measurement to see which entities were stored and which ones I really needed. For the last few years I had used an include-all, exclude-some approach, which is really bad if you want to store your data for a long time.

To look into your database you can for example see what entities are stored in one measurement with:

select * from homeassistant.autogen."%" where time > '2022-04-22' and time < '2022-04-24'

Paste this into Explore in the InfluxDB add-on. You will then see all entities stored in the “%” measurement during that time window (yesterday, in this example).
To see which measurements you have, run SHOW MEASUREMENTS in the Explore query box.
A similar query for °C would be:

select * from homeassistant.autogen."°C" where time > '2022-04-22' and time < '2022-04-24'
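For the original question (which entities have not updated for X months), here is a minimal Python sketch. It assumes you have already collected the time of the last stored point per entity, for example with something like `SELECT LAST("value") FROM "%" GROUP BY "entity_id"` run per measurement. That query, the function name `stale_entities`, and the sample entity names are illustrative assumptions and not taken from the posts above.

```python
from datetime import datetime, timedelta, timezone


def stale_entities(last_seen, now, max_age_days):
    """Return entity ids whose newest point is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(entity for entity, ts in last_seen.items() if ts < cutoff)


# Hypothetical sample data: entity_id -> time of its last stored point.
last_seen = {
    "printer_ink_black": datetime(2020, 3, 1, tzinfo=timezone.utc),
    "living_room_humidity": datetime(2022, 4, 23, tzinfo=timezone.utc),
}
now = datetime(2022, 4, 24, tzinfo=timezone.utc)

# Entities silent for more than roughly 6 months are deletion candidates.
print(stale_entities(last_seen, now, max_age_days=180))
```

Treat the result as a list of candidates to review by hand, since a sensor that reports rarely (or only seasonally) would also show up here.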

Once you have found entities you want to get rid of, delete them. This was a bit tricky for me because DELETE FROM did not work in Explore in InfluxDB (I did not find the right syntax), so I logged into the container instead (see the dummylabs tutorial, part 2).
SSH onto your host and execute:

user@host:~$ docker exec -it addon_a0d7b954_influxdb influx -precision rfc3339
Connected to http://localhost:8086 version 1.7.2
InfluxDB shell version: 1.7.2
Enter an InfluxQL query

and then

> auth
username: homeassistant
password:
> use homeassistant
Using database homeassistant

Then you can delete data with:

delete from "%" where entity_id = 'your_sensor'
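Since the same statement has to be repeated for every measurement and every unwanted entity, a small helper can generate the statements for pasting into the influx shell. This is only a sketch: the helper names and the optional `before` time bound are my own additions, the escaping rules are my understanding of InfluxQL quoting, and the script only prints text without connecting to any database.

```python
def quote_ident(name):
    """Double-quote an InfluxQL identifier such as a measurement name."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_str(value):
    """Single-quote an InfluxQL string literal such as a tag value."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def delete_statements(measurements, entity_ids, before=None):
    """Build one DELETE statement per (measurement, entity) pair.

    `before` is an optional timestamp string; when given, only points
    older than it are targeted.
    """
    statements = []
    for measurement in measurements:
        for entity in entity_ids:
            stmt = f'DELETE FROM {quote_ident(measurement)} WHERE "entity_id" = {quote_str(entity)}'
            if before is not None:
                stmt += f" AND time < {quote_str(before)}"
            statements.append(stmt)
    return statements


# 'your_sensor' is the placeholder from the post above, not a real entity.
for stmt in delete_statements(["%", "°C"], ["your_sensor"], before="2022-01-01"):
    print(stmt)
```

Deletes cannot be undone, so it is worth reviewing the printed statements and having a backup before running any of them.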

So you have to go through every measurement, check what is stored, and delete everything you don’t need. Don’t forget to change your InfluxDB config to an include-only strategy for the sensors you really need.

To change how much data is stored, I wrote a little tutorial that compresses data after 6 months and again after 2 years into different retention policies to save space. For me, cleaning up took my DB size from 4.5 GB to around 350 MB for 3 years of recording. For example, I do not need the ink levels of a printer that left me 2 years ago :wink:
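The tutorial itself is not reproduced in this post, so the following is not its method. It is just a plain-Python illustration of what "compressing" (downsampling) means: many raw points are collapsed into one mean value per hour. In InfluxDB 1.x this kind of aggregation is typically done inside the database by a continuous query that writes into a second retention policy; the function name and sample values here are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_means(points):
    """Collapse (timestamp, value) points into one mean value per hour."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate the timestamp to the start of its hour.
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical raw temperature readings.
raw = [
    (datetime(2022, 4, 23, 10, 5, tzinfo=timezone.utc), 20.0),
    (datetime(2022, 4, 23, 10, 35, tzinfo=timezone.utc), 22.0),
    (datetime(2022, 4, 23, 11, 10, tzinfo=timezone.utc), 21.0),
]

for hour, mean in hourly_means(raw).items():
    print(hour.isoformat(), mean)
```

Three raw points become two stored points here; with sensors that report every few seconds the reduction is far larger, which is where the space saving comes from.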

Guidance enough?

I do not know where I found it, but I downloaded and installed a Windows InfluxDB reader, and that helped me manually delete all the “old” data from entities that are long gone. It saved me around 2 GB in total. That is not much compared to what remains, and the database is still too big to back up with HA backup…

Do you have a name for the Windows-based InfluxDB reader? It may help me and others with the problems you faced. Thanks

It was literally the first hit on Google.

It would be great if there was some kind of simple way for handling database entries in Home Assistant. For example, some setting per entity, where you could set if you want to archive the entity’s state or value, and for how long you will want to keep it (or forever).

I switched some time ago to a MariaDB + InfluxDB combination, and I have set up InfluxDB so that only the sensor and calendar domains are included. Still, my MariaDB is about 5.6 GB and the InfluxDB database is almost 27 GB! InfluxDB is growing by 200 MB daily. I have set it up so that data will stay forever (I want to keep my temperature data, for example), but to me it seems that the database will grow and grow until something breaks :confused:

Same here.

But the solution is to include instead of exclude…

But “changing” that is somewhat…

Too lazy to read all the replies after seeing so many bad recommendations. The key is to a) limit what you send to Influx to reasonable data, and b) post-process the Influx data to down-sample the useful information that you want to retain long-term. To do this you create tasks, and they need to be specialized depending on what the data is: some counters increment daily, and you might want hourly or quarter-hourly breakdowns rather than semi-real-time values. You might want deltas between windows, or integrals. For some things you might just care about the daily min/max/mean. Influx is really where you have the tools to pare data.
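To make "deltas between windows" concrete, here is a hedged plain-Python sketch that turns cumulative counter readings (an energy meter, say) into per-day increases. A real setup would do this in an InfluxDB task rather than in Python; the function name and readings are hypothetical, and counter resets are deliberately not handled.

```python
from datetime import datetime, timezone


def daily_deltas(readings):
    """Turn cumulative counter readings into per-day increases.

    The delta for a day is its last reading minus the last reading of the
    previous day that has data, so the first day gets no delta.
    Counter resets are not handled in this sketch.
    """
    last_per_day = {}
    for ts, value in sorted(readings):
        last_per_day[ts.date()] = value  # later readings overwrite earlier ones
    days = sorted(last_per_day)
    return {day: last_per_day[day] - last_per_day[prev] for prev, day in zip(days, days[1:])}


# Hypothetical cumulative meter readings.
readings = [
    (datetime(2022, 4, 22, 23, 50, tzinfo=timezone.utc), 100.0),
    (datetime(2022, 4, 23, 12, 0, tzinfo=timezone.utc), 104.5),
    (datetime(2022, 4, 23, 23, 50, tzinfo=timezone.utc), 109.0),
    (datetime(2022, 4, 24, 23, 50, tzinfo=timezone.utc), 115.5),
]

for day, delta in daily_deltas(readings).items():
    print(day, delta)
```

Storing one delta per day instead of every raw counter reading keeps the information you usually graph (daily consumption) at a fraction of the size.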

Pro tip: put your pared data in a separate bucket from your base data, and set a shorter retention period for the base data and potentially “forever” retention for the pared values. You can also go three levels deep with your buckets and paring processes.
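A three-level layout like this can be written down as a small table before creating anything in InfluxDB. The sketch below is only a planning aid in plain Python: the bucket names, resolutions, and retention periods are invented for illustration, and nothing here talks to a database.

```python
from datetime import timedelta

# Hypothetical tiers: (bucket name, resolution, retention); None = keep forever.
TIERS = [
    ("raw", "as written", timedelta(days=30)),
    ("hourly", "1h", timedelta(days=730)),
    ("daily", "1d", None),
]


def buckets_holding(age):
    """Return the names of the tiers that still hold data of the given age."""
    return [name for name, _resolution, keep in TIERS if keep is None or age <= keep]


for days in (7, 90, 1000):
    print(days, "days old ->", buckets_holding(timedelta(days=days)))
```

With these example numbers, week-old data exists at full detail, three-month-old data only as hourly and daily values, and anything older than two years only as daily values.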