InfluxDB is BIG... how to manage it?

It would be great if there was some kind of simple way for handling database entries in Home Assistant. For example, some setting per entity, where you could set if you want to archive the entity’s state or value, and for how long you will want to keep it (or forever).

I switched some time ago to MariaDB + InfluxDB combination, and I have set up Influxdb so that only sensor and calendar domains are included. Still, my MariaDB is about 5,6 GB and InfluxDB database is almost 27 gigabytes! InfluxDB is growing daily by 200 MB. I have set it up so that data will stay forever (I want to keep my temperature data for example), but to me it seems that database will grow and grow until something breaks :confused:

1 Like

Same here.

But the solution is to include instead of exclude…

But “changing” that is somewhat…

Too lazy to read all the replies after seeing so many bad recommendations. The key is to a) try to limit what you send to Influx to reasonable data, and b) to post-process the influx data to down-sample useful information that you want to retain long-term. To do this you create tasks, and they are going to need to be specialized depending on what the data is-- some counters increment daily and you might want to get hourly or quarter-hourly breakdowns rather than semi real-time. You might want deltas between windows or integrals. For some things you might just care about daily min/max/mean. Influx is really where you have the tools to pare data.

Pro tip: put your pared data in a separate bucket from your base data, and set a shorter retention period for base data, and potentially “forever” retention for pared values. You can also go three levels deep in buckets and your pared processes.

Is there any step-by-step guides out there to do what Patrlki suggested, if there is no other way? I find it odd that there is not some kind of system where you could just list all entity types in influxdb, select one or several entities and then just delete all entries of that type from the database :confused:

Seems like I wrote a bit over two months ago that my influxdb database size is almost 27 gigabytes. Well, today it’s almost 47 gigabytes so it’s clear that sooner or later my Home Assistant will get corrupted somehow, or at least my SSD will be full! So I have to do something fast…

I cannot understand why it is growing that fast. This is my setup:

  include:
    domains:
      - calendar
      - sensor
  exclude:
    domains:
      - automation
      - binary_sensor
      - button
      - camera
      - cover
      - device_tracker
      - light
      - media_player
      - number
      - scene
      - switch
      - update

Same Problem here, over 50gb data from half a year…

Have you used data explorer to take a look at just how much stuff is stored? I have a weather station with about 16 entities; each reading is 4 or more records, including sql-style syntax for the units, value, icon, and device class, recorded every 30 seconds.

So, for me, I need to set a retention period for “Hassbucket” of maybe a month, and filter content down to logical aggregate windows. But, the first step is to set entity_globs for your include and exclude to try to filter out information you will never have use for in the future.

I’ll post back my task here once I get it done and checked.

I have looked at InfluxDB Explorer (in HA plugin) and I find it very difficult to understand and use, at least for someone that has experience only with SQL and relation databases. I know I have tons of unnecessary data in there, although I have included basically only sensors, but I have surveillance cameras, weather station etc that bring a lot of extra stuff.

To me the most logical place to handle history of our entities would be in Home Assistant itself. There is a list of all entities you have, so there could be some sort of option to set the time about how long they will stay in history: maybe two weeks for every entity as a default, and then you could change that time individually for every entity (most important entity records you could set to keep a year, two years or forever, for example). Maybe one day, until then we have to sort these out ourselves :confused:

1 Like

The reason to use an external historian is so you can go beyond the limits of SQLite for storage; time series databases are great for the ability to window fairly natively and intuitively. IMO, much of that advantage is lost if you are using it as a Hass add-on though because of resource limitations generally speaking.

Start by going to “Data Explorer” on Infux and filter for one specific sensor and look at 5-minutes of data in “raw data” mode. It helps to get a sense of just how much information HomeAssistant is storing.

The main tool for consolidating data is aggregateWindow, which can down-sample data to a longer time duration-- mean, max, min, integral, etc. If Hass is exporting information at 30 second intervals, after a few days only 5-minute intervals might be meaningful. With scheduled tasks you can export the information to a separate bucket that might keep the 5-minute (or 15-minute or whatever makes sense for a particular sensor), and with pivots you can do min/max/mean for the interval.

It took me a few months of playing with Influx to not feel completely lost; their community forum has staff that are helpful, but often you need domain-specific help to get better information.

(My data reduction task is still a work-in-progress. My first attempt ended up with an unrefined feel so I need to work a little harder to figure out how to properly manage information like rainfall data.)

Reducing sensor update frequencies will help as well. I.e. temperature sensor once per minute instead of every few seconds. Some integrations allow to select refresh rates. Don’t make them higher then practically required.
Last but not least: for template sensors, specific triggers can be used to limit the update rates (otherwise determined by every change of entities used in the template)

I’m also wondering how to deal with big amount of sensor data from HomeAssistant stored in InfluxDB.

I’m wondering if a first step shouldn’t be to provide an analysis with several queries to know :

  1. list of HA sensors whom values are stored to InfluxDB.
  2. number of data points per sensor
  3. first data time per sensor
  4. last data time per sensor
  5. average data points per hour for a given sensor (or average sampling period)

Unfortunately I’m quite new with InfluxDB and I’m missing answer to first question… getting list of sensors whom data are stored to InfluxDB.

Some people are pointing to InfluxDB doc about delete… Delete data | InfluxDB Cloud (TSM) Documentation well … it doesn’t help me much as I don’t know where I can run influx delete command. I tried in core-ssh Terminal but I get simply command not found

Hope that helps

It should be all in my guide above. To look what entities are stored you can use the webinterface of influxdb (i think it was in the explore option, i don’t use influxdb anymore). There you can check every measurement with for example this select statement:

select * from homeassistant.autogen."%" where time > '2022-04-22' and time < '2022-04-24'

But deleting from there did not work for me. To access the influxdb container you need the addon with elevated access, it’ called advanced ssh & web terminal:

Don’t forget to disable protection mode in the settings. In this you can log into your influxdb container:

docker exec -it addon_a0d7b954_influxdb influx -precision rfc3339

then auth yourself if necessary:

> auth
username: homeassistant
password:
> use homeassistant
Using database homeassistant

and finally you can use the delete command

delete from "%" where entity_id = 'your_sensor'
2 Likes

How to Get a Count of Records per Entity from InfluxDB Using Grafana

If you’re trying to understand the frequency of readings for different entities in InfluxDB, here’s a method I found useful using Grafana.

Steps:

  1. Create a Chart in Grafana: Start by setting up a new chart visualization.
  2. Add a Query for Each Measurement: For your first measurement, use the following query:
SELECT count("value") FROM "autogen"."state" WHERE $timeFilter GROUP BY "entity_id"::tag

image

  1. Duplicate the Query for Other Measurements: Click on the “Duplicate query” icon and simply change “state” to the next value from the dropdown list. Repeat this for each measurement you’re interested in.

This approach gave me a clear overview of which entities had the most frequent readings. It’s a bit manual, but it got the job done!

If anyone knows a more efficient way to achieve this please share!

1 Like

how I can see sensor size in HA?

add to your configuration.yaml

sensor:
  - platform: influxdb
    host: a0d7b954-influxdb
    port: 8086
    username: !secret influx_username
    password: !secret influx_password
    queries:
      - name: InfluxDb DB Size
        unit_of_measurement: MB
        value_template: "{{ (value | float / 1024 /1024) | round(1) }}"
        group_function: sum
        measurement: '"monitor"."shard"'
        database: _internal
        where: "time > now() - 10s"
        field: diskBytes
1 Like

I ended up in my own question. I am now starting a try to get this done… But in my head it was more easy than in real life… The first try is there:

I think I am too big :slight_smile:

I use this, it’s a lot of work but it does it…

Hello,
I’ve tried to use the InfluxDBStudio, but I do not manage to find out which address to enter in the connection setting for my Home assistant InfluxDb (with HASSIO).
Any suggestion ?
Thanks

Put the ip adress of you HA and port 8086 in?

This was my understanding, so I tried it without success… up to now.
But for some reason it does work today !!
That’s perfect.
Thanks a lot.

1 Like