InfluxDB information (database size) as sensor in Home Assistant

Hi,

as many others (searched the forum, few requests with different information but all without a solution) I struggle to successfully define an InfluxDB sensor to be used in Home Assistant.

I want to get the current InfluxDB database size of homeassistant .

My query is:

SELECT sum("diskBytes") FROM "_internal".."tsm1_filestore" WHERE time >= now() - 7d AND "database"='homeassistant' GROUP BY time(30s), "database"

What does the sensor definition in my configuration.yaml need to look like?

I really don´t get the very very VERY complicated way of seggregating the whole query in different parts as needed, I read https://www.home-assistant.io/integrations/influxdb#sensor three times and tried and failed many times.

As mentioned I don´t want to get information coming from the Home Assistant database but from the InfluxDB storage engine itself.

ANY help appreciated!

1 Like

You put it in the sensors section of your config. Like this:

It does not work well. It only gives an approximation of the size and often reports zero.

Thank you for the right syntax, I tried that one. Well I acutally get a sensor value, but:

Two things in mind:

  1. Is “monitor.shard” really the right value? I found that the “monitor.tsm1_filestore” gives the right numbers. Have a look at https://stackoverflow.com/questions/52428018/how-to-get-influxdb-measurement-size/56776417#56776417

2) Is looking at the “_internal” database the right way?
I have a dedicated database “homeassistant”:
grafik

I think the better approach will be to create a dashboard of that filesize query, visualize it with Grafana and embedd that in Home Assistant. Don´t know nothing bout Grafana yet, I guess for a nice graph it would be sufficient but to get the actual database size as a number to work with in Home Assistant it might not be sufficient.

Also took a look into the folder sensore approach - also does not work because of the docker environment on the HASS OS. And I believe a InfluxDB database is spread over several files (as I can see from the HA snaphots) so it would be not quite accurate.

According to the post you linked to that’s not the actual size on disk either. Both methods (yours and mine) are approximations.

Alright I found following official information:

### shard

The `shard` measurement statistics are related to working with shards in InfluxDB OSS and InfluxDB Enterprise.

#### diskBytes

The size, in bytes, of the shard, including the size of the data directory and the WAL directory.

https://docs.influxdata.com/platform/monitoring/influxdata-platform/tools/measurements-internal/

### tsm1_filestore

The tsm1_filestore measurement statistics are related to the usage of the TSM file store.

#### diskBytes

The size, in bytes, of disk usage by the TSM file store.

https://docs.influxdata.com/platform/monitoring/influxdata-platform/tools/measurements-internal/

So I have no idea what shards are or when tsm is used - probably you´re absolutely right, there´s no “correct” source for that information.

My second question is kinda wrong: you´re not looking for the database “_internal” but the retention policy “_internal”, that´s a difference. So:

  • using your definition gives the whole size used by all databases (_internal, chronograf, homeassistant in my case)
  • which is fine if InfluxDB is exclusively running for Home Assistant and there are no other use cases/databases on that Influx DB instance

Otherwise (looking at ONE specific database) I think we´d need to specify this in the where: clause by adding something like AND "database"='homeassistant'.

Ah ok. I’m only after a reliable disk size of all the databases.

Me too. So my sensor definition currently looks like:

sensor:
  - platform: influxdb
    host: a0d7b954-influxdb
    port: 8086
    username: !secret influxdb_user
    password: !secret influxdb_pass
    scan_interval: 3600
    queries:
      - name: InfluxDB DB size
        unit_of_measurement: MB
        value_template: "{{ (value|float / 1024 /1024)|round(1) }}"
        group_function: sum
        measurement: '"monitor"."shard"'
        database: _internal
        where: "time > now() - 10s"
        field: diskBytes

Additionally I excluded this sensor in my general InfluxDB integration part to not have redundant information in the InfluxDB (same information two times with different names - nope, I don´t need that :-))

influxdb:
  host: a0d7b954-influxdb
  port: 8086
  database: !secret influxdb_daba
  username: !secret influxdb_user
  password: !secret influxdb_pass
  max_retries: 5
  default_measurement: state
  exclude:
    entities:
      - sensor.influxdb_db_size

Too bad all those information only give an estimation. Well, at least the differences (growing of the database) can be monitored/used in HA.

1 Like

Alright I double checked the information. The information (value) reported from the sensor definition is very strange. I have absolutely no idea where that value comes from cause it does not match the results given when performing the same SELECT statement on the InfluxDB Admin page.

  • Sensor gives 54,9 MB
  • Actual file size is 185 MB [*]

So I can´t use that at all. Tried everything, tested a lot, wasted more than 2 hours on this.

Finally decided to simply not use that purely WRONG piece of information for anything. Therefore I removed that sensor from my .yaml file.

Just a preview on the shard table which gives just some kind of garbage. File sizes should increase constantly and not vary that much, that graph is not useful at all.:

[*] To manually get actual filesize information without using the folder integration:

  • Made a snapshot, restored the InfluxDB-addon data to my client and checked the size of the data folder - Same can be achieved accessing the influxdb docker container I think, in @tom_l linked thread is the correct “du -sh” command.

Whatever it may help:

It gives me:
image

No clue if correct…

1 Like

I don’t think so. Although it’s one of the easiest way to get at least any InfluxDB database size estimation… still looking for an acurate data source easy to get.

1 Like

No MQTT. Not using it, not going to start for one sensor. :slight_smile:

In the previous comments in that topic is a version using the the SSH command to write to a file which is then read by a file sensor.

I’m having issues getting accurate data from the sensor. On disk my HA DB is about 120MB according to du -h. Portable full backup is 14MB. Yet in HA the sensor reports 1387.0 MB. What’s going on?

sensor:
  #InfluxDB Database Sensor
  - platform: influxdb
    host: 10.10.10.10
    port: 8086
    username: homeassistant
    password: !secret influxdb_password
    scan_interval: 3600
    queries:
      - name: InfluxDB Database Size
        unit_of_measurement: MB
        value_template: "{{ (value | float(0) / 1024 /1024) | round(1) }}"
        group_function: sum
        measurement: '"monitor"."shard"'
        database: _internal
        where: '"database"=''home_assistant'' AND time > now() - 5m'
        field: diskBytes

2023-04-06_14-41-24

1 Like
1 Like

I finally managed to realize this (obviously the need was not that high looking at the amount of time since the OP :smile: ).

Basically when using HA OS there needs to be ANY workaround to make this work reliably. The best one with fewest downsides I could find and tend to recommend is using the community SSH addon as tool/information source. Two implementation options:

There are a few little things to look at (e. g. Unreliable InfluxDB size sensor - #57 by e-raser) but it seems to work quite good. Information flow in generel is:

  1. SSH addon
  2. :arrow_right: file/MQTT broker
  3. :arrow_right::arrow_right: reading from 2 and creating a sensor

A lot of useful information on those two options (and other implementations/possibilities) are in that already recommended topic.