InfluxDB information (database size) as sensor in Home Assistant

Hi,

as many others (searched the forum, few requests with different information but all without a solution) I struggle to successfully define an InfluxDB sensor to be used in Home Assistant.

I want to get the current InfluxDB database size of homeassistant .

My query is:

SELECT sum("diskBytes") FROM "_internal".."tsm1_filestore" WHERE time >= now() - 7d AND "database"='homeassistant' GROUP BY time(30s), "database"

What does the sensor definition in my configuration.yaml need to look like?

I really donĀ“t get the very very VERY complicated way of seggregating the whole query in different parts as needed, I read https://www.home-assistant.io/integrations/influxdb#sensor three times and tried and failed many times.

As mentioned I donĀ“t want to get information coming from the Home Assistant database but from the InfluxDB storage engine itself.

ANY help appreciated!

1 Like

You put it in the sensors section of your config. Like this:

It does not work well. It only gives an approximation of the size and often reports zero.

Thank you for the right syntax, I tried that one. Well I acutally get a sensor value, but:

Two things in mind:

  1. Is ā€œmonitor.shardā€ really the right value? I found that the ā€œmonitor.tsm1_filestoreā€ gives the right numbers. Have a look at https://stackoverflow.com/questions/52428018/how-to-get-influxdb-measurement-size/56776417#56776417

2) Is looking at the ā€œ_internalā€ database the right way?
I have a dedicated database ā€œhomeassistantā€:
grafik

I think the better approach will be to create a dashboard of that filesize query, visualize it with Grafana and embedd that in Home Assistant. DonĀ“t know nothing bout Grafana yet, I guess for a nice graph it would be sufficient but to get the actual database size as a number to work with in Home Assistant it might not be sufficient.

Also took a look into the folder sensore approach - also does not work because of the docker environment on the HASS OS. And I believe a InfluxDB database is spread over several files (as I can see from the HA snaphots) so it would be not quite accurate.

According to the post you linked to thatā€™s not the actual size on disk either. Both methods (yours and mine) are approximations.

Alright I found following official information:

### shard

The `shard` measurement statistics are related to working with shards in InfluxDB OSS and InfluxDB Enterprise.

#### diskBytes

The size, in bytes, of the shard, including the size of the data directory and the WAL directory.

https://docs.influxdata.com/platform/monitoring/influxdata-platform/tools/measurements-internal/

### tsm1_filestore

The tsm1_filestore measurement statistics are related to the usage of the TSM file store.

#### diskBytes

The size, in bytes, of disk usage by the TSM file store.

https://docs.influxdata.com/platform/monitoring/influxdata-platform/tools/measurements-internal/

So I have no idea what shards are or when tsm is used - probably youĀ“re absolutely right, thereĀ“s no ā€œcorrectā€ source for that information.

My second question is kinda wrong: youĀ“re not looking for the database ā€œ_internalā€ but the retention policy ā€œ_internalā€, thatĀ“s a difference. So:

  • using your definition gives the whole size used by all databases (_internal, chronograf, homeassistant in my case)
  • which is fine if InfluxDB is exclusively running for Home Assistant and there are no other use cases/databases on that Influx DB instance

Otherwise (looking at ONE specific database) I think weĀ“d need to specify this in the where: clause by adding something like AND "database"='homeassistant'.

Ah ok. Iā€™m only after a reliable disk size of all the databases.

Me too. So my sensor definition currently looks like:

sensor:
  - platform: influxdb
    host: a0d7b954-influxdb
    port: 8086
    username: !secret influxdb_user
    password: !secret influxdb_pass
    scan_interval: 3600
    queries:
      - name: InfluxDB DB size
        unit_of_measurement: MB
        value_template: "{{ (value|float / 1024 /1024)|round(1) }}"
        group_function: sum
        measurement: '"monitor"."shard"'
        database: _internal
        where: "time > now() - 10s"
        field: diskBytes

Additionally I excluded this sensor in my general InfluxDB integration part to not have redundant information in the InfluxDB (same information two times with different names - nope, I donĀ“t need that :-))

influxdb:
  host: a0d7b954-influxdb
  port: 8086
  database: !secret influxdb_daba
  username: !secret influxdb_user
  password: !secret influxdb_pass
  max_retries: 5
  default_measurement: state
  exclude:
    entities:
      - sensor.influxdb_db_size

Too bad all those information only give an estimation. Well, at least the differences (growing of the database) can be monitored/used in HA.

1 Like

Alright I double checked the information. The information (value) reported from the sensor definition is very strange. I have absolutely no idea where that value comes from cause it does not match the results given when performing the same SELECT statement on the InfluxDB Admin page.

  • Sensor gives 54,9 MB
  • Actual file size is 185 MB [*]

So I canĀ“t use that at all. Tried everything, tested a lot, wasted more than 2 hours on this.

Finally decided to simply not use that purely WRONG piece of information for anything. Therefore I removed that sensor from my .yaml file.

Just a preview on the shard table which gives just some kind of garbage. File sizes should increase constantly and not vary that much, that graph is not useful at all.:

[*] To manually get actual filesize information without using the folder integration:

  • Made a snapshot, restored the InfluxDB-addon data to my client and checked the size of the data folder - Same can be achieved accessing the influxdb docker container I think, in @tom_l linked thread is the correct ā€œdu -shā€ command.

Whatever it may help:

It gives me:
image

No clue if correctā€¦

1 Like

I donā€™t think so. Although itā€™s one of the easiest way to get at least any InfluxDB database size estimationā€¦ still looking for an acurate data source easy to get.

1 Like

No MQTT. Not using it, not going to start for one sensor. :slight_smile:

In the previous comments in that topic is a version using the the SSH command to write to a file which is then read by a file sensor.

Iā€™m having issues getting accurate data from the sensor. On disk my HA DB is about 120MB according to du -h. Portable full backup is 14MB. Yet in HA the sensor reports 1387.0 MB. Whatā€™s going on?

sensor:
  #InfluxDB Database Sensor
  - platform: influxdb
    host: 10.10.10.10
    port: 8086
    username: homeassistant
    password: !secret influxdb_password
    scan_interval: 3600
    queries:
      - name: InfluxDB Database Size
        unit_of_measurement: MB
        value_template: "{{ (value | float(0) / 1024 /1024) | round(1) }}"
        group_function: sum
        measurement: '"monitor"."shard"'
        database: _internal
        where: '"database"=''home_assistant'' AND time > now() - 5m'
        field: diskBytes

2023-04-06_14-41-24

1 Like
1 Like

I finally managed to realize this (obviously the need was not that high looking at the amount of time since the OP :smile: ).

Basically when using HA OS there needs to be ANY workaround to make this work reliably. The best one with fewest downsides I could find and tend to recommend is using the community SSH addon as tool/information source. Two implementation options:

There are a few little things to look at (e. g. Unreliable InfluxDB size sensor - #57 by e-raser) but it seems to work quite good. Information flow in generel is:

  1. SSH addon
  2. :arrow_right: file/MQTT broker
  3. :arrow_right::arrow_right: reading from 2 and creating a sensor

A lot of useful information on those two options (and other implementations/possibilities) are in that already recommended topic.