Inspired by this thread, I decided to do a writeup of my InfluxDB v2 settings and tasks.
InfluxDB is running on an external VPS to preserve the local SD card. It runs as a Docker container with docker-compose, is published through a Traefik reverse proxy, and uses a Let’s Encrypt certificate automatically requested by Traefik.
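For reference, a minimal docker-compose sketch of such a setup could look like the following. The image tag, the entrypoint name (websecure), the certificate resolver name (letsencrypt) and the volume path are my assumptions; adjust them to your Traefik configuration, and note that the container may also need to be attached to Traefik’s Docker network.

version: "3"

services:
  influxdb:
    image: influxdb:2.7
    restart: unless-stopped
    volumes:
      # persist the InfluxDB data outside the container
      - ./influxdb-data:/var/lib/influxdb2
    labels:
      - "traefik.enable=true"
      # route https://influx.example.com to this container
      - "traefik.http.routers.influxdb.rule=Host(`influx.example.com`)"
      - "traefik.http.routers.influxdb.entrypoints=websecure"
      # let Traefik request the Let's Encrypt certificate
      - "traefik.http.routers.influxdb.tls.certresolver=letsencrypt"
      # InfluxDB listens on port 8086 inside the container
      - "traefik.http.services.influxdb.loadbalancer.server.port=8086"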
Data gathering
I am not sure which data I want to retain in long-term storage, so I decided to store all data, except for the metrics I am sure I don’t need or that make little sense to keep, into a bucket called homeassistant.
This is the resulting influxdb configuration in Home Assistant’s configuration.yaml:
influxdb:
  api_version: 2
  host: influx.example.com
  port: 443
  organization: 12345abcde # hexadecimal org-id
  bucket: homeassistant
  token: !secret influxtoken
  measurement_attr: unit_of_measurement
  default_measurement: units
  tags:
    source: HA
  tags_attributes:
    - friendly_name
  exclude:
    domains:
      - persistent_notification
      - automation
      - device_tracker
      - group
      - scene
      - schedy_room
      - script
      - update
      - alert
      - camera
      - remote
- This configuration will store all data in measurements (the InfluxDB term for something like a table) named after the corresponding unit (°C, kWh, V, A, etc.).
- When no unit is specified, the measurement will be named “units”.
- The entity’s attributes will be put into InfluxDB as well. I am not really interested in that data because it is mostly irrelevant and the types are not well defined. Since there seems to be no proper include/exclude mechanism for attributes, we will discard them later (see the sketch after this list).
- The bucket homeassistant and the API token need to be created before activating the configuration. This task is quite self-explanatory with the InfluxDB UI.
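To check which fields Home Assistant actually writes, you can run a quick Flux query in the Data Explorer’s script editor. This is only an inspection sketch using the standard schema package:

import "influxdata/influxdb/schema"

// List the field keys present in the bucket (last 30 days by default):
// "value" holds the entity state, the remaining keys are attribute fields
// that will be dropped during downsampling.
schema.fieldKeys(bucket: "homeassistant")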
Downsampling data
So far, all data is stored in a single bucket. InfluxDB’s concept for downsampling data is to run regular tasks that copy aggregated data into additional buckets.
For now, this is my retention plan:
- Keep the detailed data in the bucket homeassistant for 30 days
- Drop the attributes when copying into the aggregation buckets. This means we will only keep the entity’s value field.
- Keep min, max and mean values for the aggregation periods
- Keep 5 minute aggregations (I have not decided on the retention period yet)
- Keep 15 minute aggregations (I have not decided on the retention period yet)
- Keep 60 minute aggregations (I have not decided on the retention period yet)
The next step is to create the following buckets via the InfluxDB UI:
homeassistant_5m
homeassistant_15m
homeassistant_1h
Then the tasks that copy data into those buckets can be created in the UI.
This is the task that fills the 1h bucket. For the other ones, the interval and the target bucket names need to be adjusted; a 5m sketch follows after the task.
option task = {name: "homeassistant_1h", every: 1h}

// Source data: only the "value" field (the entity state).
// The attribute fields are dropped by this filter.
data =
    from(bucket: "homeassistant")
        |> range(start: -task.every)
        |> filter(fn: (r) => r["_field"] == "value")

// Write mean, max and min aggregates to the downsampled bucket,
// distinguished by the agg_type tag.
data
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> set(key: "agg_type", value: "mean")
    |> to(bucket: "homeassistant_1h", org: "my_Org")

data
    |> aggregateWindow(every: 1h, fn: max, createEmpty: false)
    |> set(key: "agg_type", value: "max")
    |> to(bucket: "homeassistant_1h", org: "my_Org")

data
    |> aggregateWindow(every: 1h, fn: min, createEmpty: false)
    |> set(key: "agg_type", value: "min")
    |> to(bucket: "homeassistant_1h", org: "my_Org")
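For illustration, this is how the 5m variant would look; it is the same task with only the name, the interval and the target bucket changed:

option task = {name: "homeassistant_5m", every: 5m}

data =
    from(bucket: "homeassistant")
        |> range(start: -task.every)
        |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
    |> set(key: "agg_type", value: "mean")
    |> to(bucket: "homeassistant_5m", org: "my_Org")

// max and min are written the same way as in the 1h task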
Downsampling older data
I added the downsampling tasks a few months after I started collecting data into the homeassistant bucket. So there is data that has to be aggregated outside of the tasks before the retention period of the source bucket can be set.
This can be achieved with the following queries, executed in the Data Explorer. Here are the queries for the 1h bucket:
from(bucket: "homeassistant")
    |> range(start: 2022-06-03T00:00:00Z, stop: 2022-08-27T11:50:00Z)
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(every: 1h, fn: max, createEmpty: false)
    |> set(key: "agg_type", value: "max")
    |> to(bucket: "homeassistant_1h", org: "my_Org", tagColumns: ["agg_type", "domain", "entity_id", "friendly_name", "source"])

from(bucket: "homeassistant")
    |> range(start: 2022-06-03T00:00:00Z, stop: 2022-08-27T11:50:00Z)
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(every: 1h, fn: min, createEmpty: false)
    |> set(key: "agg_type", value: "min")
    |> to(bucket: "homeassistant_1h", org: "my_Org", tagColumns: ["agg_type", "domain", "entity_id", "friendly_name", "source"])

from(bucket: "homeassistant")
    |> range(start: 2022-06-03T00:00:00Z, stop: 2022-08-27T11:50:00Z)
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> set(key: "agg_type", value: "mean")
    |> to(bucket: "homeassistant_1h", org: "my_Org", tagColumns: ["agg_type", "domain", "entity_id", "friendly_name", "source"])
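Once the backfill queries have run, the downsampled data can be sanity-checked with a simple query. The entity_id below is just a placeholder; adjust the range and filters to your data:

from(bucket: "homeassistant_1h")
    |> range(start: 2022-06-03T00:00:00Z, stop: 2022-08-27T11:50:00Z)
    |> filter(fn: (r) => r.entity_id == "some_sensor" and r.agg_type == "mean")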
Edits
- Spelling and grammar
- Use aggregateWindow instead of the plain min(), max() and mean() functions in the tasks. Before, mean() was not working correctly.