Data model and code to add useful metadata to InfluxDB as tags

Idea and my background

I find the idea of storing sensor data long-term and later drawing conclusions from it fascinating. Since 2020, I have been using InfluxDB OSS v1 together with the InfluxDB integration. Before that, I used volkszaehler.org and FHEM. Over the last few weeks, I redesigned my data model. Basically, I want to store metadata so that the data is as useful as possible in the future, for example to analyze sensor drift and sensor aging, for citizen science, for smart home forensics, and for random questions like how indoor and outdoor temperatures compared over the last 5 years.

The data model description and use cases

The InfluxDB integration currently does not write the metadata that Home Assistant has about an entity or device to InfluxDB. I consider the following tags, which the integration does not add by default, valuable:

  • unique_id of the entity: Use case: If a sensor is replaced, the entity_id might stay the same but the unique_id should be different. This allows analyzing differences caused by a different sensor.
  • entity_category: Use case: Allows ignoring or deleting diagnostic data.
  • state_class: Use case: Only show me measurements and no total amounts.
  • device_id: Use case: Correlate sensor data with the device that measured it. Show all data from one device.
  • floor_id: Use case: Show all sensor data from one floor. When moving house, this ID should probably be changed to a new one.
  • area_id: Use case: Show all sensor data from one area.
  • environment_id, Environment: Use case: Be able to take a sensor out of service, for example to replace batteries. Be able to run an experiment that does not reflect normal background readings. Hint: If you care about this, I found it really helpful if the device includes an accelerometer and exposes a movement counter like RuuviTags do. In case you or someone else forgets to keep the environment up to date, the data still indicates that something has changed. I got the idea to model the environment as a label from a recent https://hasspodcast.io/ episode. The environment can also be used, for example, to automatically create dashboards of Prod devices. For InfluxDB, the environment is also helpful when adding new devices to Home Assistant: it allows you to fine-tune which entities of the new device should be enabled and which entity attributes should be written to InfluxDB before setting the device to Prod.
  • subarea_id: Think of this as a description of the location within the area, for example: office desk or fridge. It does not need to include height, so there is no need for “above fridge” or “inside fridge” because the height and entity ID should already make that clear. Use case: Show sensors that are in proximity. Know that changed readings are related to the sensor having been moved to another position.
  • height: Height of the entity above the floor. For example, if a device (or entity) stands directly on the floor, the height is “0.00m”. Only one unit should be used so that the values are sortable: if you go with meters, do not also use centimeters. Use case: Gas/air measurements differ depending on height due to Earth's gravity. For example, a temperature 1 °C higher at the ceiling than on the floor at the same position is not uncommon.
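Because the height tag is stored as a string such as “0.00m”, InfluxDB and Grafana compare it as text. A minimal Python sketch, assuming every height value uses the same format (a decimal number followed by the single unit `m`), that turns the tag back into a number so entities can be sorted by height; the function names are hypothetical:

```python
from __future__ import annotations

import re

# Assumed format: a decimal number followed by "m", e.g. "0.00m" or "1.25m".
_HEIGHT_RE = re.compile(r"^(\d+(?:\.\d+)?)m$")


def parse_height_m(tag_value: str) -> float:
    """Convert a height tag such as "0.75m" to meters as a float."""
    match = _HEIGHT_RE.match(tag_value)
    if match is None:
        # Mixed units such as "75cm" are rejected instead of being mis-sorted.
        raise ValueError(f"unexpected height format: {tag_value!r}")
    return float(match.group(1))


def sort_by_height(heights: dict[str, str]) -> list[tuple[str, float]]:
    """Return (entity_id, height in meters) pairs, lowest first."""
    return sorted(
        ((entity_id, parse_height_m(value)) for entity_id, value in heights.items()),
        key=lambda pair: pair[1],
    )


# Plain string sorting would put "10.00m" before "2.00m"; numeric sorting does not.
print(sort_by_height({"ceiling_temp": "2.40m", "floor_temp": "0.00m", "desk_temp": "0.75m"}))
```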

How does the data model look in InfluxDB

> show tag keys from sensor
name: sensor
tagKey
------
area_id
device_class
device_id
domain
entity_category
entity_id
environment_id
floor_id
height
state_class
subarea_id
unique_id
unit_of_measurement

> show field keys from sensor
name: sensor
fieldKey            fieldType
--------            ---------
state               string
value               float
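To illustrate how these tag and field keys fit together, here is a minimal Python sketch that renders one point in InfluxDB line protocol, the text format InfluxDB v1 accepts for writes. The helper names and example tag values are hypothetical, not taken from the integration; escaping follows the line protocol rules for measurements, tag keys/values and string fields, and only float and string fields are handled:

```python
from __future__ import annotations


def _escape_tag_part(text: str) -> str:
    # Tag keys, tag values and field keys: escape commas, equals signs and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(text: str) -> str:
    # Measurement names: escape commas and spaces.
    return text.replace(",", r"\,").replace(" ", r"\ ")


def _format_field(value: float | str) -> str:
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        # String field values are double-quoted; escape backslashes and quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    # Integers (which would need an "i" suffix) and booleans are not handled here.
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: dict[str, str],
                     fields: dict[str, float | str], timestamp_ms: int) -> str:
    """Render one point; tags are sorted by key, as InfluxDB recommends."""
    tag_part = "".join(
        f",{_escape_tag_part(key)}={_escape_tag_part(value)}"
        for key, value in sorted(tags.items())
        if value  # Empty tag values are not allowed in line protocol, so skip them.
    )
    field_part = ",".join(
        f"{_escape_tag_part(key)}={_format_field(value)}"
        for key, value in sorted(fields.items())
    )
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ms}"


print(to_line_protocol(
    "sensor",
    {"domain": "sensor", "entity_id": "office_temperature", "area_id": "office", "height": "0.75m"},
    {"value": 21.5},
    1700000000000,  # milliseconds, matching precision: 'ms'
))
```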

Grafana dashboard

How to set this up

Now the difficult part: the YAML configuration of the InfluxDB integration currently does not support adding this related metadata. Nevertheless, here is my YAML to get you started:

precision: 'ms'

# I have patched
# homeassistant/components/influxdb/__init__.py
# to support this. Keep the number of measurements small by only creating one
# measurement per domain. `domain__device_class` would be another option. The
# default of `unit_of_measurement` is bad, I think, because it creates
# measurements like "µg/m³" and because a measurement called "ppm" does not
# say much.
measurement_attr: 'domain'
default_measurement: 'default'

exclude:
  domains:
    - 'person'
    - 'persistent_notification'
    - 'schedule'
    - 'todo' # Only emits numeric data. Not useful.
    - 'automation' # I don’t think storing automation runs long term is useful.
  entity_globs:
    # As of Home Assistant 2024.11, sun entities are of domain sun.
    # TODO: Docs are wrong:
    # https://www.home-assistant.io/integrations/influxdb/#full-configuration-for-2xx-installations
    - 'sensor.sun_*'

# I decided to include all because the storage size in InfluxDB is minimal
# so it is not worth me including things and then forgetting something.
# include: ...

# tags:
#   tsdb_schema_version: "0.9.0"
# tsdb_schema_version as a tag is not really needed. The downside is that it
# creates new series whenever this gets bumped. I can have the best of both
# worlds by just documenting schema changes here.
#
# InfluxDB schema versions [[[
#
# * 0.9.0
# * 0.9.1 activated: 2022-12-23T23:17:41Z
# * 1.0.0 activated: 2025-02-xx
#
# ]]]

tags_attributes:
  - device_class
  - state_class
  - unit_of_measurement
  - device_file

  # Added via source code patch:
  # See homeassistant/components/influxdb/__init__.py

ignore_attributes:
  - attribution
  # - country_code # Source?
  - friendly_name
  - icon
  - latitude
  - longitude
  - radius
  - options
  - repositories

  # Should this be included?
  - last_reset

  # https://developers.home-assistant.io/docs/core/entity/sensor/
  - native_unit_of_measurement
  - native_value
  # - options  # Already listed above.
  - suggested_display_precision
  - suggested_unit_of_measurement
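The `measurement_attr` comment in the YAML weighs several ways to name measurements. As a simplified illustration only (not the integration's actual code, and ignoring its other choices such as `entity_id`), this hypothetical Python function shows which measurement a state would land in under each option; `domain` is the value that needs the source code patch:

```python
from __future__ import annotations


def pick_measurement(entity_id: str, attributes: dict[str, str],
                     measurement_attr: str, default_measurement: str = "default") -> str:
    domain = entity_id.split(".", 1)[0]
    if measurement_attr == "domain":
        # Patched behavior: one measurement per domain (sensor, binary_sensor, ...).
        return domain
    if measurement_attr == "domain__device_class":
        device_class = attributes.get("device_class")
        # Entities without a device class fall back to the bare domain.
        return f"{domain}__{device_class}" if device_class else domain
    # Otherwise use the named attribute (e.g. unit_of_measurement) and fall back
    # to default_measurement when the state does not have it.
    return attributes.get(measurement_attr) or default_measurement


attrs = {"device_class": "temperature", "unit_of_measurement": "°C"}
for mode in ("domain", "domain__device_class", "unit_of_measurement"):
    print(mode, "->", pick_measurement("sensor.office_temperature", attrs, mode))
```

With the unit-based default, every unit becomes its own measurement, which is exactly the "µg/m³" and "ppm" problem mentioned in the comment.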

However, most of the interesting metadata currently requires patching the Home Assistant source code: see “Data model and code to add useful metadata to InfluxDB as tags” by ypid, Pull Request #1 in ypid/home-assistant-core on GitHub.

How to install this depends on your setup, but the basic idea for now is to copy __init__.py into homeassistant/components/influxdb/ of your Home Assistant installation. It will be overwritten on each upgrade. I don’t want to post more details: either you can figure it out on your own, or you should probably wait until this has been integrated into a Home Assistant release.

Please don’t ask me when I can submit this as a PR. I expect this will require some work, and I have other private projects that are more urgent. If you want it sooner, feel free to take my work and rework it into pull requests.

I decided to model environment, subarea and height as labels, which you will need to assign to devices. Here is a screenshot of some of my labels to give you an idea:

The name, or more specifically the ID derived from the label's name when it is created, needs to match what you see in the screenshot.

What a PR against Home Assistant could look like

The environment, height and subarea metadata could perhaps be implemented by evaluating Python code from the config with eval() in the event_to_json function. This would be the most flexible approach; I think plain config options might not cut it. I know eval() can be evil. Something like this:

# https://www.home-assistant.io/integrations/influxdb YAML config
python_eval: |-
  labels = get_labels_for_entity(entity, device)
  env_labels = sorted(labels.intersection({"prod", "test"}))
  if len(env_labels) == 1:
      json[INFLUX_CONF_TAGS]["environment_id"] = env_labels[0]
  else:
      if len(env_labels) > 1:
          _LOGGER.warning(
              "%s has multiple env labels: %s", state.entity_id, env_labels
          )
      # No or ambiguous environment label: fall back to "test".
      json[INFLUX_CONF_TAGS]["environment_id"] = "test"
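Because the proposed `python_eval` option does not exist, the environment logic from the snippet can be tried out as a plain, testable Python function. A minimal sketch, assuming the labels are available as a set of label IDs and that `prod` and `test` are the only environment labels; the function and constant names are hypothetical:

```python
from __future__ import annotations

import logging

_LOGGER = logging.getLogger(__name__)

ENVIRONMENT_LABELS = {"prod", "test"}
FALLBACK_ENVIRONMENT = "test"


def environment_id_from_labels(entity_id: str, labels: set[str]) -> str:
    """Pick the environment_id tag from an entity's label IDs."""
    env_labels = sorted(labels & ENVIRONMENT_LABELS)
    if len(env_labels) == 1:
        return env_labels[0]
    if len(env_labels) > 1:
        # Conflicting labels are treated like missing ones, but logged.
        _LOGGER.warning("%s has multiple env labels: %s", entity_id, env_labels)
    return FALLBACK_ENVIRONMENT


print(environment_id_from_labels("sensor.office_temperature", {"prod", "office_desk"}))
```

Defaulting to `test` means a newly added device stays out of Prod dashboards until it is explicitly labeled `prod`.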

All metadata except environment, height and subarea could be made a config option.
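For the remaining metadata, such a config option could simply list which registry-derived tags to write. A minimal sketch of that idea, using simplified stand-in dataclasses rather than Home Assistant's real registry classes; `DeviceMeta`, `EntityMeta`, `registry_tags` and the `enabled` parameter are hypothetical. It assumes that an entity's own area takes precedence over its device's area and that the floor is looked up from the area:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeviceMeta:
    device_id: str
    area_id: str | None = None


@dataclass
class EntityMeta:
    unique_id: str
    entity_category: str | None = None
    area_id: str | None = None
    device: DeviceMeta | None = None


def registry_tags(entity: EntityMeta, floor_by_area: dict[str, str],
                  enabled: set[str]) -> dict[str, str]:
    """Collect the configured metadata tags for one entity."""
    device = entity.device
    # An area set on the entity overrides the area of its device.
    area_id = entity.area_id or (device.area_id if device else None)
    candidates = {
        "unique_id": entity.unique_id,
        "entity_category": entity.entity_category,
        "device_id": device.device_id if device else None,
        "area_id": area_id,
        "floor_id": floor_by_area.get(area_id) if area_id else None,
    }
    # Only configured keys are written; missing values are skipped because
    # InfluxDB tag values cannot be empty.
    return {key: value for key, value in candidates.items() if key in enabled and value}


desk = DeviceMeta(device_id="abc123", area_id="office")
sensor = EntityMeta(unique_id="ruuvi_temp_1", device=desk)
print(registry_tags(sensor, {"office": "ground_floor"},
                    {"unique_id", "device_id", "area_id", "floor_id"}))
```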

FAQ

  • Why implement environment, subarea and height as Home Assistant labels? Mainly because labels are easy to attach to and detach from devices in the Home Assistant UI, Home Assistant manages the uniqueness of their IDs, and they allow adding descriptions to subareas. The alternative would be to add those dimensions to the entity ID, but changing entity IDs is not as smooth as using labels. Also, subarea and height are probably too uncommon to be added to Home Assistant the way areas and floors are. A custom component might be an option, but what would that really improve over the current use of labels?

You might also be interested in

If you have read this far, you might also be interested in:


Thanks for sharing this, it was an interesting read and I really enjoy your way of assessing the data at hand🤗

Bests
