Long Time State Storage on TimescaleDB vs. InfluxDB

Hi.

This is a very brief (but not short :grin: ) post to share my findings and a few subjective opinions when trying to figure out what to use as LTSS solution for my instance. I hope it might also spark some discussion and ideas regarding this topic.

My goal was to have a long term storage of the states for some of the entities, as the default recorder component will just keep e.g. the last 8 days or so (which is clever). First solution that I found was InfluxDB. It made sense in the past when I used SQLite as the main DB, therefore I didn’t want to make it even bigger somehow storing everything. So I used InfluxDB integration, which worked really nicely (I used InfluxDB v2).

However, I migrated to PostgreSQL with TimescaleDB from SQLite (which is really like living in a different world btw - you should likely do it too). And that opened up a possibility of using TimescaleDB for LTSS.

Attempt to migrate the data from InfluxDB 2 to TimescaleDB

I wanted to migrate the data from InfluxDB to TimescaleDB then, but I couldn’t find any good resources and ultimately failed (~decided is not worth it for me). There is this post, but that is for InfluxDB v1 (which is very different) and many people haven’t succeeded too. I will at least write down where I got to save some desperate attempts of others + it actually might work if you put some effort and are after data for only a single entity, for example.

I managed to get to an export of data in CSV via CURL with something like the following:

curl --request POST \
    http://localhost:8086/api/v2/query?org={your-org} \
    --header 'Authorization: Token {your-influx-token-from-ha-config}' \
    --header 'Accept: application/csv' \
    --header 'Content-type: application/vnd.flux' \
    --data 'from(bucket: "ha") |> range(start: -10d) |> toString() |> group(columns: ["_time", "entity_id"], mode: "by") |> drop(columns: ["_start", "_stop", "source", "domain"])' 

but the problem is that once you start increasing the range, it just blocks forever (at least for my instance with a year of data…). Running the command locally via influx query didn’t result in a reasonably parseable output (<10 mins). You could filter for a specific entity by easily adding some filter in the query.

The way the data is stored in the influxdb is that it’s “highly denormalized”. It makes sense and is a good choice for the type of the database, but it’s making the migration quite hard. That’s the reason why I try to groupby ["_time", "entity_id"], because that makes a single logical “write” of a sensor into the DB. For a single state record of type:

time:time,entity_id:string,state:string,attributes:JSON

Influx is going to create as many records as there are key-values in the attributes JSON + one extra for the actual state, so e.g. a “regular” recorder state (which is the same as it would be saved in the LTSS TimescaleDB):

2022-08-21T18:47:36.766409725Z,sensor.dummy,243,{"some":"attribute", "another": "one"}

will in InfluxDB become something (I am simplifying to show the differences):

time,entity_id,field_name,value
2022-08-21T18:47:36.766409725Z,sensor.dummy,some,attribute
2022-08-21T18:47:36.766409725Z,sensor.dummy,another,one
2022-08-21T18:47:36.766409725Z,sensor.dummy,value,243  # this is for actual state

InfluxDB also adds _str suffix to the field name where it needs to map it to string, so you would have to remove that. That’s documented here:

please be aware of the underlying InfluxDB mechanism that converts non-string attributes to strings and adds a _str suffix to the attribute name in this case.

However, InfluxDB will “autocast” numbers to floats (while keeping string states - e.g. “off” - as simple strings), which the LTSS on TimescaleDB will not (it will just have it as strings and you have to cast it; UPDATE: or you can create a computed numeric column, the tradeoff is basically memory vs. CPU + some manual SQL shenaningas)

Grafana with TimescaleDB vs. InfluxDB Dashboards

I was able to swiftly recreate dashboards from InfluxDB in Grafrana (but I had just a few, this might be a blocker for you to rewrite everything). The “clickable” interface from InfluxDB is IMHO slightly nicer and easier to use (I was able to just use it without understanding any of Flux - query lang of InfluxDB 2). However, recreating the dashboards in Grafana (again via official HA Addon) with TimescaleDB was probably similarly easy (even without me knowing much SQL).

For example, a dashboard with two temperature sensors and one ventilation speed one (in %) would look like this in InfluxDB:

which corresponds to (generated) flux script:

from(bucket: "ha")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["friendly_name"] == "Ventilation Flow Speed Percentages" or r["friendly_name"] == "Ventilation Outside Temperature" or r["friendly_name"] == "ViCare Outside Temperature")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["_measurement"] == "%" or r["_measurement"] == "°C")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

Equivalent in Grafana with LTSS on TimescaleDB:
“Clickable interface” (I had to drop into the code though to get everything I wanted):

SQL Code:

SELECT
    $__timeGroupAlias("time",$__interval),
  entity_id AS metric,
  state::float
FROM ltss
WHERE
  entity_id IN (
    'sensor.ventilation_flow_speed_percentages', 
    'sensor.vicare_outside_temperature_2',
    'sensor.ventilation_outside_temperature'
    )
  AND $__timeFilter(time)
  AND state NOT IN ('', 'unavailable')
ORDER BY 1,2

Design

The design of InfluxDB dashboards vs. Grafana is IMHO very similar, I haven’t found any big differences from legibility/user friendliness/niceness:

InfluxDB:

Grafana:

My takeaway

I really think that:

  1. you likely should use PostgreSQL anyway instead of SQLite (unless you have a really simple setup about a handful of components, it will very likely pay off wrt. responsiveness)
  2. you should use that PostgreSQL+TimescaleDB addon
  3. you can then easily just start using LTSS on TimescaleDB addon
  4. Grafana works and integrates nicely if you need charts, and both solutions are fairly user-friendly wrt. chart creation
  5. you, therefore, likely don’t need InfluxDB, i.e. I don’t think it’s worth it

Backups

Once you jump on the PostgreSQL, you might also want to do regular backups (which are much easier with SQLite). I am going to share a simple quick-and-dirty cron job-like solution if you are interested, but as you can imagine, this vary a lot and you might want to invest into something more robust:

#!/bin/bash
set -e

BACKUP_DIR=/root/db-backup # on remote
ADDRESS=root@ssh_ha  # this handle `ssh_ha` is setup in .ssh/config
BACKUP_NAME=$(date -Iseconds).dump
BACKUP_FILE=$BACKUP_DIR/$BACKUP_NAME

echo "STARTING" && \
	ssh $ADDRESS mkdir -p $BACKUP_DIR && \
        ssh $ADDRESS "apk update && apk add postgresql && PGPASSWORD={{password for the postgres user, likely `homeassistant`}} pg_dumpall --data-only --disable-triggers -h {{your-timescaledb-hostname}} -U postgres --file $BACKUP_FILE" && \
	scp $ADDRESS:$BACKUP_FILE . && \
	ssh $ADDRESS rm -f $BACKUP_FILE && \
	echo "BACKUP DONE"

# notify HA if something failed or just log success
[ $? -eq 0 ] && WEBHOOK_ID="backing-up-confirmations-success" || WEBHOOK_ID="backing-up-confirmations-failed"
curl -X POST -H "Content-Type: application/json" -d '{ "backup_name": "'$BACKUP_NAME'" }' http://192.168.0.202:8123/api/webhook/$WEBHOOK_ID

with the webhook automation:

alias: Backing up confirmations
description: ""
trigger:
  - platform: webhook
    webhook_id: backing-up-confirmations-success
    id: success
  - platform: webhook
    webhook_id: backing-up-confirmations-failed
    id: failure
condition: []
action:
  - choose:
      - conditions:
          - condition: trigger
            id: success
        sequence:
          - service: logbook.log
            data:
              name: DB Backups
              message: Backup {{ trigger.json.backup_name }} finished successfully
      - conditions:
          - condition: trigger
            id: failure
        sequence:
          - service: persistent_notification.create
            data:
              message: Backup {{ trigger.json.backup_name }} failed
    default:
      - service: persistent_notification.create
        data:
          message: Something with the backups went weirdly
mode: single

and crontab -e (this runs a backup every 5 days at 4am):

0 4 */5 * * cd /home/homer/db-ha-backup && bash ssh-backup.sh

This assumes that you use SSH addon so you can ssh into the HA instance (works on hassio). Obviously, very specific to my settings, but you might find it useful.

4 Likes

Thanks for very detailed post on TimescaleDB vs. InfluxDB as a long-term storage for Home Assistant!

It would be great if you could look at the long-term metrics storage I work on - VictoriaMetrics. It supports data push in InfluxDB line protocol - see these docs. It also supports Prometheus-compatible querying API, so it can be queried with PromQL via Prometheus datasource in Grafana. The PromQL is easier to use than Flux or InfluxQL when building typical queries over time series data - see this article.

And the main VictoriaMetrics feature for Home Assistant - it requiress less RAM, CPU, disk IO and disk space than InfluxDB and TimescaleDB.

1 Like

Hey. Thanks for the tip, but it’s unlikely I would get to it any time soon. Also, I am using TimescaleDB anyway, so it won’t be less memory.

Hi @kotrfa!

As the Author of the TimescaleDB addon, I want to say thank you for this very nice post, and welcome to the TimescaleDB ecosystem!

You mention the data only being a string, and you had to cast each time in every query.
To boost performance, I have laid out a solution to this in the opening post on github:

Pinned: Additional Documentation for Timescaledb Integration · Issue #1 · Expaso/hassos-addons (github.com)

Please let me know if this helps you!

Thanks for great software and a reply! I added the mention in the post above.