Hi! I finally sank some hours into it myself, and I think I found a working solution!
After trying some minimal approaches & your full Dockerfile, I created an MWE that should be applicable to your Dockerfile as well.
The builds took ~1.5h each, so I didn't want to spend more time on that, especially since you are much more proficient in your repo than I am.
After that I tried to get your Dockerfile to run w/o the toolkit build - it did run, but I had to update a few versions and remove a few version pins to get it to work. All in all, the build process took too long, so I switched to the MWE approach for the toolkit build.
Here is the final Dockerfile, which builds timescaledb + the toolkit w/o any problems on Alpine (but in 5915 seconds, so quite a while!):
FROM ghcr.io/hassio-addons/base/aarch64:14.3.2
# Install Postgresql & build dependencies
RUN apk update \
&& apk add --no-cache cargo clang cmake curl gcc gcompat make musl-dev git openssl-dev pkgconfig postgresql postgresql-dev rust rustfmt
# Build timescaledb
RUN git clone https://github.com/timescale/timescaledb \
&& cd timescaledb \
&& ./bootstrap \
&& cd build \
&& make install
# Set up Postgresql extension build environment
RUN cargo install --version '=0.10.2' --force cargo-pgrx \
&& cargo pgrx init --pg15 pg_config
# Build toolkit
RUN git clone https://github.com/timescale/timescaledb-toolkit \
&& cd timescaledb-toolkit/extension \
&& cargo pgrx install --release \
&& cargo run --manifest-path ../tools/post-install/Cargo.toml -- pg_config
# Prepare folder for socket
RUN mkdir /run/postgresql
RUN chown postgres:postgres /run/postgresql/
# Initialise DB
USER postgres
RUN initdb -D /var/lib/postgresql/data
## Poor man's test with a nasty sleep :)
RUN (postgres -D /var/lib/postgresql/data &) \
&& sleep 2 \
&& echo "CREATE DATABASE testdb;" | psql \
&& psql testdb -v ON_ERROR_STOP=on -c "CREATE EXTENSION timescaledb_toolkit; CREATE TABLE test(ts timestamp, value float); SELECT time_weight('Linear', ts, value) FROM test;"
ENTRYPOINT postgres -D /var/lib/postgresql/data
I went down the gcompat route once while I was borrowing binaries from the official Docker image, but it never occurred to me that I could also need it to fix the SIGKILL issue during the build!
So with that out of the way, the next thing, I guess, should be to split the build times over different pipelines. I was almost thinking about creating an Alpine package with timescaledb-toolkit myself in a separate repo+pipeline, and only pulling in the package during the build of the add-on.
I mitigated this by splitting the steps up across multiple Dockerfiles.
For non-failing pipelines, I guess you could still stick to one single Dockerfile if you arrange it efficiently (ordering the steps from least to most frequently changed) and if pipeline caches are properly set up.
Build times should then be quite low as well.
I guess you can't parallelize much here, since things depend on each other as far as I understood.
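For what it's worth, a multi-stage build might be a middle ground: keep the expensive toolkit compile in a builder stage and only copy the artefacts into the final image, so the long step stays cached as long as its layers don't change. This is just a sketch based on the Dockerfile above - the stage name and the copied paths are assumptions, the real locations come from pg_config --pkglibdir / --sharedir on the base image:

FROM ghcr.io/hassio-addons/base/aarch64:14.3.2 AS toolkit-builder
# Same build dependencies as above, incl. gcompat for the pgrx build
RUN apk add --no-cache cargo clang cmake curl gcc gcompat make musl-dev git openssl-dev pkgconfig postgresql postgresql-dev rust rustfmt
RUN cargo install --version '=0.10.2' --force cargo-pgrx \
&& cargo pgrx init --pg15 pg_config
RUN git clone https://github.com/timescale/timescaledb-toolkit \
&& cd timescaledb-toolkit/extension \
&& cargo pgrx install --release \
&& cargo run --manifest-path ../tools/post-install/Cargo.toml -- pg_config
# Final stage only installs postgresql and copies the compiled artefacts
FROM ghcr.io/hassio-addons/base/aarch64:14.3.2
RUN apk add --no-cache postgresql
# Assumed locations - check pg_config --pkglibdir and --sharedir on the base image first
COPY --from=toolkit-builder /usr/lib/postgresql /usr/lib/postgresql
COPY --from=toolkit-builder /usr/share/postgresql /usr/share/postgresql

With BuildKit layer caching (or by pushing the builder stage to a registry as its own image), the ~1.5h compile should only rerun when the toolkit or cargo-pgrx version changes.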
No, it's not. The recorder and the LTSS addon store the same data and have separate settings. The difference is that LTSS stores it long-term, while the standard recorder stores it for at most a few days.
Creating sensor entities does not seem to be the correct approach. Instead, I use MQTT and configure an MQTT sensor in Home Assistant. This works fine.
What a great addon!!
I have installed the HA addons for Postgres + TimescaleDB, LTSS and Grafana instead of using separate servers for InfluxDB and Grafana. I am very happy to say goodbye to Influx and enthusiastic about the ability to store long-term data in an SQL environment, which seems to be something that time-series DBs are returning to, or embracing. But I have two questions for which I can't find the answers.
Question 1) Postgres is now running as a Docker-based addon; how do I get external access to it? External access to long-term data is very important for future analytics and for moving data into other AI pipelines. These are separate concerns that need to be addressed outside of HA and addons. I want to run other SQL applications outside of the HASS containers, on a desktop machine on the same network, and attach to Postgres to access my LTSS data, but I can't find how to expose the Postgres port or the correct URL to use.
Question 2) I want to use the configuration options to include and exclude entities from being sent to the Timescale LTSS table. BUT I cannot find examples of how to construct the filters to filter entity names other than using a '*' wildcard. So what filter options do I have to do this? Can I use Jinja2 templates for this? If so, an example of a more complex filter matching part of an entity name would be nice to see. I want to rename all my entities with prefixes and postfixes that would allow me to group and manage my entities more precisely by function and control what is sent to LTSS. I want to know that they can be filtered by LTSS before I do this.
Hi, with TimescaleDB and LTSS my database is growing huge, and linearly. It is now at 10 GB. I have ~2000 entities in total and can handle 20 GB or more of DB size, but I am worried about how much it will keep growing. Can anyone help me with my config? Is anything wrong? Is including all sensor and person values bad practice?
You can open up the HA container addon to the outside world on any port you like, in the config of the addon in HA.
But… I also had another guy who wanted to run the container addon without Home Assistant. I am currently making that a bit easier, so you could use exactly the same Docker image as a Home Assistant addon, and/or as a postgresql/postgis/timescale/timescaledbtools image for running outside of Home Assistant (like on Kubernetes or just Docker).
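Once the port is exposed in the addon config, connecting from another machine is just a standard PostgreSQL connection. A quick sketch - hostname, user, password and database name are placeholders for whatever you configured in the addon:

# Connect from a desktop on the same network (values are placeholders)
psql "postgresql://homeassistant:YOURPASSWORD@homeassistant.local:5432/homeassistant"
# or equivalently
psql -h homeassistant.local -p 5432 -U homeassistant -d homeassistant

The same URL works as a connection string for most SQL tools and analytics libraries.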
Filtering of entities works the same way as for the recorder component (Recorder - Home Assistant (home-assistant.io)), or you could use a pgagent job to remove the data you don't want once every so often.
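As a rough example of the cleanup route - assuming the default ltss table name, that it is a TimescaleDB hypertable, and with the database name as a placeholder - a pgagent (or cron) job could periodically drop old chunks, or you could let TimescaleDB's built-in retention policy handle it:

# Drop LTSS data older than a year (table/database names and interval are assumptions)
psql homeassistant -c "SELECT drop_chunks('ltss', INTERVAL '365 days');"
# Or schedule it inside TimescaleDB itself, no pgagent needed
psql homeassistant -c "SELECT add_retention_policy('ltss', INTERVAL '365 days');"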
I have a fundamental question about this add-on and the need in a modern world.
I just recently moved to PG16 with TimescaleDB and wanted to know: what is the value of LTSS when TimescaleDB is the backend? Does TSDB not already do a good enough job to make LTSS less needed in this day and age?
I am not trying to be a jerk, I'm just trying to determine the value of this if the backend is already TSDB via the recorder.
I think this is more a question of how HA is designed, rather than of this component. Since HA needs to support SQLite DBs on SD cards in Raspberry Pis etc., it takes a design approach that works well for that, and we expand from there. So yes, TSDB could do it all quite effectively, but since using TSDB is an alternative backend, it has to fit the existing design requirements; hence we have the separate recorder and ltss datastores.
On my system I point both the recorder and ltss to the same TSDB database, and it works great. Having the recorder and ltss in separate tables does mean that bad query designs in HA for the recorder don't bog down the UI, which is a plus (it's not always easy to keep query optimisation high when supporting multiple DB backends).
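If anyone wants to sanity-check that split on their own instance, comparing the footprint of the two stores is straightforward - the table names below are the defaults (states for the recorder, ltss for LTSS) and the database name is a placeholder:

# Size of the recorder's states table vs. the LTSS hypertable in the shared database
psql homeassistant -c "SELECT pg_size_pretty(pg_total_relation_size('states'));"
psql homeassistant -c "SELECT pg_size_pretty(hypertable_size('ltss'));"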