Custom component: Long Time State Storage (LTSS) utilizing TimescaleDB

freol · December 14, 2019, 1:28pm

Long time state storage (LTSS) custom component for Home Assistant

NOTE: Starting 2020-09-13 attributes are stored with type JSONB instead of as a plain string, in addition a GIN index is created on this column by default. At first startup after updating of LTSS, migration of your DB happens automatically. Note that this can take a couple of minutes and HASS will not finish starting (i.e. frontend will not be available) until migration is done.

WARNING: I take no responsibility for any data loss that may happen as a result of this. Please make sure to backup your data before upgrading!

Enabling simple long time state storage (LTSS) for your sensor states. Requires a PostgreSQL instance with the following extensions:

TimescaleDB
PostGIS

This component is not to be considered as a replacement to the recorder component in Home Assistant but rather as an alternative to the InfluxDB component for more space-efficient long time storage of specific sensor states.

Nice to know:

Fully SQL compatible -> works with the SQL sensor in Home Assistant
Compatible with Grafana for visualization of time series:
- https://blog.timescale.com/blog/grafana-time-series-exploration-visualization-postgresql-8c7baa9c3bfe/
- https://grafana.com/docs/grafana/latest/features/datasources/postgres/

Installation

Precondition

Make sure that you PostgreSQL instance is up and running and that you have created a database, DB_NAME . I recommend the following docker image: https://docs.timescale.com/v1.0/getting-started/installation/docker/installation-docker#postgis-docker to get started quickly and easy.

Manual installation:

Put the ltss folder from custom_components folder in this repo to a folder named custom_components in your HA config folder

Automatic installation:

Just install ltss as an integration via HACS

configuration.yaml

Add a section to your HA configuration.yaml:

  ltss:
      db_url: postgresql://USER:PASSWORD@HOST_ADRESS/DB_NAME
      include:
          domains:
          - sensor
          entities:
          - person.john_doe

NOTE : During the initial startup of the component, the extensions will be created on the specified database. This requires superuser priviligies on the PostgreSQL instance. Once the extensions are created, a user without superuser rights can be used! Ref: Can I use Timescale DB as an alternative to Influx DB in Homeassistant for Grafana?

Configuration

ltss
(map)(Required) 
Enables the recorder integration. Only allowed once.

    db_url
    (string)(Required)
    The URL that points to your database.

    exclude
    (map)(Optional)
    Configure which integrations should be excluded from recordings.

        domains
        (list)(Optional)
        The list of domains to be excluded from recordings.

        entities
        (list)(Optional)
        The list of entity ids to be excluded from recordings.

    include
    (map)(Optional)
    Configure which integrations should be included in recordings. If set, all other entities will not be recorded.

        domains
        (list)(Optional)
        The list of domains to be included in the recordings.

        entities
        (list)(Optional)
        The list of entity ids to be included in the recordings.

Details

The states are stored in a single hypertable with the following layout:

Column name:	id	time	entity_id	state	attributes	location
Type:	bigint	timestamp with timezone	string	string	JSONB	POINT(4326)
Primary key:	x	x
Index:	x	x	x	x	x

Chunk size of the hypertable is set to 1 month.

The location column is only populated for those states where latitude and longitude is part of the state attributes.

Credits

Big thanks to the authors of the recorder component for Home Assistant for a great starting point code-wise!

CM000n · December 16, 2019, 11:47am

Thank you very much @freol.
Will definetly try it, when I’m back from Holiday

robmarkcole · December 27, 2019, 4:00pm

Hi @freol
great work on this. Question about Grafana integration, for plotting does it matter that the state is stored as a string rather than a float? Also for attributes I would have thought a JSON datatype would be more suitable? I’ve been playing aroung with processing the db data here.
Cheers

CM000n · December 27, 2019, 4:13pm

Hi @robmarkcole, you can CAST the String as INT or DECIMAL or other datatypes in the Grafana SQL Query.
So far it works perfectly here for me.

robmarkcole · December 27, 2019, 4:26pm

How about if you want to use any of the timescale specific aggregation functions?

freol · December 27, 2019, 4:27pm

Hi,

As @CM000n says, it is possible to cast to other types directly in the query

select state::float from ltss;

In the same manner it is possible to cast the attributes to json:

select attributes::json from ltss;

robmarkcole · December 27, 2019, 4:34pm

Hi @freol
what I would like to understand is if queries like this can be run?

SELECT time_bucket('5 minutes', time) AS five_min, avg(cpu)
FROM metrics
GROUP BY five_min
ORDER BY five_min DESC LIMIT 10;

I think it would be as simple a change as avg(cpu::float)
thanks

freol · December 27, 2019, 4:38pm

Yes, correct, in this example, just replace cpu by cpu::float

CM000n · December 27, 2019, 4:55pm

BTW: Another nice side effect for me is that I can now use the Timescale Docker container for both, the HA recorder and the long time state storage (Each with its own database of course).
This saves me the separate Postgress and Influxdb containers and I only have to maintain one container now.

Tjeerd · December 31, 2019, 2:17pm

Would it be possible to take the data that is being logged and back feed it into a sql sensor in order to graph it with the mini graph card for example?

select 		time_bucket('1 day', time)::date as ts,
    		round( cast( (cast(last(state, time) as float) - cast(first(state, time) as float)) as numeric), 4) as kwh
from 		ltss
where 		entity_id = 'sensor.main_totalenergy'
and 		time::date >= cast(date_trunc('month', current_date) as date)
group by 	ts;

This returns date’s and the daily consumption and would love to be able load it back into a graph.

freol · January 1, 2020, 3:45pm

@Tjeerd , this question is not really directly related to the LTSS component but here is my take on it:

The SQL sensor is 100% compatible with LTSS but is implemented as a “typical” HA sensor integration (i.e expects a single value on each update and then stores it with the regular recorder instance etc. Not really suitable for this kind of task where you want to show a history of another entity (regardless of db)
The mini-graph lovelace card depends on the Home Assistant API. More specifically the history sub-api which is integrated with the regular history/recorder component of HA only. Hence, no luck here unless HA makes the history component a bit more modular (i.e. accepts other storage solutions and not only the regular recorder component).
My preferred choice is to use grafana, which makes it really easy to graph things like your example but is a little more complicated to integrate with HA frontend, a nice guide can be found here (just ignore all the talk about influxdb and replace those step with LTSS instead.

Hope this gives some ideas about your possibilities!

robmarkcole · January 5, 2020, 7:47am

@Tjeerd another option is to write a script that is run daily and generates a custom output/plot. See this thread. You might also want to checkout the SMA filter.
Cheers

Tjeerd · January 5, 2020, 9:15am

Thanks for the advice @freol and @robmarkcole !

I went a different route for now. Although Grafana is very flexible, it’s a bit of a pain to get some main line stats back into HA. I’ve setup a SQL Sensor which logs the min / max for the current day. No history, but as it runs it will build it. The value of the sensor is going back into the recorder component and I can use in the mini graph card for now

carver · January 20, 2020, 9:46pm

@CM000n @freol

I’d really appreciate your advice on the following:

I’ve been trying to figure out how to do that as well ( forum link ), so how to best set that up?
(I’m new to both HA and Docker)

Specifically, I’m getting caught up as to where it would be best to either mount a Docker volume or where to bind the Postgres container to:

sudo docker pull timescale/timescaledb-postgis

sudo docker run \
–name chooseContainerName -e \
POSTGRES_PASSWORD=postgresRootUsersPassword \
-v /your/data/dir:/var/lib/postgresql/data \
-p 5432:5432 \
-d timescale/timescaledb-postgis

So for the choice of /your/data/dir :
e.g. would it be better to:
1.) bind it to the same dir as Home Assistant
(in my case generic linux install of Hassio):
-v /usr/share/hassio/homeassistant:/var/lib/postgresql/data
OR
2.) use default Docker volume location:
-v chooseVolumeName:/var/lib/postgresql/data
OR
3.) somewhere else
?

I would be glad to receive any help on this…
Thanks!

CM000n · January 21, 2020, 5:36am

I’m using docker compose and storing the timescale data on my btrfs raid.
But I think you can store them wherever you want and have enough space

  timescale:
    container_name: timescale
    restart: unless-stopped
    image: timescale/timescaledb-postgis:latest
    volumes:
      - /srv/dev-disk-by-id-ata-ST32000542AS_5XW2HFF9/data/docker_data/timescale:/var/lib/postgresql/data
      - /etc/localtime:/etc/localtime:ro
    environment:
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=mypassword
    ports:
      - "5442:5432"

caear1987 · January 22, 2020, 2:24pm

Is possible to use this for store states for e.g. 1 year? Will it work without any problem on raspberry pi 3?

CM000n · January 22, 2020, 9:00pm

Even if it is basically feasible, it is probably not a wise decision to run a relational database to back up long term data on a Raspberry Pi.
You might encounter performance issues.
Here is a very interesting blog article about the Postgress Performance on a Rasberry Pi 3: https://blog.rustprooflabs.com/2019/04/postgresql-pgbench-raspberry-pi

It is also questionable whether the space on a Raspberyy Pi is sufficient for this.
Just to give you a rough idea: I currently have about 350 entities and sensors in Home Assistant being tracked. My LTSS Postgres database has been running since the beginning of the year and already occupies about 400MB:

caear1987 · January 22, 2020, 10:12pm

Thank you for answer. So i need to find different solution for that what i want - store at least min, max, average for each hour for last 3 months, and then for data older than 3 months values for each day for each sensor recorded by recorder. Any idea?

CM000n · January 23, 2020, 2:15pm

Mhmm, you could perhaps write an SQL script that will take the values to the required level of aggregation and purge the values that are no longer needed.
You could run this automatically, for example with the help of CRON or pg_cron: https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/.

If you want to use another database, in MySQL there is also an event scheduler: https://dev.mysql.com/doc/refman/5.7/en/event-scheduler.html

This way you could keep your database as small as possible.
Just a quick idea…

freol · January 23, 2020, 6:25pm

@CM000n @caear1987

A suggestion is to have a look in the docs for timescaledb, specifically these should be of interest:
https://docs.timescale.com/latest/using-timescaledb/continuous-aggregates
https://docs.timescale.com/latest/using-timescaledb/data-retention

Note: I havent tried these features myself so cant really say much else.