Custom component: Long Time State Storage (LTSS) utilizing TimescaleDB

CM000n · March 5, 2021, 5:07pm

So why don’t you just exclude these data points with a corresponding rule in the WHERE condition?

poerle · March 5, 2021, 6:44pm

Can you explain how to do that?

CM000n · March 5, 2021, 7:10pm

Sure
If we take the last query as an example just modify it like this:

SELECT
  $__timeGroupAlias("time",$__interval,previous)
  , 'YourTemperatureSensor'
  , MAX(state::FLOAT) AS value
FROM ltss
WHERE
  $__timeFilter("time") 
  AND entity_id = 'sensor.thgn122_123_thgn132_thgr122_228_238_268_58_01_temperature'
  AND state NOT IN ('', 'unavailable')
GROUP BY 1,2
ORDER BY 1,2

You can find an overview for common pg sql functions here: https://www.postgresql.org/docs/12/functions.html

poerle · March 5, 2021, 10:48pm

Thank you!!

Motik68 · March 8, 2021, 9:38pm

Hello,

I installed the “TimescaleDB (PostgreSql + Extensions)” add-on from HACS and LTSS. The PostgreSQL database works fine by itself, but when I try to restart HASSOS with LTSS enabled I get the following error:

Logger: custom_components.ltss
Source: custom_components/ltss/__init__.py:167
Integration: ltss (documentation)
First occurred: 22:03:04 (10 occurrences)
Last logged: 22:03:52

Error during connection setup: (psycopg2.errors.UndefinedFile) could not load library "/usr/lib/postgresql/postgis-3.so": Error relocating /usr/lib/postgresql/postgis-3.so: __time64: symbol not found [SQL: CREATE EXTENSION IF NOT EXISTS postgis CASCADE] (Background on this error at: http://sqlalche.me/e/13/e3q8) (retrying in 3 seconds)

I didn’t find any reference to that exact error on the net. Does anybody know what I could have done wrong? I already uninstalled and reinstalled both PostgreSQL and LTSS to no avail…

Many thanks in advance!

CM000n · March 9, 2021, 7:18am

It seems that your Postgres Installation is missing the postgis Extension.
Either install it manually or use the docker image with preinstalled postgis Extension

Motik68 · March 9, 2021, 9:10am

Thanks, I checked the configuration of the TimescaleDB add-on, which should include postgis as well, and I get the same error about postgis-3.so missing when trying to manually install the extension.

I will go ask my question in the TimescaleDB add-on thread.

freol · March 10, 2021, 12:41pm

@bingo5

FYI, a configuration option for setting the chunk_time_interval has now been added in the latest release of LTSS.

massive · March 12, 2021, 11:10am

Has anyone come up with a nice solution for keeping the database size in check when using ltss? Mine is consuming about 13 gigs now for ~3 months worth of data, which is not itself that much but at some point in future that’ll become an issue.

Obviously you could just purge old data, but then you lose if for good.

TimescaleDB has this concept of downsampling by continuous aggregates, which seems useful, but that requires some SQL level configuration. Probably perfectly doable, but I’m wondering if anyone has any experience on this approach? Additionally, when aggregating data things like individual record HA attributes become meaningless.

freol · March 12, 2021, 3:10pm

@massive
AFAIK, timescaledb ships with three features which can be utilized for handling these kind of situations:

automatic data retention policies, i.e. purging old data through pre-defined policies
continuous aggregates , as you have already mentioned, which has its pros and cons obviously
data compression which compress older data chunks, which keeps all your data but saves space at the expense of query performance

I havent had the need yet to deep-dive into these myself (my db does not grow at a rate which is cause for concern yet) but I would probably suggest to have a look at the compression alternative first since it keeps all your data. However, the compression feature is the newest one and is still partly under development. In the current docs it states:

WARNING:The current release of TimescaleDB supports the ability to query data in compressed chunks. However, it does not support inserts or updates into compressed chunks. We also block modifying the schema of hypertables with compressed chunks.

Once the last sentence in that warning is resolved (i.e. the ability to modify the schema of the hypertable containing compressed chunks) I have on my todo-list to integrate this as an option in LTSS. But for now, you will unfortunately have to go outside LTSS to set this up if you want to try it out.

pimw · June 7, 2021, 1:33pm

I’ve done this, and it is working great! I had the same problem. But for me, it was not the storage size, but the loading time of the dashboards that took too long.

I’ve made two materizalized views containing summarized data: ltts_hour and ltts_day. I use these two views to make most of the graphs in Grafana. The original ltts table is only used in Grafana to display detailed information about the last 24 hours.

This is how i did it:

edit: please only run this sql statement after you have created and sucessfully populated the materialized views (see below). Best wait for a couple of weeks until you are happy with all summarized data:

SELECT remove_retention_policy('ltss');
SELECT add_retention_policy('ltss', INTERVAL '30 days');

->ltts data older than 30 days get deleted

CREATE MATERIALIZED VIEW ltss_hour
WITH (timescaledb.continuous) AS
SELECT entity_id AS metric,
       time_bucket(INTERVAL '1 hour', time) AS bucket,
       MAX(cast(state as numeric)) as value
FROM ltss
WHERE entity_id in ('sensor.hourly_energy_offpeak', 'sensor.hourly_energy_peak', 'sensor.hourly_energy', 'sensor.hourly_gas', 
'sensor.daily_energy_offpeak', 'sensor.daily_energy_peak', 'sensor.daily_energy','sensor.daily_gas',
'sensor.weekly_energy_offpeak', 'sensor.weekly_energy_peak','sensor.weekly_energy', 'sensor.weekly_gas',
'sensor.monthly_energy_offpeak', 'sensor.monthly_energy_peak', 'sensor.monthly_energy','sensor.monthly_gas',
'sensor.yearly_energy_offpeak', 'sensor.yearly_energy_peak','sensor.yearly_energy',  'sensor.yearly_gas')
GROUP BY 1,2;

→ summarized data ltts_hour is created

SELECT add_continuous_aggregate_policy('ltss_hour',
    start_offset => INTERVAL '4 day',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

->data will be summarized in the materialized view.

In my case, a “max” function whas what i was after. It is important to understand your data in order to make these kind of decisions. The start_offset should not be bigger than the retention policy of the underlying ltss table (in my case 30 days), otherwise the materialized view will update the older data with the new state (in other words: removing your precious data in the materialized view to reflect the removal of the data in the underlying ltss table).

Same for ltts_day:

CREATE MATERIALIZED VIEW ltss_day
WITH (timescaledb.continuous) AS
SELECT entity_id AS metric,
       time_bucket(INTERVAL '1 day', time) AS bucket,
       MAX(cast(state as numeric)) as value
FROM ltss
WHERE entity_id in ('sensor.daily_energy_offpeak', 'sensor.daily_energy_peak', 'sensor.daily_energy','sensor.daily_gas',
'sensor.weekly_energy_offpeak', 'sensor.weekly_energy_peak','sensor.weekly_energy', 'sensor.weekly_gas',
'sensor.monthly_energy_offpeak', 'sensor.monthly_energy_peak', 'sensor.monthly_energy','sensor.monthly_gas',
'sensor.yearly_energy_offpeak', 'sensor.yearly_energy_peak','sensor.yearly_energy',  'sensor.yearly_gas')
GROUP BY 1,2;

->creates the materialized view.

SELECT add_continuous_aggregate_policy('ltss_day',
    start_offset => INTERVAL '15 day',
    end_offset => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 day');

->data will be summarized in the materialized view.

Other useful SQL statements:
SELECT * FROM timescaledb_information.continuous_aggregates;
SELECT * FROM timescaledb_information.jobs;
SELECT * FROM timescaledb_information.job_stats;
SELECT alter_job(1022, next_start => ‘2020-12-13 00:37:00.0+00’);

Another tip to reduce data: make sure to already pre-filter what should be included/excluded in the ltss table. You can do this in the configuration.yaml. See an example below. This will save you on the short-run, while the materialized views will help you on the long run.

ltss:
  db_url: xxx
  include:
      domains:
      - sensor
      entities:
      - person.xxxxxx

You can find more information about materialized views here:

pimw · June 7, 2021, 1:40pm

@freol: thank you for making this integration. I am using this for half an year now. It is incredible.
What i really like about timescaledb, is that it allows you to make dimensional slices based on time in Grafana (e.g.: slice per day, per week, per month, etc.). This makes it the perfect use-case for my energy dashboard. It runs stable and quick. Thank you for making this docker container available.

joostvh · September 4, 2021, 9:42am

Thank you for this great integration. I am using this on Hass core 2021.8 venv, PostgreSQL 13.4, timescaleDB 2.3.1, all manual installed. I made some modifications to the LTSS install script: remove Postgis and location, because location is for me not needed. I changed the Chunk size to 1 Week = 604800000000.

Compression wasn’t working at the beginning (add_compression_policy). I had to change one of the primary keys from ID to entity_id (see how compression is working in TimescaleDB for understanding).

In the current situation I have a system with DSMR, multiple T/H/P sensors, etc (70 entities), recorded in Postgresql with a purge interval of 8 days, size +/- 300MB. The LTSS database is increasing weekly 2-3MB after compression (before compression 60MB), I excluded some uninteresting entities for logging like weather forecast, etc.

Yagyu · December 13, 2021, 7:11pm

Hi there,

I also use LTSS and now have a problem.

I use Homematic to read my gas meter. Unfortunately, this had to be set up again 4 days ago because it crashed.

LTSS has not updated the values since then. I have integrated the sensor again exactly as before, in Hassio I see the correct values: in Grafana via the ltss query, but I get the old value from 4 days ago … Anyone with an idea where the problem is?

Thanks very much!

Yagyu · December 15, 2021, 7:21am

I just looked in pgadmin: the numbers arrive right there too …

but in Grafana I still get the old status ;_;

I do not understand what is happening there ;(

CM000n · December 15, 2021, 8:09am

You used different queries in pgadmin and Grafana. In pgadmin you select all values of the entity. In Grafana you perform an aggregation to the maximum value MAX(). And grafana aggregates them per day as it seems? Could this be the cause?

Yagyu · December 15, 2021, 8:16am

Thank you

I deleted the previous from the query: at least it no longer shows any wrong data …

I also noticed an error in the log:

but I don’t understand what the error is trying to tell me ;(

CM000n · December 15, 2021, 9:35am

Are you sure that all your values can be cast as float (double precision)? Is there maybe besides ‘unavailable’ another string value in your data for this entity?

Yagyu · December 15, 2021, 10:07am

WHERE
  $__timeFilter("time")
  AND entity_id = 'sensor.gas_daily'
  AND state != 'unavailable'
  And state != ''

fixed it …

CM000n · December 15, 2021, 10:39am

aaand to make it even simpler:

WHERE
  $__timeFilter("time")
  AND entity_id = 'sensor.gas_daily'
  AND state NOT IN ( 'unavailable', '' )