Custom component: Long Time State Storage (LTSS) utilizing TimescaleDB

CM000n · October 9, 2022, 7:02am

So just Name the bucket column “time” instead of “bucket” and use “GROUP BY 1” instead.
Maybe you can execute the query directly with some DB client instead of Grafana? So that we get a better understanding of the structure of the data output.

SharkWipf · October 9, 2022, 3:23pm

That’s essentially my original query again, result is also identical to my original query.
I’ve run this query against psql directly, as well as my raw data query (without bucketing), the former shown below, the latter I’ve uploaded here because of its size: https://clbin.com/W1man

The raw query is just:

SELECT
  "time",
  state_numeric AS "Phase 2 raw"
FROM ltss
WHERE
  "time" BETWEEN '2022-10-08T08:00:05.341Z' AND '2022-10-08T12:07:35.001Z'
  AND entity_id = 'sensor.power_consumed_phase_2'
ORDER BY 1

freol · October 9, 2022, 5:32pm

@SharkWipf
Hmm, I think you should actually be able to get this done without the locf function. How about this:

SELECT
  time_bucket_gapfill('5m', "time") AS "time",
  COALESCE(
    AVG(state_numeric),
    0.0
  ) AS "Phase 2 5m"
FROM ltss
WHERE
  $__timeFilter("time")
  AND entity_id = 'sensor.power_consumed_phase_2'
GROUP BY time_bucket_gapfill('5m', "time")
ORDER BY 1

What I learnt from this exercise is that locf is carrying forward the value of the previous time bucket and not actually the value of the last known sensor value. Maybe its just me, but the naming is a bit misleading?

EDIT: You may also want to have a look at this. You are currently calculating the arithmetic mean for every time bucket which may be misleading if the samples are unevenly spaced.

SharkWipf · October 9, 2022, 7:40pm

Interestingly, in this version without LOCF and with COALESCE 0.0, the result becomes 1-to-1 identical to the earlier $__timeGroup example with LOCF. Not sure how that works, but unfortunately I still believe that data to not be correct. Here’s another shot of it compared to the raw data and the “awful” but accurate workaround (“freol 2” and “timeGroup” being identical):

Seems like time_weight() might indeed be the solution, unfortunately I do not have a way to test it.
The time weighted average functions are part of a separate Timescale Toolkit extension.
I got this setup running in Docker (well, Podman) on an anemic Pi2B, and unfortunately the current TimescaleDB container image that includes Timescale Toolkit does not support multiarch like the old, deprecated image it was supposed to replace, so I’m stuck on that image.
I’ve opened an issue about this back in May, but unfortunately it does not appear to be a priority for them at this time.

I might see if I can install it manually at some point, but everything on this Pi takes forever and I’m glad I even got this compose stack working, so I’ll probably just have to settle for my “awful” solution I guess.
Thanks for your help, if you have any other solutions I’ll be happy to try, if not, at least my awful workaround should do the job, hopefully without running my Pi out of memory.

EDIT: To clarify, I had changed the bucketing to 1m instead of 5m in this screenshot, for better clarity of the issue. Same issue applies to 5m and other intervals of course.

wicol · October 24, 2022, 11:08am

Yeah I tried playing with gapfill etc with my MQTT Eventstream-setup as well. It doesn’t really work when the last data point is outside of the query time window. It also doesn’t work well with time windows including future times - future energy pricing, prognosis data combined with data from sensors results in faked values until end of graph (maybe there’s functionality for this in TS, haven’t looked in to it because of scenario 1 won’t work anyways).

But OK, just reacting to events, like MQTT Eventstream. Follow up question then: I wonder if I can use a similar workaround with LTSS. What I did was to create an appdaemon app that uses the mqtt publish service to publish sensor values on an interval.
Is there some service or event I could trigger like that to make LTSS log a data point? Perhaps manually triggering a sensor update event could work? But a cleaner trigger would be nicer.

edouardkleinhans · October 27, 2022, 11:00am

Hello,

Does anybody have a generic downsampling job for ltss ?

I see a simple one exist for influxDB (InfluxDB 2 data retention - my approach), can we have the same approach for timescaleDB.

Thanks

Expaso · November 14, 2022, 8:31pm

Hey, my timescaledb addon for HA works for armv7, aarch64 and amd64.
Could try that one. You could start it as a regular docker image outside of HA.

Expaso · November 14, 2022, 8:37pm

Sure!

See: Pinned: Additional Documentation for Timescaledb Integration · Issue #1 · Expaso/hassos-addons · GitHub

This creates materialized downsampled views.

The underlying data can then be auto-removed from the ltss hypertable when it’s older then, say 6 months?

SELECT add_retention_policy('ltss', INTERVAL '6 months');

Create a retention policy | Timescale Docs

pimw · November 15, 2022, 8:59pm

Hi, i’m using container dekiesel/timescaledb-postgis:1.7.4-pg12 for aprox 2 years now.
I’d like to upgrade to timescale 2.8 + postgresql 14, using this guideline:
https://docs.timescale.com/timescaledb/latest/how-to-guides/upgrades/upgrade-pg/

This guideline tells me to first upgrade to timescale 2.8, and after that upgrade to postgresql 14.
However, i get stuck in the very first step: upgrading to timescale 2.8

ALTER EXTENSION timescaledb UPDATE;

It tells me that i cannot update, since the original docker container uses timescaledb 2.1.0-dev. Upgrading from dev is not supported.

Is there any migration path I can follow?

srkoster · November 16, 2022, 7:17pm

@freol I’m looking into compression and noticed the same thing as @joostvh , with the primary key on the ltss table being id, time compression can only be segmented by id and doesn’t add anything in terms of storage savings. When I alter the primary key to entity_id, time and segment by entity_id the savings are significant.

What’s your idea on the current primary key? And can it be altered without consequences?

pimw · November 24, 2022, 10:58pm

Hi, i would like to run two instances of ltss, each of them pointing to a different timescaledb instance. Reason: for some sensors i would like to differentiate in retention period.

Is this possible?

SharkWipf · December 14, 2022, 4:02pm

Sorry, missed your message until now. I have since managed to aqcuire a couple of Pi4s, so I think I’ll migrate the installed to HassOS in the near future, hopefully without giving up my data in the process. I’ll probably end up using your addon then.

Expaso · December 15, 2022, 2:16pm

You surely don’t need to give up your data Simply take a backup of the addon, and restore it on your new pi.

SharkWipf · January 29, 2023, 6:26pm

Coming back to this way later than I had meant to, but looking at your addon, it does not actually include Toolkit, does it?
I can run TimescaleDB perfectly fine in my current setup, the component that was missing was TimescaleDB Toolkit, which provides the functions I seemingly need, like time weighted average.
The officially suggested Timescale-HA container image that includes Toolkit does not have ARM support.

Trying to enable this extension in your addon results in an error, suggesting toolkit is not available:

ERROR: could not open extension control file "/usr/share/postgresql14/extension/timescaledb_toolkit.control": No such file or directory SQL state: 58P01

Expaso · January 30, 2023, 8:41am

That is correct.
TimescaleDB toolkit indeed provides some extra statistical functions that users would like to see.

I am in the process of integrating TimescaleDB toolkit into the addon, and the master-branch already contains the basics for this.

It is however quite an undertaking, because it (as you found out) needs to compile on multiple platforms. (I do accept pull requests ), and I pack a lot of extra’s.

Work on this has been stalled for a little while because I was abroad for my work, but I expect to finish it soon.

What is your specific usecase for your weighted avarage (just curious) ?

SharkWipf · January 30, 2023, 11:30am

Hmm alright, I’ll keep an eye on that then. In the meantime I guess I’ll see if I can get it to build myself, the instructions on the Toolkit page seem deceptively straightforward, though of course they don’t mention Aarch64…

As for my use case, really I’m just trying to visualize my sensor data accurately, and I’m at a loss how else to do it.
I can visualize my raw data correctly, but this is way too much data for both client and server, so I’m trying to use continuous aggregates to create lighter time averages, like 5 minute averages.
But because the data from my sensors is only recorded on state change, and states often don’t change for hours, something goes wrong with the averages and the returned data is incorrect.
The screenshot in my original post explains it better than I can in text.

Basically though, it seems to boil down to Timescale extending the value of the last 5 minute average with a time_bucket_gapfill when the state stops changing, rather than extending the last datapoint, which results in wildly incorrect data.
In theory, using time weighted averages should hopefully fix this, and if it doesn’t I don’t know what will.

So tl;dr: Without time weighted averages, my data in TimescaleDB is useless because it’s incorrect (averaged) or too heavy to use at the scale I want it to use (raw).

Expaso · January 30, 2023, 1:10pm

Ah! So!

As far as I know, the gapfilling in Timescale work by plugging in the fucntion you want to use to gapfill with.
See this doc: Timescale Documentation | Time bucket gapfill

Basically it states: “The time bucket gapfill function creates a contiguous set of time buckets but does not fill the rows with data. You can create data for the new rows using another function, such as last observation carried forward (LOCF), or interpolation.”

So, if you just want to extend the last datapoint, can you not use the “Last Obervation Carried Forward” method?
Timescale Docs

SharkWipf · January 31, 2023, 2:54pm

Unfortunately, no dice. I tried recreating the same graph as in my original post on the same timeframe, and tried using the last() function with LOCF this time:

SELECT
  time_bucket_gapfill('5m', "time") AS "time",
  locf(
    last(state_numeric, time)
  ) AS "Phase 2 last"
FROM ltss
WHERE
  $__timeFilter("time")
  AND entity_id = 'sensor.power_consumed_phase_2'
GROUP BY time_bucket_gapfill('5m', "time")
ORDER BY 1

The result is a lot of data getting lost entirely:

Expaso · February 7, 2023, 9:19pm

Breaking my head over this one for a while, but indeed you are totally right.
The uneven sample-rate skews the simple averages.

You definitely need the time-weighted averages for this one to be accurate.

Failing to come up with a better query from my part, I resumed work on updating the addon to support the timescaledb-toolkit.
Thanks to your analysis, I think I need this one too for my own graphs

I’ll do my best in the scares free time I have!

luisfmrosa · March 4, 2023, 8:33pm

Hello guys,

After upgrading to core 2023.3.1, I’m having this error with LTSS:

2023-03-04 11:21:43.885 ERROR (MainThread) [homeassistant.setup] Setup failed for custom integration ltss: Requirements for ltss not found: ['sqlalchemy>=1.0,<2.0'].

In the log, I also have this:

`Logger: homeassistant.util.package
Source: util/package.py:107
First occurred: 21:33:51 (9 occurrences)
Last logged: 21:36:12

Unable to install package sqlalchemy>=1.0,<2.0: ERROR: Cannot install sqlalchemy<2.0 and >=1.0 because these package versions have conflicting dependencies. ERROR: ResolutionImpossible: for help visit Dependency Resolution - pip documentation v23.1.dev0 [notice] A new release of pip is available: 23.0 → 23.0.1 [notice] To update, run: pip install --upgrade pip`

Anybody else is having the same issue?

Thank you!

Luis

[UPDATED] : Just found a way to solve this, here is what I did, just in case someone else falls into something similar:

I upgraded LTSS to the latest version. But I was still unable to restart, as the error above was blocking the restart…
I commented my LTSS configurations in configuration.yaml: now I can restart!
After restart, I uncommented my LTSS settings in configuration.yaml: validating configuration now does not accuse any error. I can restart again …
After restart, everything works like a charm.