Fill influxdb on regular interval

twdkeule · August 3, 2020, 9:45am

I am trying to fill influxdb on a regular interval to avoid problems with downsampling and Grafana/Chronograf. Using a Continuous Query with fill(previous) doesn’t work if there is no data in the query range.

@RobBie1221 proposed on Github to use a Kapacitor script.

Related to

RobBie1221 · August 3, 2020, 11:16am

I’m using 2 databases in InfluxDB, one for my homeassistant data which is filled at irregular intervals (every time state is updated in Home Assistant it sends out data) and one which is filled at regular intervals (in my case every 5 minutes a data point). The second database is filled from the first and if no data is received in a window, it fills the point with previous data.

I used a TICK script to do that:

// Run a batch query
var data = batch
// Only "last" is needed, but the "max" is also queried to get a timestamp which is aligned with the
// start date/time of the query
    |query('SELECT last("value") as value, max("value") as max FROM "homeassistant"."autogen"."kWh"')
        .period(30d) // Period of 30 days, when there is at least one point in this window, it will fill the gaps
        .offset(5m) // Run with 5 minutes offset to deal with the fact that home assistant does not send out data immediatly
        .every(5m) // Run every 5 minutes
        .align() // Align the timestamps to run on even clock counts
        .groupBy('entity_id') // Group the data by entity_id
    |shift(30d) // The timestamp of the datapoint is now()-5m-30d, so it needs to be shifted 30d

data
    // Drop the max key
    |eval(lambda: "value")
        .as('value')
    // Write back to InfluxDB (could be the same db, or another one)
    |influxDBOut()
        .database('my_downsampled_db')
        .retentionPolicy('autogen')
        .measurement('kWh')

Try it out, if you need some more info or help, let me know.

twdkeule · August 3, 2020, 11:53am

I hadn’t thought about looking back 30 days on a regular interval. I think it’s possible to do this with a continuous query. So I don’t have to install Kapacitor:

CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
RESAMPLE EVERY 5m FOR 30d  # run every 5 min, look back 30 days 
BEGIN
  # select last of every 5 minute interval, take previous if empty
  SELECT last('value') as 'value' GROUP BY time(5m), * fill(previous) 
END

It doesn’t entirely fix the issue when the state is constant for more than 30 days, but that probably doesn’t happen very often.

RobBie1221 · August 3, 2020, 11:56am

That could work as well. I also have TICK running for “deadman” detection on data (alert if no data is received on an entity for x time). That basically gives me 30 days to fix it when something does not report

Eventually, you could basically do anything you want with TICK. I also installed Python inside Kapacitor and you can use Python scripting to do anything you want with data.

twdkeule · August 3, 2020, 1:03pm

There is still a problem. The generated database loses al information inside the 5min interval.
What I want is to retain all information from HA and have the last value repeat every 5 min.

I wonder if it is possible with Kapacitor to merge them afterwards with UnionNode?
Or maybe use 2 Continuous Queries to write both the last() value and the original data into the same database.

twdkeule · August 3, 2020, 1:12pm

The last option seems to work. When the plug is activated there are a lot of datapoints (± every 1s) and afterwards only once every 5 seconds.

Both CQ’s:

CREATE CONTINUOUS QUERY last_elec_30d ON homeassistant RESAMPLE EVERY 5m FOR 30d BEGIN SELECT last(value) AS value INTO homeassistant.short_term.test_elec FROM homeassistant.short_term.electricity GROUP BY time(5m), * fill(previous) END

CREATE CONTINUOUS QUERY mean_elec_1s ON homeassistant BEGIN SELECT mean(value) AS value INTO homeassistant.short_term.test_elec FROM homeassistant.short_term.electricity GROUP BY time(1s), * END