InfluxDB: Removing or deleting data

I struggle a lot with deleting a single wrong reading from my InfluxDB. Can anyone help me out with all correct references (i am a noob in queries)?

Query format needed:
USE “energy”; DELETE FROM “Solar” WHERE (“Calculated” = “sajoostvermogen”) AND time > “2024-04-11 14:24:00” AND time < “2024-04-11 14:28:00”

Database format:

How do I get to the right location with this query?

I get an error message with the delete query.
“error parsing query : retention policy not supported at line 1, char 1”

Unable to delete!
I have this (running) query:

SELECT "value" FROM "hass-gar-beb"."autogen"."kWh" WHERE "entity_id"='gar_inverter_produzione_giornaliera' AND "value"> 70

And have tried this one:
USE "hass-gar-beb"."autogen"; delete FROM "kWh" WHERE "entity_id"="gar_inverter_produzione_giornaliera" AND "value">70

and this:
USE "hass-gar-beb"; delete FROM "kWh" WHERE "entity_id"="gar_inverter_produzione_giornaliera" AND "value">70

With or without quotes on “70”, my query is correct, but returned no results… (4 tries)

So, i’ve written down the time windows, selected it into the Dashboard time picker, and:

USE "hass-gar-beb"."autogen"; delete FROM "kWh" WHERE time > :dashboardTime: AND time < :upperDashboardTime: AND "entity_id"="gar_inverter_produzione_giornaliera"

…with or without the .autogen, it always tells me the shyntax is correct, but returned no results…

what am i missing?

Edit: Naturally the data is still there, so no “Only warning message”

We’ve been through this above.

Did it delete the the ones you wanted?

of course i relaunch the select query after every try, they’re still there!

I had issues using the above commands, after more research and a slight reformat the below command worked for me. ( you will need to change the database name, the measurement, the entity_id and date )

USE “homeassistant”; DELETE FROM “h” WHERE “entity_id” = ‘heating_runtime’ AND time > ‘2024-06-03’

All data after and including the date specified is deleted - if the current date does not work, try the day before

Refresh influx / Grafana to see the results

1 Like

InfluxDB v2

Always do a backup before deleting data… Right… :wink:* You have been warned. Now: Continue one your own. I’ve tested the query, but nothing more.

For people with InfluxDB v2.x and newer, here’s the Flux query I’m using in the Data Explorer:

import "influxdata/influxdb/schema"

schema.tagValues(
  bucket: "homeassistant",
  tag: "entity_id",
)

If you’d like to confirm that you can query data, try and run:

from(bucket: "homeassistant")
  |> range(start: -20y, stop: now())  // Last 5 years
  |> filter(fn: (r) => r["entity_id"] == "0x90fd9ffffe0d1286_linkquality")

Here’s another one:

from(bucket: "homeassistant")
  |> range(start: 2023-06-01T00:00:00Z, stop: 2024-02-01T00:00:00Z)
  |> filter(fn: (r) => r["entity_id"] == "sm_s906b_battery_level" and r["_field"] == "value" and r["_measurement"] == "%" and r["domain"] == "sensor" and r["_value"] <= 20.0)

You can also add more filters, like:

from(bucket: "homeassistant")
  |> range(start: -5y, stop: now())  // Last 5 years
  |> filter(fn: (r) => r["_measurement"] == "lqi")
  |> filter(fn: (r) => r["entity_id"] == "0x90fd9ffffe0d1286_linkquality")

Here’s a way to find the first timestamps from a given entity:

from(bucket: "homeassistant")
  |> range(start: 0, stop: now())  // Change the start time if necessary
  |> filter(fn: (r) => r["entity_id"] == "0x90fd9ffffe0d1286_linkquality")
  |> first()

But with InfluxDB v2.x you’d need to run the delete request through the API or in the CLI.

Deleting in InfluxDB v2 (Docker container, not a HASS.io solution)

Here’s one example where I’ve jumped into my InfluxDB container with docker exec -it influxdb /bin/bash.

First I exported my API key, which I’ve made in the InfluxDB Web UI.

Do that like this:

export INFLUX_TOKEN=ADD_TOKEN_HERE

Next get the org ID with:

influx org list

And now run the command:

influx delete \
  --org "YOUR_ORG_ID" \
  --bucket homeassistant \
  --start '1970-01-01T00:00:00Z' \
  --stop '2022-10-22T00:00:00Z' \
  --predicate '_measurement="lqi" AND entity_id="0x90fd9ffffe0d1286_linkquality"'

Remember to adjust both start and stop.

Result before delete:

Result after delete:

And I know… I need to update those exclude keys, cause… What should I ever use a link quality for… :laughing:

I’d like to point out that regex can be used, too:

from(bucket: "homeassistant")
  |> range(start: -1y, stop: now())  // Last 1 years
  |> filter(fn: (r) => r["_measurement"] == "lqi")
  |> filter(fn: (r) => r["entity_id"] =~ /_linkquality$/)

You cannot use the regex in the delete statement, but it’s possible to extract all entity ID’s like this:

from(bucket: "homeassistant")
  |> range(start: 0, stop: now()) // All time
  |> filter(fn: (r) => r["_measurement"] == "lqi")
  |> filter(fn: (r) => r["entity_id"] =~ /_linkquality$/)
  |> distinct(column: "entity_id")

And hereafter, continue deleting them one by one, creating a shell-script, a Python script, or whatever else you’d like to do :slight_smile:.

But, don’t listed to me. I’d always recommend to take a look at the docs :D!

Updating the Data Retention Policy

Again, you’ll find the information in the InfluxDB docs too, but here’s the steps.

First secure that you’re authenticated.

Next, render your buckets inside your organization: influx bucket list --org-id SET_ORG_ID_HERE

Now set the new Data Retention rule with influx bucket update --retention 730d0h0m0s --id SET_BUCKET_ID_HERE

Personally I had no restruction on Home Assistant entries, but now it’s capped to two years. You’ll notice that InfluxDB at some point will start cleaning up the data. I’m not sure 100% sure on how that’s scheduled, but you can always check the latest logs with docker logs -f influxdb --tail 100.

Happy Housekeeping everyone :broom: !

1 Like

I’ve come to the point where my influxdb is 50Gb…

I need to delete unwanted data.

I configured it to no even get ended up in influx but using the correct “exclude” rules in configuration.yaml.

As an example i have many “update.*” entities that I would like to ditch from the DB.

I have influxdb studio and can send queries but I am a little unsure about the correct syntax. Would this be something like this:

USE "home_assistant"; DELETE "entity_id" = 'update.*'

I get all entries with:

SHOW MEASUREMENTS ON "home_assistant"

I find myself in a similar situation - has anyone found a decent way of finding large measurements (or where there are lots of measurements)?