I use InfluxDB as long-term measurement storage. When I started using InfluxDB, I did not restrict which measurements were stored in it. I don’t want to use a retention policy on the measurements of interest (i.e. temperatures and humidity of my house); those I want to store indefinitely. However, I would like to get rid of uninteresting measurements, like the battery level of sensors etc.
How do I delete specific values from the database in such a way that the disk usage of InfluxDB is reduced?
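One possible approach, as a minimal sketch: InfluxDB 1.x InfluxQL has DROP SERIES and DROP MEASUREMENT statements. The helpers below only build those statements as strings. They assume the default Home Assistant schema, where the measurement is named after the unit (for example "%") and each entity carries an entity_id tag; the entity name hall_sensor_battery and the measurement dBm are hypothetical. Disk usage may only shrink once InfluxDB has compacted the affected shards.

```python
def quote_identifier(name: str) -> str:
    """Double-quote an InfluxQL identifier (measurement or tag key)."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def quote_string(value: str) -> str:
    """Single-quote an InfluxQL string literal (tag value)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def drop_series_statement(measurement: str, entity_id: str) -> str:
    """Remove one entity's series from a measurement, leaving other entities intact.

    Assumption: entities are distinguished by an "entity_id" tag.
    """
    return (
        f"DROP SERIES FROM {quote_identifier(measurement)} "
        f'WHERE "entity_id" = {quote_string(entity_id)}'
    )


def drop_measurement_statement(measurement: str) -> str:
    """Remove a whole measurement, including all of its series."""
    return f"DROP MEASUREMENT {quote_identifier(measurement)}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(drop_series_statement("%", "hall_sensor_battery"))
    print(drop_measurement_statement("dBm"))
    # To run a statement, pass it to an existing client, e.g. client.query(statement).
```

Both statements are destructive, so it is worth taking a backup and checking with SHOW SERIES which series match before running them.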
I bump this…
Even though I started with quite strict rules for what to save, the database size grows pretty fast.
It would be very nice to have a GUI function to just drop entities that are no longer of interest.
Another function that would be highly appreciated is a more customizable data retention policy…via GUI for us beginners…
I read about someone who managed to create some data retention policies to drop excessive old data (but I never understood if the description was for a “normal” installation of Home Assistant on a Pi/NUC etc or if it was some container installation).
However, what he described was:
Data from current to X months: keep as is
Data from X months to Y months: save one data point per 5 min
Data from Y months to eternity: save one data point per 15 min
Something like this would be brilliant to be able to configure in an easy manner - preferably via the InfluxDB add-on GUI, or if someone could write a very clear description of how to configure it.
Preferably with X and Y month configurable as well as number of data points per minute per interval.
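A tiered scheme like this could be set up in InfluxDB 1.x with retention policies plus continuous queries. The following is only a sketch under assumptions: the database is called homeassistant, raw data lives in the default autogen retention policy, the names rp_5m, rp_15m, cq_5m and cq_15m are made up for illustration, and X and Y are given in days because InfluxQL durations have no month unit. The function just generates the statements so they can be reviewed before running them.

```python
def tiered_downsampling_statements(
    database: str = "homeassistant",
    raw_rp: str = "autogen",
    raw_days: int = 90,        # "X": how long raw data is kept
    mid_days: int = 365,       # "Y": how long the 5-minute tier is kept
    mid_interval: str = "5m",
    long_interval: str = "15m",
) -> list:
    """Build InfluxQL statements for a three-tier downsampling setup."""

    def continuous_query(interval: str) -> str:
        # Each CQ averages new raw points into its own retention policy.
        return (
            f'CREATE CONTINUOUS QUERY "cq_{interval}" ON "{database}" BEGIN '
            f'SELECT mean(*) INTO "{database}"."rp_{interval}".:MEASUREMENT '
            f'FROM "{database}"."{raw_rp}"./.*/ '
            f"GROUP BY time({interval}), * END"
        )

    return [
        f'CREATE RETENTION POLICY "rp_{mid_interval}" ON "{database}" '
        f"DURATION {mid_days}d REPLICATION 1",
        f'CREATE RETENTION POLICY "rp_{long_interval}" ON "{database}" '
        f"DURATION INF REPLICATION 1",
        continuous_query(mid_interval),
        continuous_query(long_interval),
        # Destructive: raw data older than raw_days is dropped after this.
        # Run it last, and only once the downsampled tiers are populated.
        f'ALTER RETENTION POLICY "{raw_rp}" ON "{database}" DURATION {raw_days}d',
    ]


if __name__ == "__main__":
    for statement in tiered_downsampling_statements():
        print(statement)
```

Continuous queries only process data as it arrives, so history that already exists would still need a one-off backfill into the new retention policies.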
The first link about influxdb-downsampling was very useful for me. I have a lot of data and I want it grouped into 15-minute intervals, but the backfill queries fail if I cover more than one week at a time, and each week takes about one minute to process. I have almost 3 years of records, so I created this AppDaemon app:
import time

import hassapi as hass
from influxdb import InfluxDBClient


class InfluxDBQueryAutomation(hass.Hass):
    def initialize(self):
        # InfluxDB connection details
        self.host = "influxhost"
        self.port = 8086
        self.username = "username"
        self.password = "password"
        self.database = "homeassistant"
        # Connect to InfluxDB
        self.client = InfluxDBClient(host=self.host, port=self.port, username=self.username, password=self.password)
        self.client.switch_database(self.database)
        # Time range configuration
        self.weeks_to_process = 150  # Number of intervals to process
        self.start_weeks_ago = 150  # Starting point in weeks ago
        self.interval_weeks = 1  # Length of each interval in weeks
        # Start the process to generate queries
        self.run_queries()

    def run_queries(self):
        """Generate and execute queries with a 1-second delay between them."""
        for i in range(self.weeks_to_process):
            # Walk from the oldest interval towards now, one interval per query
            start_week = self.start_weeks_ago - (i * self.interval_weeks)
            end_week = start_week - self.interval_weeks
            query = f"""
                SELECT mean(*) INTO "homeassistant"."infinite".:MEASUREMENT
                FROM "homeassistant"."autogen"./.*/
                WHERE time > now() - {start_week}w AND time < now() - {end_week}w
                GROUP BY time(15m), * FILL(previous)
            """
            self.log(f"Executing query for weeks {start_week} to {end_week}")
            try:
                # Execute query and wait for the response
                result = self.client.query(query)
                # Check if query returned successfully
                if result:
                    self.log(f"Query for weeks {start_week} to {end_week} executed successfully.")
                else:
                    self.log(f"Query for weeks {start_week} to {end_week} returned no result or failed.")
            except Exception as e:
                self.log(f"Error executing query for weeks {start_week} to {end_week}: {e}")
            # Delay of 1 second before the next query
            time.sleep(1)
        self.log("All queries executed.")
I just had to add the influxdb Python package in the AppDaemon configuration.
I have discovered a downside of downsampling the data this way. InfluxDB adds a prefix to each aggregated field. In this case, all fields in the infinite RP now start with “mean_”, so in Grafana you cannot easily switch between retention policies (default/infinite).
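One way around the prefix might be to name the field explicitly and alias it back with AS instead of using mean(*). This is a sketch, not something tested against my data: it assumes the numeric state is stored in a field called "value" (the Home Assistant default, as far as I know), and any other fields would not be copied by such a query. The function below builds the same backfill query as in the app above, with that one change.

```python
def backfill_query(
    start_week: int,
    end_week: int,
    field: str = "value",          # assumed name of the numeric field
    database: str = "homeassistant",
    source_rp: str = "autogen",
    target_rp: str = "infinite",
    interval: str = "15m",
) -> str:
    """Build a downsampling query that keeps the original field name."""
    return (
        # The AS alias stops InfluxDB from writing the field as "mean_<field>".
        f'SELECT mean("{field}") AS "{field}" '
        f'INTO "{database}"."{target_rp}".:MEASUREMENT '
        f'FROM "{database}"."{source_rp}"./.*/ '
        f"WHERE time > now() - {start_week}w AND time < now() - {end_week}w "
        f"GROUP BY time({interval}), * FILL(previous)"
    )


if __name__ == "__main__":
    print(backfill_query(2, 1))
```

With the field name unchanged, a Grafana panel should only need the retention policy swapped to move between raw and downsampled data.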