Fixing Data Spikes in InfluxDB
<tl;dr>
With this script, you can:
Delete or update incorrect data (value field) in InfluxDB (v1 only)
Target specific sensors/entities
Define a threshold to find and remove outliers
Filter by time range
</tl;dr>
I absolutely love my dashboards in Grafana, built on top of InfluxDB, which is fed by Home Assistant. The combination works great—until one of your sensors goes crazy and completely ruins your beautiful graphs in an instant.
In Home Assistant, we at least have the option to remove incorrect values from statistics, but in InfluxDB, it’s a real pain to clean up bad data once it’s stored.
**The Problem**
A few days ago, my smart meter went haywire and started logging insane power consumption spikes—reporting values way above 50,000 kWh multiple times a day.
Of course, I could filter the sensor in Home Assistant or Grafana, but my long-term data was already corrupted.
I searched for a simple way to modify or remove incorrect data in InfluxDB, but I couldn’t find anything that fit my needs.
All I wanted was a way to get rid of those spikes—without manually running complicated database queries.
**The Solution**
So, I decided to write a Python script to make this process easy and painless!
Since this might be useful for others as well, I’m sharing it with the community!
With this script, you can:
Delete or update incorrect data in InfluxDB (v1 only)
Target specific sensors/entities
Define a threshold to find and remove outliers
Filter by time range (optional)
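For anyone curious how a cleanup like this can work against InfluxDB v1, here is a minimal sketch of the idea, not the actual code from my script. InfluxQL `DELETE` can only filter on tags and time, not on field values, so the usual approach is to `SELECT` the outliers first and then delete each point by its timestamp. The function names are hypothetical, and the `entity_id` tag and `value` field are assumptions based on a typical Home Assistant setup; check what your database really stores.

```python
from typing import Optional

# Comparison operators this sketch accepts for the threshold filter.
ALLOWED_OPERATORS = {">", ">=", "<", "<=", "=", "!="}


def quote_str(value: str) -> str:
    """Return an InfluxQL string literal (single-quoted, with escaping)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def quote_ident(name: str) -> str:
    """Return an InfluxQL identifier (double-quoted, with escaping)."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_select(
    measurement: str,
    entity_id: str,
    operator: str,
    threshold: float,
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> str:
    """Build the query that finds the outliers.

    Assumes a tag named "entity_id" and a field named "value"
    (hypothetical schema; adjust to your database).
    """
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    clauses = [
        f'"entity_id" = {quote_str(entity_id)}',
        f'"value" {operator} {threshold}',
    ]
    # The time range is optional; timestamps are RFC3339 strings.
    if start is not None:
        clauses.append(f"time >= {quote_str(start)}")
    if end is not None:
        clauses.append(f"time <= {quote_str(end)}")
    return f'SELECT "value" FROM {quote_ident(measurement)} WHERE ' + " AND ".join(clauses)


def build_delete(measurement: str, entity_id: str, timestamp: str) -> str:
    """Build a DELETE for one point, matched by tag and exact timestamp.

    DELETE cannot filter on field values in InfluxQL, which is why the
    timestamp from the SELECT result is used instead of the threshold.
    """
    return (
        f"DELETE FROM {quote_ident(measurement)} "
        f'WHERE "entity_id" = {quote_str(entity_id)} '
        f"AND time = {quote_str(timestamp)}"
    )


if __name__ == "__main__":
    print(build_select("W", "energy_meter_1", ">", 4200000))
    print(build_delete("W", "energy_meter_1", "2024-02-17T10:30:45.123456Z"))
```

The sketch only builds the query strings, so you can print and review them before sending anything to the database. Updating a value instead of deleting it would mean writing a new point with the same tag set and timestamp, which is not shown here.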
**Example Output**
================================================
Welcome to the InfluxDB Cleanup Script!
================================================
Influx Host: 192.168.1.100:8086
Influx DB: home_assistant
Entity_ID: sensor.energy_meter_1
Measurement: W
Compare Operator: >
Threshold: 4200000
Mode: delete
================================================
👉 Search database for matching entries? (Y/N): y
🔍 Searching the database... ⠹
✅ Database query completed.
🔍 Found entries with value > 4200000: 25
┌───────────────────────────────┬──────────────┐
│ Timestamp                     │ Value        │
├───────────────────────────────┼──────────────┤
│ 2024-02-17T10:30:45.123456Z   │ 1654321      │
│ 2024-02-17T10:30:50.654321Z   │ 1902345      │
│ 2024-02-17T10:31:05.432167Z   │ 1704567      │
│ 2024-02-17T10:31:20.321789Z   │ 1756789      │
│ 2024-02-17T10:31:35.987654Z   │ 1805678      │
│ ...                           │              │
└───────────────────────────────┴──────────────┘
... and 5 more entries.
👉 Do you want to delete these values? (Y/N): y
🚀 Starting deletion...
[███████████------] 60.0% (15/25)
✅ All values were successfully deleted.
If you’ve ever struggled with sensor spikes ruining your long-term InfluxDB data, this script might save you a lot of frustration!
You can find the script and additional information on how to use it on my GitHub: https://github.com/fluffykraken/sics
Although it works in my setup, please be very careful when using it: once data has been deleted or updated, the change is irreversible. The script asks for confirmation before altering data, but we all know that hitting ‘y’ sometimes happens too fast. So make sure you have a backup of your database and/or test against a non-production DB first.
Have fun, stay happy, and don’t fear the spikes no more…