Modifying data in the database via script?

I’d like to write a script that will go through the database and clean up some data. I am using the DHT22 component, and sometimes the temperature randomly drops from 60 to 30 for a single data point. This messes up the y axis in the graphs, so I’d like to remove data points that are dramatically different from their neighbors. I could do this in SQL or python via the ORM. I’d just like to see an example of someone doing some database access first.

I’m hoping this is something Home Assistant supports and I don’t have to write an independent script that will modify the database outside of Home Assistant.

1 Like

Hi @swetzel, i don’t think HA can do this.
I’m running a bash script via cron to remove such values on mysql database.

#!/bin/bash

HOST='mysql_server'
DB='HA_DB'
USER='username'
PW='password'

query="DELETE FROM states where state like '%999%'"
echo "$query" | mysql -h$HOST -D$DB -u$USER -p$PW

query="DELETE FROM states where state like '%unknown%'"
echo "$query" | mysql -h$HOST -D$DB -u$USER -p$PW

Thanks @VDRainer. I made a Python script to do what I wanted, go through my temperature data and delete extreme outliers. I’ll post the code here as a reference in case anyone wants to do something similar. If you want to run this script as is, put it in .homeassistant/scripts/scrub_db.py. Change the scrub_table() calls at the bottom to the entity ids you want to scrub. It should ask before deleting any rows. Be careful though, I barely tested this, no promises it will work and won’t delete all your data.

For now I’m manually running as these errors are pretty rare. It would be easy enough to make it run without interaction and set it to run with cron. But I want to test it more before I do that.

from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine

engine = create_engine('sqlite:///../home-assistant_v2.db', echo=False)
Base = automap_base()
Base.prepare(engine, reflect=True)

State = Base.classes.states
Event = Base.classes.events

def scrub_table(session, entity_id, offset = 10, min_rows = 100, padding = 5):
    res = session.query(State).filter_by(entity_id = entity_id)
    print("Entity {} has {} rows".format(entity_id, res.count()))
    if res.count() < min_rows:
        print("Not enough rows")
        return
    # The idea here is if you have thousands of data points and only one or two are off by a lot
    # An example of temperature data points: (30, 32, 62, 63, 64, ... 65, 66), where 30 and 32 are errors, and we want to delete them
    # We take the 10th lowest point, subtract 5, and call that the cut off.  For my data that works well, but you will want to think about your data.
    # There are better ways to do this with standard deviations or deltas between points, but this was simple and works well for me.
    low_cut_off = float(res.order_by(State.state)[offset].state) - padding
    high_cut_off = float(res.order_by(State.state)[-1 - offset].state) + padding
    print("Low cut off {}, high cut off {}".format(low_cut_off, high_cut_off))
    delete_count = 0
    for row in res.filter(State.state > high_cut_off):
        print("Too high: {}".format(row.state))
        delete_count += 1
    for row in res.filter(State.state < low_cut_off):
        print("Too low: {}".format(row.state))
        delete_count += 1
    if delete_count == 0: return
    ans = input("Do you want to delete these {} rows [y/N]? ".format(delete_count))
    if ans == 'y':
        res.filter(State.state > high_cut_off).delete(synchronize_session=False)
        res.filter(State.state < low_cut_off).delete(synchronize_session=False)
        session.commit()


if __name__ == "__main__":
    session = Session(engine)
    scrub_table(session, 'sensor.bedroom_temperature')
    scrub_table(session, 'sensor.bedroom_humidity')
    scrub_table(session, 'sensor.living_room_temperature')
    scrub_table(session, 'sensor.living_room_humidity')
    scrub_table(session, 'sensor.dht_sensor_humidity')
    scrub_table(session, 'sensor.dht_sensor_temperature')

Example running:

Entity sensor.bedroom_temperature has 2691 rows
Low cut off 56.3, high cut off 70.1
Too low: 39.4
Do you want to delete these 1 rows [y/N]? y
Entity sensor.bedroom_humidity has 2895 rows
Low cut off 24.0, high cut off 42.5
Entity sensor.living_room_temperature has 540 rows
Low cut off 55.8, high cut off 72.3
Entity sensor.living_room_humidity has 1286 rows
Low cut off 23.6, high cut off 39.5
Entity sensor.dht_sensor_humidity has 10297 rows
Low cut off 20.5, high cut off 41.5
Entity sensor.dht_sensor_temperature has 10078 rows
Low cut off 42.7, high cut off 72.3
2 Likes

Just cause it took me a while to get this write i thought i would post so i can find it later when i forget.
my bash script for removing 0.0 temperature/humidity entries from my sensors (which are all called “sensor.snX_temperature” “sensor.snX_humidity” )

#!/bin/bash

HOST=‘mysql_server’
DB=‘HA_DB’
USER=‘username’
PW=‘password’

query=“delete from states where state = ‘0.0’ and entity_id like ‘sensor.sn%’”
echo “$query” | mysql -h$HOST -D$DB -u$USER -p$PW

It feels like a stupid question but I don’t seem to understand the logic of this database handling.
I’m trying the python code from above.

from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
import matplotlib.pyplot as plt


# engine = create_engine('sqlite:///../home-assistant_v2.db', echo=False)
engine = create_engine('sqlite:///home-assistant_v21.db')
print(engine.table_names())

Base = automap_base()
Base.prepare(engine, reflect=True)

State = Base.classes.states
Event = Base.classes.events
session = Session(engine)
res=session.query(State).filter_by(entity_id = 'sensor.robot_power')

This runs fine, but I seem to fail to just get all the values from a sensor (both time and values) out of this query and put them in an array. It feels like I’m missing something obvious.

Any suggestions?

thanks in advance!