"Manual" MariaDB SQL Purge with protected entites

Plawa · March 28, 2021, 8:50am

Hello all,

I’m trying to do a couple things

keep state history for a small set of entities for 1 year
purge everything else in ~2 weeks (but I DO want to keep 2 weeks for (almost) everything as I want to be able to troubleshoot automations etc.)
keep my DB size from getting out of control (2 weeks of data is ~1GB, don’t want that to go to ~25…)

I am using MariaDB on a different server, I’ve excluded a number of entities from being recorder altogether:

recorder:
  purge_keep_days: 365
  db_url: !secret db_url
  commit_interval: 20
  exclude:
    domains:
      - updater
      - media_player
      - scene
      - zone
      - weather
      - camera

    entity_globs:
      - binary_sensor.*_tamper
      - binary_sensor.*_battery_low
      - binary_sensor.*_update_available
      - binary_sensor.w10*
      - sensor.w10*
      - sensor.*_battery
      - sensor.*_linkquality
      - sensor.pi_hole*
      - sensor.samsung_*
      - sensor.memory_*
      - sensor.*_raw
      - sensor.nas_*
      - sensor.*_update_state

To define the entities I want to preserve I have created a new table called “protected_entities” that contains only an entity_id column:

DESCRIBE protected_entities;
entity_id	varchar(255)	NO	PRI	    NULL

SELECT * FROM `protected_entities` LIMIT 1;
sensor.external_temperature

Now my very rudimentary approach to this so far has been to use cron to simply delete everything from states that’s older than 14 days and is not included in protected_entities (MariaDB syntax):

DELETE FROM states WHERE entity_id NOT IN (SELECT entity_id FROM protected_entities) AND last_updated < (NOW() - INTERVAL 14 DAY);

For events I would do the same, delete events older than 14 days excluding those related to protected entities:

DELETE FROM events LEFT JOIN states on events.event_id = states.event_id WHERE states.entity_id NOT IN (select entity_id from protected_entities) AND events.time_fired < (NOW() - INTERVAL 14 DAY);

And then optimize the tables.

Now… I understand that I’m not exactly being nice to the DB schema since I don’t care about old_state_id and don’t “clean up” after myself. I’ve tried to translate the source code for purge.py into SQL statements but that’s where my very basic SQL skills very quickly end:

github.com

home-assistant/core/blob/ce3e00d018dd57648c40750712dbad81c572bf42/homeassistant/components/recorder/purge.py

"""Purge old data helper."""
from __future__ import annotations

from datetime import datetime, timedelta
import logging
import time
from typing import TYPE_CHECKING

from sqlalchemy.exc import OperationalError, SQLAlchemyError
from sqlalchemy.orm.session import Session
from sqlalchemy.sql.expression import distinct

import homeassistant.util.dt as dt_util

from .const import MAX_ROWS_TO_PURGE
from .models import Events, RecorderRuns, States
from .repack import repack_database
from .util import session_scope

if TYPE_CHECKING:

This file has been truncated. show original

Question is then - am I setting myself up for a disaster in the long term or is this heavy handed but ultimately not destructive?

Obviously the dream here would be to have this feature built directly into the recorder so that you can keep two “levels” of history…

Thanks!

DavidFW1960 · March 28, 2021, 9:02am

Probably need to use influxdb for the entities you want to keep and let auto-purge keep mariadb under control. You will eventually end up with db corruption necessitating deleting the database with all data lost anyway at some point…

Plawa · March 28, 2021, 12:25pm

“Corruption” in a sense it will stop working (in that case why?) or in the sense of “your data integrity isn’t 100%”… because to be honest I can live with the latter and I don’t exactly see why the former would be happening. But again, I am happy to be proven wrong.

_dev_null · March 28, 2021, 12:47pm

I agree with @Plawa seems like you’re playing with

As a newbie I thought I’d investigate… so running sqlite we have the following indexes

sqlite> select type, name, tbl_name, sql
FROM sqlite_master
WHERE type='index';

index|ix_events_time_fired|events|CREATE INDEX ix_events_time_fired ON events (time_fired)
index|ix_events_context_parent_id|events|CREATE INDEX ix_events_context_parent_id ON events (context_parent_id)
index|ix_events_context_id|events|CREATE INDEX ix_events_context_id ON events (context_id)
index|ix_events_event_type_time_fired|events|CREATE INDEX ix_events_event_type_time_fired ON events (event_type, time_fired)
index|ix_events_context_user_id|events|CREATE INDEX ix_events_context_user_id ON events (context_user_id)
index|ix_recorder_runs_start_end|recorder_runs|CREATE INDEX ix_recorder_runs_start_end ON recorder_runs (start, "end")
index|ix_states_old_state_id|states|CREATE INDEX ix_states_old_state_id ON states (old_state_id)
index|ix_states_event_id|states|CREATE INDEX ix_states_event_id ON states (event_id)
index|ix_states_last_updated|states|CREATE INDEX ix_states_last_updated ON states (last_updated)
index|ix_states_entity_id_last_updated|states|CREATE INDEX ix_states_entity_id_last_updated ON states (entity_id, last_updated)

Presumably these indexes are also present in mariadb ?

My concern is what happens when you’ve set your commit_interval to 20 and perform a ‘manual’ delete in that period !

DavidFW1960 · March 28, 2021, 9:14pm

could stop working.
Sometimes writes to the database can cause a problem when HA restarts.
Sometimes restoring a snapshot will result in a corrupt database.
Muchmore prone to happen with SQL lite than mariadb but I recently did have an issue with mariadb - nothing was being committed so I did have to remove the addon and add it back hence getting a fresh database.
As I said if yu have data you care about keeping for an extended period then use influxdb

mamoel · April 14, 2021, 1:44pm

There are at least two feature requests, that you should vote for:
Recorder Retention Period By Entity - Feature Requests - Home Assistant Community (home-assistant.io)
and
Support for multiple recorder components - Feature Requests - Home Assistant Community (home-assistant.io)

Thomas_Geens · February 13, 2023, 10:46pm

The purge process works in batch mode and results in possibly several hundreds of UPDATE, SELECT and DELETE queries running in the background against your logger database.

To follow up on the purge process in the background you can enable debugging for it on the logger service or use your database’s process monitor.

Debug the purge process using the logger service

configuration.yaml:

# Configure Logger service
logger:
  default: warning # avoids overflooding the system log
  logs:
    homeassistant.components.recorder.purge: debug # enable debug level for the purge process

Configuring the above will result in the purge component’s debug messages appearing in the general system logs.

MySQL logger database monitoring using the phpMyAdmin add-on: *

Open your phpMyAdmin add-on from the sidebar
Navigate to Status > Processes
Monitor activities with UPDATE, DELETE and SELECT statements on the tables e.g.,
– states
– state_attributes
– events

* This method will not work of course when your Logger service still uses the SQLite database engine which ships by default with your HomeAssistant installation.