"Manual" MariaDB SQL Purge with protected entites

Tags: #<Tag:0x00007fc40f1c1288>

Hello all,

I’m trying to do a couple things

  • keep state history for a small set of entities for 1 year
  • purge everything else in ~2 weeks (but I DO want to keep 2 weeks for (almost) everything as I want to be able to troubleshoot automations etc.)
  • keep my DB size from getting out of control (2 weeks of data is ~1GB, don’t want that to go to ~25…)

I am using MariaDB on a different server, I’ve excluded a number of entities from being recorder altogether:

recorder:
  purge_keep_days: 365
  db_url: !secret db_url
  commit_interval: 20
  exclude:
    domains:
      - updater
      - media_player
      - scene
      - zone
      - weather
      - camera

    entity_globs:
      - binary_sensor.*_tamper
      - binary_sensor.*_battery_low
      - binary_sensor.*_update_available
      - binary_sensor.w10*
      - sensor.w10*
      - sensor.*_battery
      - sensor.*_linkquality
      - sensor.pi_hole*
      - sensor.samsung_*
      - sensor.memory_*
      - sensor.*_raw
      - sensor.nas_*
      - sensor.*_update_state

To define the entities I want to preserve I have created a new table called “protected_entities” that contains only an entity_id column:

DESCRIBE protected_entities;
entity_id	varchar(255)	NO	PRI	    NULL

SELECT * FROM `protected_entities` LIMIT 1;
sensor.external_temperature	

Now my very rudimentary approach to this so far has been to use cron to simply delete everything from states that’s older than 14 days and is not included in protected_entities (MariaDB syntax):

DELETE FROM states WHERE entity_id NOT IN (SELECT entity_id FROM protected_entities) AND last_updated < (NOW() - INTERVAL 14 DAY);

For events I would do the same, delete events older than 14 days excluding those related to protected entities:

DELETE FROM events LEFT JOIN states on events.event_id = states.event_id WHERE states.entity_id NOT IN (select entity_id from protected_entities) AND events.time_fired < (NOW() - INTERVAL 14 DAY);

And then optimize the tables.

Now… I understand that I’m not exactly being nice to the DB schema since I don’t care about old_state_id and don’t “clean up” after myself. I’ve tried to translate the source code for purge.py into SQL statements but that’s where my very basic SQL skills very quickly end:

Question is then - am I setting myself up for a disaster in the long term or is this heavy handed but ultimately not destructive?

Obviously the dream here would be to have this feature built directly into the recorder so that you can keep two “levels” of history…

Thanks!

Probably need to use influxdb for the entities you want to keep and let auto-purge keep mariadb under control. You will eventually end up with db corruption necessitating deleting the database with all data lost anyway at some point…

“Corruption” in a sense it will stop working (in that case why?) or in the sense of “your data integrity isn’t 100%”… because to be honest I can live with the latter and I don’t exactly see why the former would be happening. But again, I am happy to be proven wrong.

I agree with @Plawa seems like you’re playing with :fire:

As a newbie I thought I’d investigate… so running sqlite we have the following indexes

sqlite> select type, name, tbl_name, sql
FROM sqlite_master
WHERE type='index';

index|ix_events_time_fired|events|CREATE INDEX ix_events_time_fired ON events (time_fired)
index|ix_events_context_parent_id|events|CREATE INDEX ix_events_context_parent_id ON events (context_parent_id)
index|ix_events_context_id|events|CREATE INDEX ix_events_context_id ON events (context_id)
index|ix_events_event_type_time_fired|events|CREATE INDEX ix_events_event_type_time_fired ON events (event_type, time_fired)
index|ix_events_context_user_id|events|CREATE INDEX ix_events_context_user_id ON events (context_user_id)
index|ix_recorder_runs_start_end|recorder_runs|CREATE INDEX ix_recorder_runs_start_end ON recorder_runs (start, "end")
index|ix_states_old_state_id|states|CREATE INDEX ix_states_old_state_id ON states (old_state_id)
index|ix_states_event_id|states|CREATE INDEX ix_states_event_id ON states (event_id)
index|ix_states_last_updated|states|CREATE INDEX ix_states_last_updated ON states (last_updated)
index|ix_states_entity_id_last_updated|states|CREATE INDEX ix_states_entity_id_last_updated ON states (entity_id, last_updated)

Presumably these indexes are also present in mariadb ?

My concern is what happens when you’ve set your commit_interval to 20 and perform a ‘manual’ delete in that period ! :bomb:

could stop working.
Sometimes writes to the database can cause a problem when HA restarts.
Sometimes restoring a snapshot will result in a corrupt database.
Muchmore prone to happen with SQL lite than mariadb but I recently did have an issue with mariadb - nothing was being committed so I did have to remove the addon and add it back hence getting a fresh database.
As I said if yu have data you care about keeping for an extended period then use influxdb

There are at least two feature requests, that you should vote for:
Recorder Retention Period By Entity - Feature Requests - Home Assistant Community (home-assistant.io)
and
Support for multiple recorder components - Feature Requests - Home Assistant Community (home-assistant.io)

1 Like