SQLite - Database Reconstruction - recorder purge not working

After 2024.7.x I noticed that every morning around 4am my recorder would stop recording. I also noticed my database had started bloating, eventually reaching 13GB. I ran out of disk space and needed to extend the disk, which was not a sustainable solution.

I spent weeks researching and eventually found the issue. Initially I suspected the 2024.7 release had an issue with the recorder, but I believe the problem actually stems from much further back. There was a big schema change around 6 months prior, and I believe that migration failed. Then in 2024.7 an enhancement was made to the schema, and the two issues together destroyed any ability for the recorder or normal delete/vacuum commands to work.

When trying to clean up the database manually (purge and vacuum), I was getting foreign key errors, which made no sense.
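If you are seeing the same thing, a quick sanity check is worth running first (a sketch using standard SQLite pragmas, ideally against a copy of home-assistant_v2.db):

-- Check the file itself for corruption:
PRAGMA integrity_check;

-- List rows whose foreign keys point at missing parent rows:
PRAGMA foreign_key_check;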

I started to suspect I had massive data issues and that I would be rolling back months to an old backup. With pretty much nothing to lose, I started getting quite aggressive with the DB clean-up.

One thing I did do was fire up a fresh, current install of HA, as mine is 3+ years old. Immediately I noticed foreign keys on the events and states tables that were not present in my production DB, plus a bunch of missing indexes.

Another tell-tale sign of a failed schema update in the past is multiple empty State_Temp tables. So I believe that at some point HA attempted to rebuild the tables, ran out of disk space and bombed out. Then, because the tables were not as expected, the 2024.7 update introduced a failure.
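If you want to check for the same symptoms, you can compare your schema against a fresh install and look for leftovers (a sketch; run it on both DBs):

-- Dump the CREATE statements for the recorder tables:
SELECT sql FROM sqlite_master WHERE type = 'table' AND name IN ('states', 'events');

-- Look for leftover temp tables from an aborted rebuild:
SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE '%temp%';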

How I fixed it:
0. Rebooted production HA after allocating more CPU and RAM (from 1 CPU and 2GB RAM to 4 CPU and 12GB RAM). I noticed that it exceeded 6GB of RAM usage.
0.5. You will need plenty of disk space: 150-200% of your current DB size free. You may need to add another data disk and perform a data disk move.

  1. Got a fresh copy of HA.
  2. Grabbed the SQL table structure for all tables and compared it against my current install (use SQLite Web).
  3. Identified that the events and states tables were both missing foreign keys and indexes. Also, a few columns had been removed from the new DB that were still in my production one.
  4. Stopped Recorder.
  5. Disabled foreign keys in the DB.
  6. Dropped the existing indexes on states and events (helps with the speed of the next few steps).
  7. Renamed the tables: states to states_old and events to events_old.
  8. Created the tables again from the CREATE templates given in a brand-new HA SQLite DB.
    8.5. Removed unused columns (events had created, and states had domain and created; all 3 columns removed).
  9. Inserted states_old into states, and the same for events. This took 4 hours. (See the sanity check after this list.)
  10. Recreated the indexes from the new HA SQL, one at a time. Each index took 10-30 mins.
  11. Ran recorder.purge with repack (this took 6 hours).
  12. Recorder ran!!! And it reduced my DB from 14GB to 2GB.
  13. Re-enabled foreign keys.
  14. Rebooted.
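Before re-enabling foreign keys (step 13), it's worth sanity-checking the copy. A sketch, not part of my original steps:

-- Row counts should match between the old and rebuilt tables:
SELECT COUNT(*) FROM states_old;
SELECT COUNT(*) FROM states;
SELECT COUNT(*) FROM events_old;
SELECT COUNT(*) FROM events;

-- And nothing should reference a missing parent row:
PRAGMA foreign_key_check;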

Good luck! This took me 3-4 weeks of research and about 3 days to repair once I had worked it out.


STATES:

PRAGMA foreign_keys=off;
-- Drop the existing indexes on states first; check the names in your DB, e.g.:
DROP INDEX IF EXISTS ix_states_attributes_id;
DROP INDEX IF EXISTS ix_states_context_id_bin;
DROP INDEX IF EXISTS ix_states_last_updated_ts;
DROP INDEX IF EXISTS ix_states_metadata_id_last_updated_ts;
DROP INDEX IF EXISTS ix_states_old_state_id;

ALTER TABLE states RENAME TO states_old;

CREATE TABLE states  (
  state_id INTEGER NOT NULL,
  entity_id CHAR(0),
  state VARCHAR(255),
  attributes CHAR(0),
  event_id SMALLINT,
  last_changed CHAR(0),
  last_changed_ts FLOAT,
  last_reported_ts FLOAT,
  last_updated CHAR(0),
  last_updated_ts FLOAT,
  old_state_id INTEGER,
  attributes_id INTEGER,
  context_id CHAR(0),
  context_user_id CHAR(0),
  context_parent_id CHAR(0),
  origin_idx SMALLINT,
  context_id_bin BLOB,
  context_user_id_bin BLOB,
  context_parent_id_bin BLOB,
  metadata_id INTEGER,
  PRIMARY KEY (state_id),
  FOREIGN KEY(old_state_id) REFERENCES states (state_id),
  FOREIGN KEY(attributes_id) REFERENCES state_attributes (attributes_id),
  FOREIGN KEY(metadata_id) REFERENCES states_meta (metadata_id)
);

-- DROP COLUMN requires SQLite 3.35 or newer:
ALTER TABLE states_old DROP COLUMN domain;
ALTER TABLE states_old DROP COLUMN created;

INSERT INTO states SELECT * FROM states_old;

-- Recreate the indexes (from the new HA schema), one at a time:
CREATE INDEX ix_states_attributes_id
ON states (attributes_id);

CREATE INDEX ix_states_context_id_bin
ON states (context_id_bin);

CREATE INDEX ix_states_last_updated_ts
ON states (last_updated_ts);

CREATE INDEX ix_states_metadata_id_last_updated_ts
ON states (metadata_id, last_updated_ts);

CREATE INDEX ix_states_old_state_id
ON states (old_state_id);


PRAGMA foreign_keys=on;
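Once you've verified the rebuilt table (row counts match, Recorder runs), the old copy can be dropped and the freed pages reclaimed. A sketch; the repack in step 11 achieves the same reclaim:

DROP TABLE states_old;
VACUUM;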



EVENTS:


PRAGMA foreign_keys=off;

-- Drop the existing indexes on events first; check the names in your DB, e.g.:
DROP INDEX IF EXISTS ix_events_context_id_bin;
DROP INDEX IF EXISTS ix_events_data_id;
DROP INDEX IF EXISTS ix_events_event_type_id_time_fired_ts;
DROP INDEX IF EXISTS ix_events_time_fired_ts;

ALTER TABLE events RENAME TO events_old;

CREATE TABLE events  (
  event_id INTEGER NOT NULL,
  event_type CHAR(0),
  event_data CHAR(0),
  origin CHAR(0),
  origin_idx SMALLINT,
  time_fired CHAR(0),
  time_fired_ts FLOAT,
  context_id CHAR(0),
  context_user_id CHAR(0),
  context_parent_id CHAR(0),
  data_id INTEGER,
  context_id_bin BLOB,
  context_user_id_bin BLOB,
  context_parent_id_bin BLOB,
  event_type_id INTEGER,
  PRIMARY KEY (event_id),
  FOREIGN KEY(data_id) REFERENCES event_data (data_id),
  FOREIGN KEY(event_type_id) REFERENCES event_types (event_type_id)
);

ALTER TABLE events_old DROP COLUMN created;

INSERT INTO events SELECT * FROM events_old;

CREATE INDEX ix_events_context_id_bin
ON events (context_id_bin);

CREATE INDEX ix_events_data_id
ON events (data_id);

CREATE INDEX ix_events_event_type_id_time_fired_ts
ON events (event_type_id, time_fired_ts);

CREATE INDEX ix_events_time_fired_ts
ON events (time_fired_ts);


PRAGMA foreign_keys=on;
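And likewise for events, with the same "verify first" caveat as above:

DROP TABLE events_old;
VACUUM;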

Nice work! It’s always good when you can save data which would otherwise be lost.

Personally, I’d have just deleted the DB and let HA start over. But it’s good to have a working example of how it can be done. Admittedly, I don’t rely on the history data in HA very much. I store the data I really care about in other ways. I never liked the idea of putting historical and real-time data in the same database.

I’ve been toying with the idea of cleaning up the ZHA zigbee.db in a similar way. Just looking at the structure makes me suspect there’s a lot there which needs attention. But with only one working example, on a production machine, I’m reluctant to start hacking it.

I think I am in a similar situation. My HA installation is over 5 years old and I recently started to get database corruption issues. My DB is ~1200 MB. I had several corruption issues in recent weeks and actually moved HA to a different machine, as I suspected the SSD was failing.

I did try to perform your steps, rebuilding the states and events tables and the indexes (mine took only a few seconds, but I did it on my desktop); however, I am still getting "database malformed" and "connection could not be established" errors…


  File "/usr/src/homeassistant/homeassistant/components/history/__init__.py", line 121, in get
    await get_instance(hass).async_add_executor_job(
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/history/__init__.py", line 146, in _sorted_significant_states_json
    with session_scope(hass=hass, read_only=True) as session:
  File "/usr/local/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/recorder.py", line 94, in session_scope
    session = get_instance(hass).get_session()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 276, in get_session
    raise RuntimeError("The database connection has not been established")
RuntimeError: The database connection has not been established
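One thing worth adding: if the file itself is corrupt ("database malformed"), a schema rebuild alone can't fix it. The sqlite3 command-line shell has a .recover command that salvages what it can into a new file. A sketch, run with HA stopped; file names are examples:

sqlite3 home-assistant_v2.db ".recover" | sqlite3 home-assistant_v2.recovered.db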

EDIT: Recreating the statistics table in the same way and removing the _old tables solved the problem, at least for now. Thanks for your post, it was super useful!!
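For anyone repeating this, a sketch of how to find the pieces (the table names assume the current recorder schema; verify them against your own fresh install):

-- Grab the CREATE statements for the statistics tables from a fresh DB:
SELECT sql FROM sqlite_master
WHERE type = 'table' AND name IN ('statistics', 'statistics_short_term');

-- After rebuilding, list any leftover *_old tables, then DROP them:
SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE '%_old';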