PostgreSQL to SQLite migration, moving back to SQLite

I’ve been running my Home Assistant database on PostgreSQL for a while due to issues with the Recorder’s SQLite database. However, since the Recorder has seen improvements (like long-term statistics), I decided to migrate back to SQLite. Additionally, my PostgreSQL instance ran on a separate server, which added complexity, and I’d like the option to turn off that server occasionally without impacting my Home Assistant setup.

I couldn’t find a straightforward guide on how to migrate data back to SQLite, so I decided to try my luck and figure it out myself. I’m sharing my process here to help others who might be in a similar situation. This is a rough guide, but it should provide a good starting point!

Steps I Followed:

  1. Clean up data in PostgreSQL:
    • Reduce the amount of data to migrate. Less data means a faster migration.
    • This is the only step that actually deletes anything, so make backups as needed (see the pg_dump sketch after this list).
  2. Export data from PostgreSQL to CSV:
    • Ensure the column order matches the SQLite schema and apply necessary transformations (like datetime and boolean conversions) during export.
  3. Prepare a fresh home-assistant_v2.db database:
    • Start with a clean slate for importing the data.
  4. Import the CSV files into SQLite:
    • Use the SQLite command-line tool to import each CSV file into the corresponding table.
  5. Fix null fields:
    • CSV files don’t natively handle null values, so empty strings need to be replaced with nulls.
  6. Finish up and enjoy:
    • Verify everything is working correctly with your newly migrated SQLite database.
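
If you want a safety net before step 1, a plain pg_dump of the whole database is enough. A minimal sketch, assuming the same connection details used further down (restore with pg_restore if anything goes wrong):

    # Full custom-format dump of the Home Assistant database as a backup
    pg_dump -h postgres.home -U homeassistant -d homeassistant -Fc -f homeassistant_backup.dump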

Detailed Process

My PostgreSQL database was quite large—around 80 GB. I had set my Recorder retention to 365 days, which retained a lot of data. To reduce this, I started by cleaning up the states table, setting the Recorder to purge data older than 10 days. Despite several days of waiting (and invocations of the purge service), I still had hundreds of millions of rows, so I needed another approach.
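
For reference, the purge can also be triggered manually from Developer Tools > Services; roughly like this, with keep_days set to whatever retention you’re aiming for:

    # Service call YAML (Developer Tools > Services)
    service: recorder.purge
    data:
      keep_days: 10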

Eventually, I decided to rename my states table to states_old and create a new states table with the same structure. Here’s why:

  • Figure out what to delete: Depending on your data needs, decide how much you can safely delete.
  • Renaming the table: Since I needed to keep about 12% of my data, it was faster to insert the needed rows into a new table than to delete rows from the old one.
  • Protip, adjusting sequence numbers: If, like me, you get sidetracked and need to keep running on Postgres for a while, make sure the sequence that generates new state_id values is set higher than your current maximum state_id. This ensures Home Assistant keeps functioning without primary-key conflicts. A sketch of both the rename and the sequence adjustment follows this list.
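
A rough sketch of the rename approach and the sequence adjustment, assuming the default Home Assistant table, column, and sequence names (test against a copy first, and pick a cutoff matching how much data you want to keep):

    -- Keep only recent rows by swapping tables instead of deleting
    ALTER TABLE states RENAME TO states_old;
    CREATE TABLE states (LIKE states_old INCLUDING ALL);
    INSERT INTO states
      SELECT * FROM states_old
      WHERE last_updated_ts > extract(epoch from now() - interval '10 days');

    -- If Home Assistant keeps writing to Postgres for a while, move the
    -- sequence past the current maximum state_id to avoid key conflicts
    SELECT setval('states_state_id_seq', (SELECT MAX(state_id) FROM states_old));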

My Script

Below is the script I used. It’s more of a series of notes than a script you can run directly. You can copy-paste these commands to handle the export from PostgreSQL to CSV, making sure to apply the necessary data transformations. Note that the column layout should match your SQLite database.

After exporting, get a fresh home-assistant_v2.db from Home Assistant. I did this by reconfiguring my Recorder to use the built-in database, restarting Home Assistant, and waiting for it to create a new database file. Then, I stopped Home Assistant again to perform the import.
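
In practice, reverting the Recorder to the built-in database just means removing (or commenting out) the db_url option so it falls back to the default home-assistant_v2.db. Roughly like this in configuration.yaml (connection string shown only as a placeholder):

    recorder:
      # db_url: postgresql://homeassistant:homeassistant@postgres.home/homeassistant
      purge_keep_days: 10   # or whatever retention you settled on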

Export Data from PostgreSQL

Adjust credentials and other details as necessary

# Set PostgreSQL credentials and server details
PGHOST="postgres.home"
PGDATABASE="homeassistant"
PGUSER="homeassistant"
export PGPASSWORD="homeassistant" # psql reads the password from the environment

# Export data to CSV files with column ordering, date formatting, and boolean conversion
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT data_id, hash, shared_data FROM event_data) TO 'event_data.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT event_type_id, event_type FROM event_types) TO 'event_types.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT attributes_id, hash, shared_attrs FROM state_attributes) TO 'state_attributes.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT metadata_id, entity_id FROM states_meta) TO 'states_meta.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT id, statistic_id, source, unit_of_measurement, CASE WHEN has_mean THEN 1 ELSE 0 END as has_mean, CASE WHEN has_sum THEN 1 ELSE 0 END as has_sum, name FROM statistics_meta) TO 'statistics_meta.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT run_id, (start AT TIME ZONE 'UTC')::timestamp as start, (\"end\" AT TIME ZONE 'UTC')::timestamp as \"end\", CASE WHEN closed_incorrect THEN 1 ELSE 0 END as closed_incorrect, (created AT TIME ZONE 'UTC')::timestamp as created FROM recorder_runs) TO 'recorder_runs.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT migration_id, version FROM migration_changes) TO 'migration_changes.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT change_id, schema_version, (changed AT TIME ZONE 'UTC')::timestamp as changed FROM schema_changes) TO 'schema_changes.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT run_id, (start AT TIME ZONE 'UTC')::timestamp as start FROM statistics_runs) TO 'statistics_runs.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT id, (created AT TIME ZONE 'UTC')::timestamp as created, created_ts, metadata_id, (start AT TIME ZONE 'UTC')::timestamp as start, start_ts, mean, min, max, (last_reset AT TIME ZONE 'UTC')::timestamp as last_reset, last_reset_ts, state, sum FROM statistics) TO 'statistics.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT id, (created AT TIME ZONE 'UTC')::timestamp as created, created_ts, metadata_id, (start AT TIME ZONE 'UTC')::timestamp as start, start_ts, mean, min, max, (last_reset AT TIME ZONE 'UTC')::timestamp as last_reset, last_reset_ts, state, sum FROM statistics_short_term) TO 'statistics_short_term.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT event_id, event_type, event_data, origin, origin_idx, (time_fired AT TIME ZONE 'UTC')::timestamp as time_fired, time_fired_ts, context_id, context_user_id, context_parent_id, data_id, context_id_bin, context_user_id_bin, context_parent_id_bin, event_type_id FROM events) TO 'events.csv' CSV HEADER"
psql -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT state_id, entity_id, state, attributes, event_id, (last_changed AT TIME ZONE 'UTC')::timestamp as last_changed, last_changed_ts, last_reported_ts, (last_updated AT TIME ZONE 'UTC')::timestamp as last_updated, last_updated_ts, old_state_id, attributes_id, context_id, context_user_id, context_parent_id, origin_idx, context_id_bin, context_user_id_bin, context_parent_id_bin, metadata_id FROM states) TO 'states.csv' CSV HEADER"

Prepare and Import into SQLite

  1. Acquire the home-assistant_v2.db: Get a fresh database file by reconfiguring Home Assistant to use SQLite, restarting it, and then stopping it once the new database is created.

  2. Open the SQLite database using the SQLite CLI tool:

    sqlite3 home-assistant_v2.db
    
  3. Clean out existing data that Home Assistant may have saved before shutting down:

    DELETE FROM main.event_data;
    DELETE FROM main.event_types;
    DELETE FROM main.events;
    DELETE FROM main.statistics_meta;
    DELETE FROM main.statistics_runs;
    DELETE FROM main.statistics;
    DELETE FROM main.states;
    DELETE FROM main.states_meta;
    DELETE FROM main.statistics_short_term;
    DELETE FROM main.state_attributes;
    DELETE FROM main.recorder_runs;
    DELETE FROM main.migration_changes;
    
  4. Import data from the CSV files exported from PostgreSQL:

    .mode csv
    .import event_data.csv event_data
    .import event_types.csv event_types
    .import state_attributes.csv state_attributes
    .import states_meta.csv states_meta
    .import statistics_meta.csv statistics_meta
    .import recorder_runs.csv recorder_runs
    .import migration_changes.csv migration_changes
    .import statistics_runs.csv statistics_runs
    .import statistics_short_term.csv statistics_short_term
    .import statistics.csv statistics
    .import events.csv events
    .import states.csv states
    
  5. Replace empty strings with NULL for all nullable fields:

    -- Replace empty strings with NULL in 'event_data'
    UPDATE event_data SET
     hash = NULLIF(hash, ''),
     shared_data = NULLIF(shared_data, '');

    -- Replace empty strings with NULL in 'event_types'
    UPDATE event_types SET
     event_type = NULLIF(event_type, '');

    -- Replace empty strings with NULL in 'state_attributes'
    UPDATE state_attributes SET
     hash = NULLIF(hash, ''),
     shared_attrs = NULLIF(shared_attrs, '');

    -- Replace empty strings with NULL in 'states_meta'
    UPDATE states_meta SET
     entity_id = NULLIF(entity_id, '');

    -- Replace empty strings with NULL in 'statistics_meta'
    UPDATE statistics_meta SET
     statistic_id = NULLIF(statistic_id, ''),
     source = NULLIF(source, ''),
     unit_of_measurement = NULLIF(unit_of_measurement, ''),
     name = NULLIF(name, '');

    -- Replace empty strings with NULL in 'recorder_runs' ("end" is quoted because it is a keyword)
    UPDATE recorder_runs SET
     start = NULLIF(start, ''),
     "end" = NULLIF("end", '');

    -- 'migration_changes' has no nullable fields

    -- Replace empty strings with NULL in 'schema_changes'
    UPDATE schema_changes SET
     schema_version = NULLIF(schema_version, '');

    -- 'statistics_runs' has no nullable fields

    -- Replace empty strings with NULL in 'events'
    UPDATE events SET
     event_type = NULLIF(event_type, ''),
     event_data = NULLIF(event_data, ''),
     origin = NULLIF(origin, ''),
     context_id = NULLIF(context_id, ''),
     context_user_id = NULLIF(context_user_id, ''),
     context_parent_id = NULLIF(context_parent_id, ''),
     context_id_bin = NULLIF(context_id_bin, ''),
     context_user_id_bin = NULLIF(context_user_id_bin, ''),
     context_parent_id_bin = NULLIF(context_parent_id_bin, '');

    -- Replace empty strings with NULL in 'statistics'
    UPDATE statistics SET
     created = NULLIF(created, ''),
     start = NULLIF(start, ''),
     mean = NULLIF(mean, ''),
     min = NULLIF(min, ''),
     max = NULLIF(max, ''),
     last_reset = NULLIF(last_reset, ''),
     state = NULLIF(state, ''),
     sum = NULLIF(sum, '');

    -- Replace empty strings with NULL in 'statistics_short_term'
    UPDATE statistics_short_term SET
     created = NULLIF(created, ''),
     start = NULLIF(start, ''),
     mean = NULLIF(mean, ''),
     min = NULLIF(min, ''),
     max = NULLIF(max, ''),
     last_reset = NULLIF(last_reset, ''),
     state = NULLIF(state, ''),
     sum = NULLIF(sum, '');

    -- Replace empty strings with NULL in 'states'
    UPDATE states SET
     entity_id = NULLIF(entity_id, ''),
     state = NULLIF(state, ''),
     attributes = NULLIF(attributes, ''),
     last_changed = NULLIF(last_changed, ''),
     last_updated = NULLIF(last_updated, ''),
     context_id = NULLIF(context_id, ''),
     context_user_id = NULLIF(context_user_id, ''),
     context_parent_id = NULLIF(context_parent_id, ''),
     context_id_bin = NULLIF(context_id_bin, ''),
     context_user_id_bin = NULLIF(context_user_id_bin, ''),
     context_parent_id_bin = NULLIF(context_parent_id_bin, '');
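
Before pointing Home Assistant back at the new file, a quick row-count comparison against PostgreSQL is a cheap sanity check that the import is complete. Something like this in the sqlite3 shell, compared with the same counts from psql:

    -- Compare these counts with the same queries run against PostgreSQL
    SELECT 'states', COUNT(*) FROM states
    UNION ALL SELECT 'events', COUNT(*) FROM events
    UNION ALL SELECT 'statistics', COUNT(*) FROM statistics
    UNION ALL SELECT 'statistics_short_term', COUNT(*) FROM statistics_short_term;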

Conclusion

By following these steps, you should be able to successfully migrate your Home Assistant database from PostgreSQL back to SQLite. The process involves careful handling of data types and null values, but it’s definitely doable. Good luck!


Hi Michael Bisbjerg,

This looks great. I added it to the Cookbook for you so we can easily reference it. I look in the Cookbook first for links when helping others.

The MariaDB post is also listed in the Cookbook, which is the Community Guide Compilation from the other thread you referenced.

This migration worked great for me. Thank you for sharing. Just a few tips for others that run through it.

  1. I had to remove the headers on each CSV to do the import (see the snippet after this list).
  2. If you are getting read-only errors on the SQLite DB, make sure you are opening it with the correct user permissions on the OS side. I use a container, so I had to run the import and commands with elevated privileges.
  3. Run one command at a time so you can make sure it’s working as expected.
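
For anyone else who needs to strip the headers, a rough one-liner over the exported files (run it in the folder holding the CSVs; it overwrites them in place):

    # Drop the first (header) line of every exported CSV
    for f in *.csv; do
      tail -n +2 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done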

Migration successful. No more Postgres DB. Thanks!


Thanks for this guide! I put together a shell script that combines all the above snippets, plus the following changes:

  • Included schema_changes in the import
  • Added a few more nullable fields, specifically ones ending in _ts
  • Adjusted the psql dump commands so they work for containerized Postgres

#!/bin/sh

# Create a temporary folder for the CSV files
mkdir -p ha_tmp

# Set PostgreSQL credentials and server details
PGHOST="postgres.home"
PGDATABASE="homeassistant"
PGUSER="homeassistant"
export PGPASSWORD="homeassistant"
PSQL_CMD="psql" # See below for containerized examples
# PSQL_CMD="docker exec -i postgres-container psql"
# PSQL_CMD="podman exec -i postgres-container psql"

# Export data to CSVs
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT data_id, hash, shared_data FROM event_data) TO STDOUT CSV" > ha_tmp/event_data.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT event_type_id, event_type FROM event_types) TO STDOUT CSV" > ha_tmp/event_types.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT attributes_id, hash, shared_attrs FROM state_attributes) TO STDOUT CSV" > ha_tmp/state_attributes.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT metadata_id, entity_id FROM states_meta) TO STDOUT CSV" > ha_tmp/states_meta.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT id, statistic_id, source, unit_of_measurement, CASE WHEN has_mean THEN 1 ELSE 0 END as has_mean, CASE WHEN has_sum THEN 1 ELSE 0 END as has_sum, name FROM statistics_meta) TO STDOUT CSV" > ha_tmp/statistics_meta.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT run_id, (start AT TIME ZONE 'UTC')::timestamp as start, (\"end\" AT TIME ZONE 'UTC')::timestamp as \"end\", CASE WHEN closed_incorrect THEN 1 ELSE 0 END as closed_incorrect, (created AT TIME ZONE 'UTC')::timestamp as created FROM recorder_runs) TO STDOUT CSV" > ha_tmp/recorder_runs.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT migration_id, version FROM migration_changes) TO STDOUT CSV" > ha_tmp/migration_changes.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT change_id, schema_version, (changed AT TIME ZONE 'UTC')::timestamp as changed FROM schema_changes) TO STDOUT CSV" > ha_tmp/schema_changes.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT run_id, (start AT TIME ZONE 'UTC')::timestamp as start FROM statistics_runs) TO STDOUT CSV" > ha_tmp/statistics_runs.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT id, (created AT TIME ZONE 'UTC')::timestamp as created, created_ts, metadata_id, (start AT TIME ZONE 'UTC')::timestamp as start, start_ts, mean, min, max, (last_reset AT TIME ZONE 'UTC')::timestamp as last_reset, last_reset_ts, state, sum FROM statistics) TO STDOUT CSV" > ha_tmp/statistics.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT id, (created AT TIME ZONE 'UTC')::timestamp as created, created_ts, metadata_id, (start AT TIME ZONE 'UTC')::timestamp as start, start_ts, mean, min, max, (last_reset AT TIME ZONE 'UTC')::timestamp as last_reset, last_reset_ts, state, sum FROM statistics_short_term) TO STDOUT CSV" > ha_tmp/statistics_short_term.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT event_id, event_type, event_data, origin, origin_idx, (time_fired AT TIME ZONE 'UTC')::timestamp as time_fired, time_fired_ts, context_id, context_user_id, context_parent_id, data_id, context_id_bin, context_user_id_bin, context_parent_id_bin, event_type_id FROM events) TO STDOUT CSV" > ha_tmp/events.csv
$PSQL_CMD -h $PGHOST -U $PGUSER -d $PGDATABASE -c "\COPY (SELECT state_id, entity_id, state, attributes, event_id, (last_changed AT TIME ZONE 'UTC')::timestamp as last_changed, last_changed_ts, last_reported_ts, (last_updated AT TIME ZONE 'UTC')::timestamp as last_updated, last_updated_ts, old_state_id, attributes_id, context_id, context_user_id, context_parent_id, origin_idx, context_id_bin, context_user_id_bin, context_parent_id_bin, metadata_id FROM states) TO STDOUT CSV" > ha_tmp/states.csv

# Back up our template sqlite database in case of problems
cp home-assistant_v2.db home-assistant_v2.db.orig

sqlite3 home-assistant_v2.db <<'END_SQL'
# Clear out any existing data
DELETE FROM main.event_data;
DELETE FROM main.event_types;
DELETE FROM main.events;
DELETE FROM main.statistics_meta;
DELETE FROM main.statistics_runs;
DELETE FROM main.statistics;
DELETE FROM main.states;
DELETE FROM main.states_meta;
DELETE FROM main.statistics_short_term;
DELETE FROM main.state_attributes;
DELETE FROM main.recorder_runs;
DELETE FROM main.migration_changes;
DELETE FROM main.schema_changes;

# Import CSVs
.mode csv
.import ha_tmp/event_data.csv event_data
.import ha_tmp/event_types.csv event_types
.import ha_tmp/state_attributes.csv state_attributes
.import ha_tmp/states_meta.csv states_meta
.import ha_tmp/statistics_meta.csv statistics_meta
.import ha_tmp/recorder_runs.csv recorder_runs
.import ha_tmp/migration_changes.csv migration_changes
.import ha_tmp/schema_changes.csv schema_changes
.import ha_tmp/statistics_runs.csv statistics_runs
.import ha_tmp/statistics_short_term.csv statistics_short_term
.import ha_tmp/statistics.csv statistics
.import ha_tmp/events.csv events
.import ha_tmp/states.csv states

# Replace empty strings with NULL in 'event_data'
UPDATE event_data SET
 hash = NULLIF(hash, ''),
 shared_data = NULLIF(shared_data, '');

# Replace empty strings with NULL in 'event_types'
UPDATE event_types SET
 event_type = NULLIF(event_type, '');

# Replace empty strings with NULL in 'state_attributes'
UPDATE state_attributes SET
 hash = NULLIF(hash, ''),
 shared_attrs = NULLIF(shared_attrs, '');

# Replace empty strings with NULL in 'states_meta'
UPDATE states_meta SET
 entity_id = NULLIF(entity_id, '');

# Replace empty strings with NULL in 'statistics_meta'
UPDATE statistics_meta SET
 statistic_id = NULLIF(statistic_id, ''),
 source = NULLIF(source, ''),
 unit_of_measurement = NULLIF(unit_of_measurement, ''),
 name = NULLIF(name, '');

# Replace empty strings with NULL in 'recorder_runs'
UPDATE recorder_runs SET
 start = NULLIF(start, ''),
 "end" = NULLIF("end", '');

# Replace empty strings with NULL in 'migration_changes'
# No nullable fields

# Replace empty strings with NULL in 'schema_changes'
UPDATE schema_changes SET
 schema_version = NULLIF(schema_version, '');

# Replace empty strings with NULL in 'statistics_runs'
# No nullable fields

# Replace empty strings with NULL in 'events'
UPDATE events SET
 event_type = NULLIF(event_type, ''),
 event_data = NULLIF(event_data, ''),
 origin = NULLIF(origin, ''),
 context_id = NULLIF(context_id, ''),
 context_user_id = NULLIF(context_user_id, ''),
 context_parent_id = NULLIF(context_parent_id, ''),
 context_id_bin = NULLIF(context_id_bin, ''),
 context_user_id_bin = NULLIF(context_user_id_bin, ''),
 context_parent_id_bin = NULLIF(context_parent_id_bin, '');

# Replace empty strings with NULL in 'statistics'
UPDATE statistics SET
 created = NULLIF(created, ''),
 start = NULLIF(start, ''),
 mean = NULLIF(mean, ''),
 min = NULLIF(min, ''),
 max = NULLIF(max, ''),
 last_reset = NULLIF(last_reset, ''),
 last_reset_ts = NULLIF(last_reset_ts, ''),
 state = NULLIF(state, ''),
 sum = NULLIF(sum, '');

# Replace empty strings with NULL in 'statistics_short_term'
UPDATE statistics_short_term SET
 created = NULLIF(created, ''),
 start = NULLIF(start, ''),
 mean = NULLIF(mean, ''),
 min = NULLIF(min, ''),
 max = NULLIF(max, ''),
 last_reset = NULLIF(last_reset, ''),
 last_reset_ts = NULLIF(last_reset_ts, ''),
 state = NULLIF(state, ''),
 sum = NULLIF(sum, '');

# Replace empty strings with NULL in 'states'
UPDATE states SET 
 entity_id = NULLIF(entity_id, ''),
 state = NULLIF(state, ''),
 attributes = NULLIF(attributes, ''),
 last_changed = NULLIF(last_changed, ''),
 last_changed_ts = NULLIF(last_changed_ts, ''),
 last_reported_ts = NULLIF(last_reported_ts, ''),
 last_updated = NULLIF(last_updated, ''),
 context_id = NULLIF(context_id, ''),
 context_user_id = NULLIF(context_user_id, ''),
 context_parent_id = NULLIF(context_parent_id, ''),
 context_id_bin = NULLIF(context_id_bin, ''),
 context_user_id_bin = NULLIF(context_user_id_bin, ''),
 context_parent_id_bin = NULLIF(context_parent_id_bin, '');
END_SQL

Great!.. Glad it helped.

I didn’t remove the headers on each file, as SQLite ignored them for me. You can see this if you run the same import twice: the first run only complains about the header line, while the second run complains about the headers and that all rows are duplicates.

I deliberately opted not to include the schema_changes table. I was unsure whether the schema migrations differ between Postgres and SQLite, so I let HA create the SQLite database and then removed all data except the schema migrations.

Then when importing, leave the schema as is.

This ensures that we don’t suddenly see HA make a migration on data it shouldn’t.

I looked through the code base and as far as I can tell only the last schema_changes entry is ever used, so it should be safe. I guess I’ll find out the next time the schema gets updated :stuck_out_tongue:
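
If you want to double-check that on your own database, the newest entry is easy to inspect; the same query works in both sqlite3 and psql:

    -- The most recent schema_changes row is the one that matters
    SELECT schema_version, changed FROM schema_changes ORDER BY change_id DESC LIMIT 1;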