Migrate to PostgreSQL

I’ve followed the instructions to migrate to SQLAlchemy. SQLite is fine, but I’m looking to move to PostgreSQL. Is there a mechanism to migrate data between SQLAlchemy backends?

Perhaps this helps: Converting from other Databases to PostgreSQL

There are to tables holding state changes, events and states.
The sensor values are in states.state contains sensor value and events.time_fired is the timestamp but what are actually the other timestamps in states-table?

                                       Table "public.events"
   Column   |           Type           |                         Modifiers
------------+--------------------------+-----------------------------------------------------------
 event_id   | integer                  | not null default nextval('events_event_id_seq'::regclass)
 event_type | character varying(32)    |
 event_data | text                     |
 origin     | character varying(32)    |
 time_fired | timestamp with time zone |
 created    | timestamp with time zone |
Indexes:
    "events_pkey" PRIMARY KEY, btree (event_id)
    "ix_events_event_type" btree (event_type)
Referenced by:
    TABLE "states" CONSTRAINT "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id)

                                        Table "public.states"
    Column    |           Type           |                         Modifiers
--------------+--------------------------+-----------------------------------------------------------
 state_id     | integer                  | not null default nextval('states_state_id_seq'::regclass)
 domain       | character varying(64)    |
 entity_id    | character varying(255)   |
 state        | character varying(255)   |
 attributes   | text                     |
 event_id     | integer                  |
 last_changed | timestamp with time zone |
 last_updated | timestamp with time zone |
 created      | timestamp with time zone |
Indexes:
    "states_pkey" PRIMARY KEY, btree (state_id)
    "states__significant_changes" btree (domain, last_updated, entity_id)
    "states__state_changes" btree (last_changed, last_updated, entity_id)
Foreign-key constraints:
    "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id)

states has timestamp columns last_changed, last_updated, created. I guess created is only technical value?
Created-column also show values that are -3 hours compared to others, so maybe there’s something wrong with it’s time zone code?

In history-component there’s code that uses both last_updated and last_changed.

    """Return states changes during UTC period start_time - end_time."""
    states = recorder.get_model('States')
    query = recorder.query('States').filter(
        (states.last_changed == states.last_updated) &
        (states.last_changed > start_time))

    if end_time is not None:
        query = query.filter(states.last_updated < end_time)

So actually the query is:

            states.last_changed == states.last_updated and
            states.last_changed > start_time and
            states.last_updated < end_time

Because last_changed = last_updated, the query can also written in a more easy way?

            states.last_changed == states.last_updated and
            states.last_changed > start_time and
            states.last_changed < end_time

Now it’s a little more intuitive?

I tried creation of partitions (daily or monthly) in Postgres but there wasn’t any gains, actually the opposite. I tested with 4000 rows/day, data for year = 1,5M rows. If you are purging old records, then it would make more sense to just drop old partitions.

Well, although this is a really, really old thread, maybe the following may be interesting for others with this problem. I just recently also migrated from the default sqlite database to postgresql. My procedure was as follows:

  1. create a postgresql database to be used by home assistant and make sure that the homeassistant user can access it and create tables in it
  2. write a config for pgloader:
load database
  from sqlite:///full/path/to/home-assistant_v2.db
  into postgresql:///home
with data only, drop indexes, reset sequences, truncate;
  1. reconfigure the recorder integration so it would be using postgresql. As I was already into this, I excluded some entities and events I’m not interested in.
  2. stop home assistant
  3. start home assistant, let it run for 20 seconds or so in order to let it create the necessary database tables and stop it again.
  4. run ‘pgloader’ with the configuration mentioned above (write it to a file, e.g. migration.config), then run pgloader migration.config This config deletes all entries HA might have written in the 20 seconds it was running before it copies the contents of the old sqlite database into the new, shiny postgresql database.
  5. restart home assistant and be happy

pgloader is a great tool for such conversions since it is fast and highly configurable. Nonetheless I advise against using it for also creating the database tables since it wouldn’t choose the same column types in postgresql home assistent does: the attributes column in postgresql will be of type text whereas in sqlite it was varchar.

6 Likes

I had to add

set timezone to ‘UTC’

to the pgloader config to get correct times.

Thank you @Jennes ! This was very helpful!

Don’t know is valid topic for question. I’ve already on Postgres. Want to know about use Postgres partitions for history tables. I want store all history (no purges). I disable auto purge in HA. But some SQL requests still start to slow down after a year of history. So question is simple:

  • Is it difficult to migrate to the partition model for history? (I experienced developer)
  • How migrations will behave after home-assistant updates?
  • Whether anyone else had a similar experience, I did not find (fast) any mentions this topic (maybe I badly searched)?
1 Like

I’m not using partitions, so I am no help in this vein. That said, I have been running postgresql as my core HA history and as a analytic db for going on two year now with very good success. I run HA and postgresql in docker containers on a 16 gb intel i7 device, a number of other docker containers running as well, I appear to be hardly taxing the box. DB size is 600 GB. I am running 2021.6.6 HA, so cannot speak to some of latest db changes.

I use simple triggers to make a copy of each new state and event record that HA adds, I let HA prune it’s dbs at 65 days and currently have over two years of data in my analytic dbs. In the post linked below is the setup and stored procedures I use to do this:

https://community.home-assistant.io/t/home-assistant-db-vs-mariadb/224295/4

There a several folks using the timescale layer on top of postgresql to give better time series abilities, I have yet to install this. if you search for either or both of ‘postgresql’ or ‘timescale’ you will get some good info.

Good hunting, I have very pleased with my use of postgresql with Home Assistant. I use a juypter container to do mosts of my analysis on postgresql.

1 Like

Thanks for answer! Really appreciate it. Will continue my research.

How does resetting the sequences work for you? I get heaps of unique constraint issues afterward.
Wouldn’t it make more sense to keep them?

I found this one through google: Migrate Home Assistant from SQLite to PostgreSQL | Sigfried Seldeslachts

Will go through it and report on my success … one of those days.

An old question, I know…

I have only been running home assistant for a few weeks so far, so my home assistant database have not grown much yet, but I am responsible for a postgreql state of health database at work that has been running for a few years and now have some gigs of data. There I installed the timescaledb (https://www.timescale.com/) extension and using the data compression it offers speeded things up quite a bit.

I have now gone through the blog post at Migrate Home Assistant from SQLite to PostgreSQL | Sigfried Seldeslachts and it … seems to work. I have the following comments to that guide:

  • Read through the guide a couple of times before actually performing the steps - at least if you want to minimize the hass downtime and data loss. For one thing, the pgloader took quite long time to install on archlinux.
  • The pgloader didn’t like the DB connector strings to be quoted, I had to remove the single quotes there.
  • The recorder integration did not want to start up after migrating the database (oups). Some problems with the primary key sequences it seems like. That’s a real bummer. I’m doing research now to see if I can find a workaround.

I’m using plain postgresql, but I should probably upgrade to timescaledb since it’s more optimal for this kind of data …

Ok, here is a workaround:

sudo -u postgres pg_dump  homeassistant_db | grep SELECT | grep _seq | perl -pe 's/, \d+, true/, 12345678, true/' | sudo -u postgres psql homeassistant_db

I’ve used Siegried’s guide to try to migrate from SQLite to PostgreSQL. I’m encountering an error that I have not seen mentioned anywhere else.

My SQLite database contains a “domain” field in the “states” table. This field does not get created by HA when I start HA with the recorder connected to PostgreSQL. And since the field is not created by HA the migration of the SQLite data fails.

I’m starting to wonder if this “domain” field has been abandoned in a recent update (haven’t been able to find any info on it) and if it should perhaps be dropped just like the events.created field that Siegfried already mentions in his post. I say recent since people seem to have used the script in March without much trouble, but the age of the install will of course matter as well. My install was created early-mid 2021, can’t remember exactly.

If anyone has a suggestion on a fix, or a path that can be explored I’d appreciate it.

I’m pretty sure it was.
I guess that the field was not dropped (I compared a “grand-fathered” db vs. a recent one, and the field is there in the old but not in the new) because it was not possible in sqlite up until recently.

Not really sure how to proceed.
To avoid headaches, I’d personally just add the field to PostgresQL as a quick win :wink:

I’ll need to take a deeper dive into this if I’m going to get it to work. Without the “domains” (and “created” btw) fields the migration runs, but it has never managed to run completely through correctly. It stopped with a “Killed” message last time, restarting the migration seems to get it to complete but I have no way of knowing if the migration has actually completed correctly.

I’d did test HA towards the completed migration and got the following errors:
Logger: homeassistant.components.recorder.auto_repairs.schema
Source: components/recorder/auto_repairs/schema.py:161
Integration: Recorder (documentation, issues)
First occurred: 10:36:35 (8 occurrences)
Last logged: 10:36:35

Column min in database table statistics does not support double precision (stored=None != expected=1.000000000000001)
Column max in database table statistics does not support double precision (stored=None != expected=1.000000000000001)
Column last_reset_ts in database table statistics does not support double precision (stored=0.0 != expected=1.000000000000001)
Column state in database table statistics does not support double precision (stored=4353.355 != expected=1.000000000000001)
Column sum in database table statistics does not support double precision (stored=0.0 != expected=1.000000000000001)

I currently don’t know if those errors are a result of the process being killed or if it’s a migration error. Now that I think about it those columns should be created by HA itself, not the migration.

This migration has turned out to be far more complicated than I expected. I’m not a DBA, but I’ve worked with databases, so I thought this should be manageable given the instructions available. I’ll try to post an update if I get it to work.

@FlameBoy Did you ever manage to complete the migration?

I followed the guides, but encountered a bunch of type errors:

2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics_meta"."id".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."statistic_id" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."statistics_meta"."statistic_id".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."source" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."statistics_meta"."source".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."unit_of_measurement" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."statistics_meta"."unit_of_measurement".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."name" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."statistics_meta"."name".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."recorder_runs"."run_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."recorder_runs"."run_id".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."schema_changes"."change_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."schema_changes"."change_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."schema_changes"."schema_version" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."schema_changes"."schema_version".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics"."id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics"."id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics"."metadata_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics"."metadata_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics_runs"."run_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics_runs"."run_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics_short_term"."id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics_short_term"."id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics_short_term"."metadata_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics_short_term"."metadata_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."state_attributes"."attributes_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."state_attributes"."attributes_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."event_data"."data_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."event_data"."data_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."event_types"."event_type_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."event_types"."event_type_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."event_types"."event_type" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."event_types"."event_type".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."states_meta"."metadata_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states_meta"."metadata_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."states_meta"."entity_id" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."states_meta"."entity_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."events"."event_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."events"."event_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."events"."event_type" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."event_type".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."events"."event_data" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."event_data".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."events"."origin" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."origin".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."context_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."context_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."context_user_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."context_user_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."context_parent_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."context_parent_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."data_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."events"."data_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."origin_idx" is casted to type "bigint" which is not the same as "smallint", the type of current target database column "public"."events"."origin_idx".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."event_type_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."events"."event_type_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."state_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states"."state_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."entity_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."entity_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."state" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."states"."state".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."attributes" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."attributes".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."event_id" is casted to type "bigint" which is not the same as "smallint", the type of current target database column "public"."states"."event_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."old_state_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states"."old_state_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."attributes_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states"."attributes_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."origin_idx" is casted to type "bigint" which is not the same as "smallint", the type of current target database column "public"."states"."origin_idx".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."context_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."context_id".
2023-10-19T13:54:34.248002Z WARNING Source column "public"."states"."context_user_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."context_user_id".
2023-10-19T13:54:34.248002Z WARNING Source column "public"."states"."context_parent_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."context_parent_id".
2023-10-19T13:54:34.248002Z WARNING Source column "public"."states"."metadata_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states"."metadata_id".

I allowed HA to create the scheme in Postgres so that should be correct, and I used the same pgloader script everyone else seems to have working, so not sure what’s going wrong in my case. I dropped the fields that aren’t in the new schema, but I doubt that’s the cause of these issues.

I’m stumped…

Posting to let people know that this technique still works, at least in 2024.5.5 (yeah already outdated somewhat, but as I’m not using cloud integrations that are likely to break, I prefer to use the stable HA package of my distro).

Saving you some reading from above, following are the big steps are. Adapt to your setup, notably the database name (hass for me).

  1. Back your data up.
  2. Set up your PostgreSQL database, and configure HA to connect to it
  3. Restart HA on this configuration, you can stop it as soon as it created the tables in the database (with psql open on your database, run \dt until it stabilizes to a dozen rows)
  4. Run pgloader myconfig.conf. Here’s what my configuration file looks like:
load database
  from sqlite:///var/lib/hass/home-assistant_v2.db
  into postgresql:///hass
with data only, drop indexes, reset sequences, truncate, batch rows = 1000, prefetch rows = 1000
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
Explaination
  • data only and truncate so we use the structure Home Assistant created (allegedly otherwise it will fail, didn’t verify)
  • drop indexes so they get recreated (not exactly sure why, but it’s harmless and might prevent a few edge cases)
  • reset sequnces see below
  • batch rows, prefetch rows, *work_mem so the operation performs well enough it doesn’t run out of heap or something.
  • No set timezone to ‘UTC’ as Imroy suggested It wasn’t necessary in my case, but maybe it would be in yours?

You will get a few warnings but those are supposedly expected.
It will also take a while, the data import goes fast but not the recreation of the indexes.

  1. Fix the sequence ids
pg_dump hass --schema-only | grep 'ADD GENERATED BY DEFAULT AS IDENTITY' | awk '{ print "SELECT setval('\''" $3 "_" $6 "_seq'\'', (SELECT MAX("$6") FROM "$3"));" }' | psql hass
Explaination

For most tables there’s a sequence counter which determines the next id a row in that table will get. We reset it with reset sequences but even without it pgloader seems to set it to a too low value:

hass=# SELECT MAX(state_id) FROM states;
    max
-----------
 132378632
hass=# SELECT last_value FROM states_state_id_seq;
 last_value
------------
       2711

This causes issues like complaining that columns do not support double precision (they do, but due to duplicate id HA will not get back the row it attempted to write to test this), but also the data written to recorder will not show up (despite being written. I have not found a way to salvage data written at the wrong place, duplicates ids and foreign keys not written as such makes this hard) for a long time (until you wrote as much data as you did before the migration).

The awk magic above resets the sequence counter to a value that makes sense.
This is an alternative to the solution proposed by tobixen, which I think is cleaner.

  1. Start Home Assistant, make sure your historical data is there, and that new data gets written as well (even after you reload the history page).

(I have not phrased those as answers but the explainations it potentially explains the issues @FlameBoy and @Nuuki have been going through)

3 Likes

Hi,

I am trying to get this migration to work. In order to minimize risks, I setup a copy of my HomeAssistant Instance that connects to Postgres and do the testing there. I have done a few runs, last one with the script mentioned in the last post.

Migration seems to work fine, for a few seconds, I can even see all the history, but then HomeAssistant starts some Schema Migration and the recorder crashes because it is trying to reinsert existent data into the migration_changes table. After this all historical data disappears (of course, recorder crashed!).

Before trying to dig into this more deeply, has anyone had this kind of errors and is there any obvious solution?

Update: this seems to be also the pk sequence issue. For some reason, pgloader often seems not to set the sequences for the PKs properly, so it has to be fixed manually.

If someone does not feel right using the Script from @GeoffreyFrogeye , her you can use these direct queries, however if HomeAssistant adds more tables in the future, this script might get outdated, but it should do the same… :wink:

BEGIN;
LOCK TABLE event_data IN EXCLUSIVE MODE;
SELECT setval('event_data_data_id_seq',(SELECT GREATEST(MAX(DATA_ID), nextval('event_data_data_id_seq')-1) FROM event_data));
LOCK TABLE event_types IN EXCLUSIVE MODE;
SELECT setval('event_types_event_type_id_seq',(SELECT GREATEST(MAX(EVENT_TYPE_ID), nextval('event_types_event_type_id_seq')-1) FROM event_types));
LOCK TABLE events IN EXCLUSIVE MODE;
SELECT setval('events_event_id_seq',(SELECT GREATEST(MAX(EVENT_ID), nextval('events_event_id_seq')-1) FROM events));
LOCK TABLE recorder_runs IN EXCLUSIVE MODE;
SELECT setval('recorder_runs_run_id_seq',(SELECT GREATEST(MAX(RUN_ID), nextval('recorder_runs_run_id_seq')-1) FROM recorder_runs));
LOCK TABLE schema_changes IN EXCLUSIVE MODE;
SELECT setval('schema_changes_change_id_seq',(SELECT GREATEST(MAX(CHANGE_ID), nextval('schema_changes_change_id_seq')-1) FROM schema_changes));
LOCK TABLE state_attributes IN EXCLUSIVE MODE;
SELECT setval('state_attributes_attributes_id_seq',(SELECT GREATEST(MAX(ATTRIBUTES_ID), nextval('state_attributes_attributes_id_seq')-1) FROM state_attributes));
LOCK TABLE states_meta IN EXCLUSIVE MODE;
SELECT setval('states_meta_metadata_id_seq',(SELECT GREATEST(MAX(METADATA_ID), nextval('states_meta_metadata_id_seq')-1) FROM states_meta));
LOCK TABLE states IN EXCLUSIVE MODE;
SELECT setval('states_state_id_seq',(SELECT GREATEST(MAX(STATE_ID), nextval('states_state_id_seq')-1) FROM states));
LOCK TABLE statistics IN EXCLUSIVE MODE;
SELECT setval('statistics_id_seq',(SELECT GREATEST(MAX(ID), nextval('statistics_id_seq')-1) FROM statistics));
LOCK TABLE statistics_meta IN EXCLUSIVE MODE;
SELECT setval('statistics_meta_id_seq',(SELECT GREATEST(MAX(ID), nextval('statistics_meta_id_seq')-1) FROM statistics_meta));
LOCK TABLE statistics_runs IN EXCLUSIVE MODE;
SELECT setval('statistics_runs_run_id_seq',(SELECT GREATEST(MAX(RUN_ID), nextval('statistics_runs_run_id_seq')-1) FROM statistics_runs));
LOCK TABLE statistics_short_term IN EXCLUSIVE MODE;
SELECT setval('statistics_short_term_id_seq',(SELECT GREATEST(MAX(ID), nextval('statistics_short_term_id_seq')-1) FROM statistics_short_term));
COMMIT;