Migrate to PostgreSQL

I’ve followed the instructions to migrate to SQLAlchemy. SQLite is fine, but I’m looking to move to PostgreSQL. Is there a mechanism to migrate data between SQLAlchemy backends?

Perhaps this helps: Converting from other Databases to PostgreSQL

There are two tables holding state changes: events and states.
The sensor values are in states.state and events.time_fired is the timestamp, but what are the other timestamps in the states table actually for?

                                       Table "public.events"
   Column   |           Type           |                         Modifiers
------------+--------------------------+-----------------------------------------------------------
 event_id   | integer                  | not null default nextval('events_event_id_seq'::regclass)
 event_type | character varying(32)    |
 event_data | text                     |
 origin     | character varying(32)    |
 time_fired | timestamp with time zone |
 created    | timestamp with time zone |
Indexes:
    "events_pkey" PRIMARY KEY, btree (event_id)
    "ix_events_event_type" btree (event_type)
Referenced by:
    TABLE "states" CONSTRAINT "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id)

                                        Table "public.states"
    Column    |           Type           |                         Modifiers
--------------+--------------------------+-----------------------------------------------------------
 state_id     | integer                  | not null default nextval('states_state_id_seq'::regclass)
 domain       | character varying(64)    |
 entity_id    | character varying(255)   |
 state        | character varying(255)   |
 attributes   | text                     |
 event_id     | integer                  |
 last_changed | timestamp with time zone |
 last_updated | timestamp with time zone |
 created      | timestamp with time zone |
Indexes:
    "states_pkey" PRIMARY KEY, btree (state_id)
    "states__significant_changes" btree (domain, last_updated, entity_id)
    "states__state_changes" btree (last_changed, last_updated, entity_id)
Foreign-key constraints:
    "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id)

states has the timestamp columns last_changed, last_updated and created. I guess created is only a technical value?
The created column also shows values that are -3 hours compared to the others, so maybe there’s something wrong with its time zone handling?

In the history component there’s code that uses both last_updated and last_changed.

    """Return states changes during UTC period start_time - end_time."""
    states = recorder.get_model('States')
    query = recorder.query('States').filter(
        (states.last_changed == states.last_updated) &
        (states.last_changed > start_time))

    if end_time is not None:
        query = query.filter(states.last_updated < end_time)

So actually the query is:

            states.last_changed == states.last_updated and
            states.last_changed > start_time and
            states.last_updated < end_time

Because last_changed == last_updated, the query can also be written in a simpler way:

            states.last_changed == states.last_updated and
            states.last_changed > start_time and
            states.last_changed < end_time

Now it’s a little more intuitive?
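
For what it’s worth, the same filter written as plain SQL against the states table (column names from the schema above; :start_time and :end_time stand in for the bound parameters) would be roughly:

    -- Only rows where the state value actually changed (last_changed = last_updated)
    -- inside the requested time window.
    SELECT *
    FROM states
    WHERE last_changed = last_updated
      AND last_changed > :start_time
      AND last_changed < :end_time;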

I tried creating partitions (daily or monthly) in Postgres, but there weren’t any gains; actually the opposite. I tested with 4000 rows/day, so a year of data is about 1.5M rows. If you are purging old records, then it would make more sense to just drop old partitions.
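
If anyone wants to repeat the experiment, a minimal sketch of declarative range partitioning (PostgreSQL 10+) looks roughly like this; the main appeal for purging is that dropping a partition is a cheap metadata operation instead of a huge DELETE:

    -- Illustration only, not a drop-in replacement for the recorder schema.
    CREATE TABLE states_by_month (
        state_id     bigint,
        entity_id    character varying(255),
        state        character varying(255),
        last_updated timestamp with time zone
    ) PARTITION BY RANGE (last_updated);

    CREATE TABLE states_2017_01 PARTITION OF states_by_month
        FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');

    -- "Purging" a month is then simply:
    DROP TABLE states_2017_01;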

Well, although this is a really, really old thread, maybe the following may be interesting for others with this problem. I just recently also migrated from the default sqlite database to postgresql. My procedure was as follows:

  1. create a PostgreSQL database to be used by home assistant and make sure that the homeassistant user can access it and can create tables in it (see the SQL sketch after this list)
  2. write a config for pgloader:
load database
  from sqlite:///full/path/to/home-assistant_v2.db
  into postgresql:///home
with data only, drop indexes, reset sequences, truncate;
  3. reconfigure the recorder integration so it uses PostgreSQL. As I was already at it, I excluded some entities and events I’m not interested in.
  4. stop home assistant
  5. start home assistant, let it run for 20 seconds or so in order to let it create the necessary database tables, and stop it again.
  6. run pgloader with the configuration mentioned above (write it to a file, e.g. migration.config), then run: pgloader migration.config. This config deletes all entries HA might have written in the 20 seconds it was running before it copies the contents of the old SQLite database into the new, shiny PostgreSQL database.
  7. restart home assistant and be happy
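
For step 1, a minimal sketch of the database setup, using the database name from the pgloader config above (the role name and password are placeholders, adjust to your installation):

    -- Run as the postgres superuser, e.g. via: sudo -u postgres psql
    CREATE ROLE homeassistant WITH LOGIN PASSWORD 'changeme';
    CREATE DATABASE home OWNER homeassistant;
    -- Owning the database is enough for HA to create its tables in the public schema.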

pgloader is a great tool for such conversions since it is fast and highly configurable. Nonetheless I advise against using it to also create the database tables, since it wouldn’t choose the same column types in PostgreSQL that Home Assistant does: the attributes column in PostgreSQL will be of type text, whereas in SQLite it was varchar.


I had to add

set timezone to ‘UTC’

to the pgloader config to get correct times.
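
A quick way to sanity-check the result after the load (column names as in the schema earlier in this thread):

    -- What time zone is the current session using?
    SHOW timezone;

    -- Spot-check a few recent rows; if last_updated/created look shifted by your
    -- local UTC offset, the load probably ran with a non-UTC session time zone.
    SELECT state_id, last_updated, created
    FROM states
    ORDER BY state_id DESC
    LIMIT 5;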

Thank you @Jennes ! This was very helpful!

I don’t know if this is a valid topic for a question here. I’m already on Postgres. I want to know about using Postgres partitions for the history tables. I want to store all history (no purges). I disabled auto purge in HA. But some SQL requests still start to slow down after a year of history. So the question is simple:

  • Is it difficult to migrate to a partitioned model for history? (I’m an experienced developer; see the sketch after this list)
  • How will schema migrations behave after Home Assistant updates?
  • Has anyone else had a similar experience? I did not quickly find any mentions of this topic (maybe I searched badly).
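
Regarding the first point: an existing table can’t be turned into a partitioned one in place, so the migration would roughly be a copy-and-swap like the sketch below (column name as in the schema earlier in the thread; it ignores indexes, foreign keys and the fact that HA should be stopped while you do it):

    -- 1. Create a partitioned copy of the table (defaults only, no constraints).
    CREATE TABLE states_new (LIKE states INCLUDING DEFAULTS)
        PARTITION BY RANGE (last_updated);

    -- 2. Create one partition per month that holds data.
    CREATE TABLE states_new_2022_01 PARTITION OF states_new
        FOR VALUES FROM ('2022-01-01') TO ('2022-02-01');
    -- ...repeat for the remaining months...

    -- 3. Copy the data and swap the tables.
    INSERT INTO states_new SELECT * FROM states;
    ALTER TABLE states RENAME TO states_old;
    ALTER TABLE states_new RENAME TO states;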

I’m not using partitions, so I am no help in this vein. That said, I have been running PostgreSQL as my core HA history and as an analytic db for going on two years now with very good success. I run HA and PostgreSQL in Docker containers on a 16 GB Intel i7 device, with a number of other Docker containers running as well, and I appear to be hardly taxing the box. DB size is 600 GB. I am running HA 2021.6.6, so I cannot speak to some of the latest db changes.

I use simple triggers to make a copy of each new state and event record that HA adds. I let HA prune its dbs at 65 days and currently have over two years of data in my analytic dbs. In the post linked below are the setup and stored procedures I use to do this:

https://community.home-assistant.io/t/home-assistant-db-vs-mariadb/224295/4
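
For anyone who doesn’t want to follow the link, the general shape of such a trigger is roughly the following; the names are illustrative, not the ones from the linked post, and the archive table has to keep the same column layout as states:

    -- Archive table with the same columns and defaults as states.
    CREATE TABLE IF NOT EXISTS states_archive (LIKE states INCLUDING DEFAULTS);

    CREATE OR REPLACE FUNCTION archive_state() RETURNS trigger AS $$
    BEGIN
        -- Copy every newly inserted states row into the archive.
        INSERT INTO states_archive VALUES (NEW.*);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- PostgreSQL 11+; on older versions use EXECUTE PROCEDURE instead.
    CREATE TRIGGER states_archive_trg
        AFTER INSERT ON states
        FOR EACH ROW EXECUTE FUNCTION archive_state();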

There are several folks using the TimescaleDB layer on top of PostgreSQL to get better time-series abilities; I have yet to install this. If you search for either or both of ‘postgresql’ and ‘timescale’ you will get some good info.

Good hunting. I have been very pleased with my use of PostgreSQL with Home Assistant. I use a Jupyter container to do most of my analysis on PostgreSQL.


Thanks for the answer! Really appreciate it. I will continue my research.

How does resetting the sequences work for you? I get heaps of unique constraint issues afterward.
Wouldn’t it make more sense to keep them?

I found this one through google: Migrate Home Assistant from SQLite to PostgreSQL | Sigfried Seldeslachts

Will go through it and report on my success … one of those days.

An old question, I know…

I have only been running Home Assistant for a few weeks so far, so my Home Assistant database has not grown much yet, but I am responsible for a PostgreSQL state-of-health database at work that has been running for a few years and now has some gigs of data. There I installed the TimescaleDB (https://www.timescale.com/) extension, and using the data compression it offers sped things up quite a bit.

I have now gone through the blog post at Migrate Home Assistant from SQLite to PostgreSQL | Sigfried Seldeslachts and it … seems to work. I have the following comments to that guide:

  • Read through the guide a couple of times before actually performing the steps, at least if you want to minimize the hass downtime and data loss. For one thing, pgloader took quite a long time to install on Arch Linux.
  • pgloader didn’t like the DB connection strings being quoted; I had to remove the single quotes there.
  • The recorder integration did not want to start up after migrating the database (oops). It seems like some problem with the primary key sequences. That’s a real bummer. I’m doing research now to see if I can find a workaround.

I’m using plain PostgreSQL, but I should probably upgrade to TimescaleDB since it’s better suited for this kind of data …

Ok, here is a workaround:

sudo -u postgres pg_dump  homeassistant_db | grep SELECT | grep _seq | perl -pe 's/, \d+, true/, 12345678, true/' | sudo -u postgres psql homeassistant_db
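
The one-liner above just pins every sequence to an arbitrarily high value. A more targeted variant is to set each sequence from the current maximum id, for example:

    -- Repeat for each table/sequence pair that lost its position during the load.
    SELECT setval('states_state_id_seq', (SELECT COALESCE(MAX(state_id), 0) + 1 FROM states), false);
    SELECT setval('events_event_id_seq', (SELECT COALESCE(MAX(event_id), 0) + 1 FROM events), false);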

I’ve used Siegfried’s guide to try to migrate from SQLite to PostgreSQL. I’m encountering an error that I have not seen mentioned anywhere else.

My SQLite database contains a “domain” field in the “states” table. This field does not get created by HA when I start HA with the recorder connected to PostgreSQL. And since the field is not created by HA the migration of the SQLite data fails.

I’m starting to wonder if this “domain” field has been abandoned in a recent update (haven’t been able to find any info on it) and if it should perhaps be dropped just like the events.created field that Siegfried already mentions in his post. I say recent since people seem to have used the script in March without much trouble, but the age of the install will of course matter as well. My install was created early-mid 2021, can’t remember exactly.

If anyone has a suggestion on a fix, or a path that can be explored I’d appreciate it.

I’m pretty sure it was.
I guess that the field was not dropped (I compared a “grand-fathered” db vs. a recent one, and the field is there in the old but not in the new) because dropping columns was not possible in SQLite up until recently.

Not really sure how to proceed.
To avoid headaches, I’d personally just add the field to PostgreSQL as a quick win :wink:
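
Concretely, matching the varchar(64) definition from the old schema shown earlier in this thread, that quick win would be something like:

    -- Give pgloader somewhere to put the legacy column; HA's models don't reference
    -- it, so the extra column should be harmless.
    ALTER TABLE states ADD COLUMN domain character varying(64);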

I’ll need to take a deeper dive into this if I’m going to get it to work. Without the “domain” (and “created”, btw) fields the migration runs, but it has never managed to run completely through correctly. It stopped with a “Killed” message last time; restarting the migration seems to get it to complete, but I have no way of knowing whether the migration has actually completed correctly.
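
For reference, dropping the legacy columns from the source database before running pgloader can be done directly in SQLite (3.35 or newer), something along these lines; drop whichever columns your old schema still has and the new one doesn’t:

    -- Run against a COPY of the old SQLite file, e.g.: sqlite3 home-assistant_v2.db
    ALTER TABLE states DROP COLUMN domain;
    ALTER TABLE states DROP COLUMN created;
    ALTER TABLE events DROP COLUMN created;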

I did test HA against the completed migration and got the following errors:
Logger: homeassistant.components.recorder.auto_repairs.schema
Source: components/recorder/auto_repairs/schema.py:161
Integration: Recorder (documentation, issues)
First occurred: 10:36:35 (8 occurrences)
Last logged: 10:36:35

Column min in database table statistics does not support double precision (stored=None != expected=1.000000000000001)
Column max in database table statistics does not support double precision (stored=None != expected=1.000000000000001)
Column last_reset_ts in database table statistics does not support double precision (stored=0.0 != expected=1.000000000000001)
Column state in database table statistics does not support double precision (stored=4353.355 != expected=1.000000000000001)
Column sum in database table statistics does not support double precision (stored=0.0 != expected=1.000000000000001)

I currently don’t know if those errors are a result of the process being killed or if it’s a migration error. Now that I think about it those columns should be created by HA itself, not the migration.
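
If it turns out the columns really did end up with the wrong type rather than the load being interrupted, they can be widened in place, roughly:

    -- Check the actual types first (e.g. with \d statistics) before running this.
    ALTER TABLE statistics
        ALTER COLUMN min TYPE double precision,
        ALTER COLUMN max TYPE double precision,
        ALTER COLUMN last_reset_ts TYPE double precision,
        ALTER COLUMN state TYPE double precision,
        ALTER COLUMN sum TYPE double precision;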

This migration has turned out to be far more complicated than I expected. I’m not a DBA, but I’ve worked with databases, so I thought this should be manageable given the instructions available. I’ll try to post an update if I get it to work.

@FlameBoy Did you ever manage to complete the migration?

I followed the guides, but encountered a bunch of type errors:

2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics_meta"."id".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."statistic_id" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."statistics_meta"."statistic_id".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."source" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."statistics_meta"."source".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."unit_of_measurement" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."statistics_meta"."unit_of_measurement".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."statistics_meta"."name" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."statistics_meta"."name".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."recorder_runs"."run_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."recorder_runs"."run_id".
2023-10-19T13:54:34.245002Z WARNING Source column "public"."schema_changes"."change_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."schema_changes"."change_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."schema_changes"."schema_version" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."schema_changes"."schema_version".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics"."id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics"."id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics"."metadata_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics"."metadata_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics_runs"."run_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics_runs"."run_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics_short_term"."id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics_short_term"."id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."statistics_short_term"."metadata_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."statistics_short_term"."metadata_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."state_attributes"."attributes_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."state_attributes"."attributes_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."event_data"."data_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."event_data"."data_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."event_types"."event_type_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."event_types"."event_type_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."event_types"."event_type" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."event_types"."event_type".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."states_meta"."metadata_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states_meta"."metadata_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."states_meta"."entity_id" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."states_meta"."entity_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."events"."event_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."events"."event_id".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."events"."event_type" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."event_type".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."events"."event_data" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."event_data".
2023-10-19T13:54:34.246002Z WARNING Source column "public"."events"."origin" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."origin".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."context_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."context_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."context_user_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."context_user_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."context_parent_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."events"."context_parent_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."data_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."events"."data_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."origin_idx" is casted to type "bigint" which is not the same as "smallint", the type of current target database column "public"."events"."origin_idx".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."events"."event_type_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."events"."event_type_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."state_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states"."state_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."entity_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."entity_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."state" is casted to type "text" which is not the same as "character varying", the type of current target database column "public"."states"."state".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."attributes" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."attributes".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."event_id" is casted to type "bigint" which is not the same as "smallint", the type of current target database column "public"."states"."event_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."old_state_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states"."old_state_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."attributes_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states"."attributes_id".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."origin_idx" is casted to type "bigint" which is not the same as "smallint", the type of current target database column "public"."states"."origin_idx".
2023-10-19T13:54:34.247002Z WARNING Source column "public"."states"."context_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."context_id".
2023-10-19T13:54:34.248002Z WARNING Source column "public"."states"."context_user_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."context_user_id".
2023-10-19T13:54:34.248002Z WARNING Source column "public"."states"."context_parent_id" is casted to type "text" which is not the same as "character", the type of current target database column "public"."states"."context_parent_id".
2023-10-19T13:54:34.248002Z WARNING Source column "public"."states"."metadata_id" is casted to type "bigint" which is not the same as "integer", the type of current target database column "public"."states"."metadata_id".

I allowed HA to create the schema in Postgres so that should be correct, and I used the same pgloader script everyone else seems to have working, so I’m not sure what’s going wrong in my case. I dropped the fields that aren’t in the new schema, but I doubt that’s the cause of these issues.
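
One way to see exactly what the HA-created target columns look like, and compare them against the types in the warnings above, is to list them straight from the catalog:

    -- List every column in the public schema to compare with the pgloader warnings.
    SELECT table_name, column_name, data_type, character_maximum_length
    FROM information_schema.columns
    WHERE table_schema = 'public'
    ORDER BY table_name, ordinal_position;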

I’m stumped…