Recorder remote database options

baz123 · November 13, 2020, 8:10am

Recorder is taking an age to start (large database) and the size contributes to very large backup files. I know I can reduce the size of that by selecting what is stored, but I’m considering completely offloading the database to a Proxmox container.

Question is, which database flavour to use? I know little about database architecture but the 3 contenders seem to be

PostgreSQL
MariaDB
InfluxDB

Any suggestions on which would be the best solution?

Does the Base OS used for the container make any difference?

tom_l · November 13, 2020, 9:17am

I don’t think you can use InfluxDB.

dproffer · November 13, 2020, 7:45pm

I have been using Postgresql for 2 years with HA and find it very solid. I run it in a docker container beside my HA in docker container on Ubuntu 20, not too hard to install and configure. Get it running and yourself comfortable with starting, stopping, querying, exports then cut HA over to using it. It does a very nice job of extracting JSON attributes out of HA’s recorder records. The free version of this GUI query tool does a nice job of working with Postgresql:
https://tableplus.com/

baz123 · December 29, 2020, 6:11am

I have now tried to setup a PostgreSQL instance but HA is raising errors when it tried to load any sensor data.

Have you seen this? PostgreSQL Error - Key (old_state_id) is not present in table sensor entities

fversteegen · December 29, 2020, 6:28am

I used MariaDB for a while which worked great, but reverted back to the default included DB format as I do not care too much about it. Long term data gets writted to an Influxdb which is running for about 5 years now. I regularly flush the internal database.

baz123 · December 29, 2020, 6:58am

It isn’t for long term storage, it is about efficiency. Using the Standard setup, Recorder is really slow to start. By offloading the database to a separate system, startup is really quick.

The PostgreSQL instance is running on a ProxmoxVE LXC on an old laptop.

dproffer · December 29, 2020, 1:20pm

No, not seen such a message. The message seems to reference a table I do not have. I have only these tables as created by HA in Postgresql database homeassistant:
events
recorder_runs
schema_changes
states

There was a big database change back a couple releases ago, changing what was stored in the events table.

A pain to do, but is it possible for you to temporarily switch to a clean Postgresql VM and let HA build it’s tables from zero. Then see if you get errors. If so, I would think there is something amiss with the Postgresql image you are using. I am using Postgresql docker image created by postgresql people on docker hub. I use version :
“PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit”

I have 54 million records in my states table for 65 days of history. I let HA purge the database using it’s standard function. But I also archive all my history to Postgresql archive tables. These are what I mostly query, leaving the main states and events table to HA’s use. But I can query these as well externallly with no issues.

baz123 · December 29, 2020, 8:27pm

Firstly, this only happens for the sensor domain - all other domains are fine, so my feeling is it is an error in the SQL statement compared to the SQL built for the other domains; but I have no idea how that happens.

That is exactly what I did - it was a fresh PostgreSQL instance - no database migration at all. This is actually the issue I think. Where there is no old_state for a sensor item (because this is the first time that sensor has been included) it errors. Unless it is tested with previously running system and a fresh database, you would not notice the error.

I’m not using an image, it is a PVE LXC built on a debian template with PostgreSQL installed via apt.

This is what I see from PGAdmin

In the Schema I see the same as you.

baz123 · December 29, 2020, 8:32pm

I’m very new to PostgreSQL - rudimentary understanding of SQL.

Any tips on how to use archive tables combined with the HA purge?

65 days of history is simply only possible with an external instance of the database

dproffer · December 29, 2020, 11:52pm

Well that is perplexing @baz123 !
Might be a bug in HA with it’s Postgresql integration. I am pretty sure the default ‘fix’ when your SQLite DB gets corrupt is to simply shut down HA, delete the database file and restart HA. So if that in fact works, HA does not care about the previous ‘old_state_id’ values not being available when it build a new SQLite DB.
I see you posted a bug report in github, that is probably a good path to an answer. Reading the more detailed info you posted over there. I used psql to look at my ‘states’ and ‘events’ tables and I do not see a constraint named ‘states_old_state_id_fkey’ on either table. See below. You could try dropping the constraint with the command below in psql, sounds like you are comfortable (unfortunately) with rebuilding your HA Recorder in Postgresql, so if it messes things up more, just start again with a clean Postgresql homeassistant database:

ALTER TABLE public.states DROP CONSTRAINT states_old_state_id_fkey;

192.168.2.242 user@homeassistant=> \d+ states
                                                                Table "public.states"
{...}
Indexes:
    "states_pkey" PRIMARY KEY, btree (state_id)
    "ix_states_entity_id_last_updated" btree (entity_id, last_updated)
    "ix_states_event_id" btree (event_id)
    "ix_states_last_updated" btree (last_updated)
Foreign-key constraints:
    "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id)
Triggers:
    trigger_on_states BEFORE INSERT ON states FOR EACH ROW EXECUTE FUNCTION trigger_on_states()
Access method: heap

192.168.2.242 user@homeassistant=> \d+ events
                                                                Table "public.events"
{...}
Indexes:
    "events_pkey" PRIMARY KEY, btree (event_id)
    "ix_events_context_id" btree (context_id)
    "ix_events_context_parent_id" btree (context_parent_id)
    "ix_events_context_user_id" btree (context_user_id)
    "ix_events_event_type_time_fired" btree (event_type, time_fired)
    "ix_events_time_fired" btree (time_fired)
Referenced by:
    TABLE "states" CONSTRAINT "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id)
Triggers:
    trigger_on_events BEFORE INSERT ON events FOR EACH ROW EXECUTE FUNCTION trigger_on_events()
Access method: heap

dproffer · December 30, 2020, 12:23am

Hi @baz123, Below are the steps I used to create and manage the two archive tables I keep. One for states and one for events. I have not used the events archive for anything, but that might be due to the way I use and think about Home Assistant, YMMV.

I started with Influxdb and Grafana, both very powerful. But it was another pair of systems to learn, manage and keep current on. I was already using Python, Pandas, Matplotlib and other iPython, Jupyter tools. So I’ve been very happy with less tools. A lot of folks have great success with Influxdb. Have a look at the Home Assistant iPython/Jupyter notebooks here and see if you can adapt them to reading from your Postgresql tables. This is how I got starting with my analytics:


https://data.home-assistant.io/


postgresql home assistant archive trigger setup
202007191400

created tables, procedures and triggers to save records from HA states and events tables to archive tables

psql -h 192.168.2.242 -d homeassistant -U user -W

--------------------
-- states Analytics Table Definition, create the archive table
CREATE TABLE "public"."states_archive" (
    "state_id" int4,
    "domain" varchar(64),
    "entity_id" varchar(255),
    "state" varchar(255),
    "attributes" text,
    "event_id" int4,
    "last_changed" timestamptz,
    "last_updated" timestamptz,
    "created" timestamptz,
    "old_state_id" int4
);

--------------------
-- Function to insert into analytics table the the row being added to states table
create or replace function trigger_on_states()
returns trigger language plpgsql
as $function$
begin

    insert into states_archive
    select new.*;
    return new;
end; $function$;

--------------------
-- Trigger that is fired on each insert to states table
create trigger trigger_on_states
  before insert
  on states
  for each row
  execute procedure trigger_on_states();

--------------------
-- events Analytics Table Definition, create the archive table
CREATE TABLE "public"."events_archive" (
    "event_id" int4,
    "event_type" varchar(32),
    "event_data" text,
    "origin" varchar(32),
    "time_fired" timestamptz,
    "created" timestamptz,
    "context_id" varchar(36),
    "context_user_id" varchar(36),
    "context_parent_id" varchar(36)
);

--------------------
-- Function to insert into analytics table the the row being added to events table
create or replace function trigger_on_events()
returns trigger language plpgsql
as $function$
begin

    insert into events_archive
    select new.*;
    return new;
end; $function$;

--------------------
-- Trigger that is fired on each insert to events table
create trigger trigger_on_events
  before insert
  on events
  for each row
  execute procedure trigger_on_events();

--------------------
-- create a single index in archive tables on the 'last_updated' column, seems the most useful
create index ix_states_archive_last_updated on states_archive using btree (last_updated DESC);
-- took maybe 5 minutes
create index ix_events_archive_time_fired on events_archive using btree (time_fired DESC);
-- took under 10 minutes to create

--------------------
-- create nice view of states archive
DROP VIEW IF EXISTS states_archive_01;
CREATE VIEW states_archive_01 AS
SELECT
    last_updated,
    entity_id,
    state,
    state_id,
    attributes
FROM
    states_archive
ORDER BY
    last_updated DESC;


--------------------
-- To stop the triggers and copying of records to the archive tables, just delete the two triggers
drop trigger if exists trigger_on_states on states;
drop trigger if exists trigger_on_events on events;
-- To restart the copying of the records to the archive tables, just recreate the triggers
-- see above

baz123 · December 30, 2020, 10:29am

Thanks. I’ll have a play sometime.

Horses for courses. I use InfluxDB for timeseries data; PostgreSQL is a better choice for this sort of data generally. You could argue that some sensor data is more suited to timeseries (temp, humidity etc) but presence detectors unless they report regularly, less so.

Thanks for your help.

baz123 · December 30, 2020, 5:20pm

I updated to 2020.12.2 and everytime I start I get 2 exceptions (different sensors) then all is fine. I have even reenabled storing the sensor domain.

Odd

dproffer · December 30, 2020, 8:15pm

Odd… well yes and no I guess. But @baz123 happy you seem to be up and running! This all is far above my pay grade, but it does seem like the developer were poking around issues that seem related to your issue. This first post about Postgresql and second about marina seems to show there were problems with the database scheme change I noted in one of prior posts. I am postulating that I did not have problems since I was on Postgresql prior to this change in schema and you unfortunately were moving to Postgresql right in the middle of various versions treating these relation conditions differently…

https://github.com/home-assistant/core/issues/40115

https://github.com/home-assistant/core/issues/42402

It is becoming more and more difficult to ‘roll the dice’ on upgrades. I do not have a fully isolated test environment for my ‘home’ nor do I really want to put the effort and money into having something were I can test a HA upgrade before putting it into production. So for now, I use the ‘sit back and watch other suffer and fix’ method . However, as this project grows in dimensions, the chances of a screwing up my database history, or corrupting my install to where it will be a real pain to recover becomes a much greater possibility.

The ‘breaking changes’ is a must read, but I do not think that has complete coverage. For example, there was a database schema change when I upgrade from 0.118 to 2020.12.1, but I have yet to find what changed. And luckily it caused no issues for my little ‘tweak’ to the system to copy the records to my archive database.

To your point about InfluxDB and Postgresql, I agree that having a time series tool to analyze sensor data in HA is a key tool if you want to ‘operate’ your home effectively. Combined with how we started this conversation, if you cannot look back over, I think, at least two years of your home’s data, you are just guessing at what the effect of lighting, insulation, solar, HVAC changes make.

I again rationalized my move out of InfluxDB to Postgresql, partly because Postgresql is so extensible. There is a time series layer call Timescale that seems to give Postgresql much if not all of InfluxDB’s traits and abilities. Especially for the size of data that a home generates, we are not even close to pushing either of these tools. I have not installed Timescale as yet, mostly because I have been able to solve 100% of questions to date with just Postgresql, Pandas, Juypter stack. Yes, I have had queries/calculations that have run 30 plus minutes, but I am not time constrained. And in these COVID times, that means a couple more ’s or ’s !!!

Interested in your thoughts on sensor analysis and tooling that is useful for home automation.

My other point about having to keep up on many tools as another reason why I moved away from InfluxDB. This comparisons of queries of TimescaleDB vs. InfluxDB in this article highlights my past head scratching and googling when trying to use InfluxDB:

https://medium.com/timescale/timescaledb-vs-influxdb-for-time-series-data-timescale-influx-sql-nosql-36489299877

Good hunting and Happy New Year!

baz123 · January 1, 2021, 8:25pm

I’ll agree with that - I never install an x.x.0 and usually wait to the end of the cycle before updating.

I’m in the same club.

Amen.

Again in total agreement.

For any analysis I am doing I am using the Emoncms tool from the OpenEnergyMonitor group. However, it is very simplistic and not as complex as what you are looking at I suspect.

I have to be honest and say I am scratching the surface of these tools for analysis. My prime driver was to make the startup quicker as recorder was taking an age to initialise.

To the problem at hand - the GitHub issue has an update. It looks like an encoding issue. If you look at the recorder docs, the MSQL databases are using utf8mb4 and by default PostgreSQL uses SQL_ASCII. It is suggested to add the client encoding to the db_url and that seems to have stopped the errors, but I am not sure it is the whole story. I need to drop the existing database and recreate it as utf8 to see what happens.

dproffer · January 1, 2021, 10:55pm

Regardless of my opins, a real bummer that you are having so much trouble with using Postgresql as HA Recorder. Part of my being such a fan of this for HA is that I have had ZERO issues with this combo in 2 years of using…

As I said, I am not well enough versed to really understand, however I am ‘experienced enough’ as a ‘old guy’, to raise an eyebrow. Looks like the second set of errors you posted is a very different issue than the original one. Which see below, someone else is now struggling with. I think someone added this constraint without thinking through it.

As to your new issue, I remember running across something similar when I was trying something with unicode, and the issue was I was using the wrong conversion functions. It is a python2 vs. python3 thing. See this stack overflow:

https://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20

Basically, stop using str to convert from unicode to encoded text / bytes.

Instead, properly use .encode() to encode the string:
p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()

Have a look at this issue:

github.com/home-assistant/core

The `recorder.states` table does not have index on `old_state_id` column

opened 03:40PM - 01 Jan 21 UTC

closed 09:51AM - 04 Jan 21 UTC

ayufan

integration: recorder

## The problem I have serious problems removing data from PSQL with `recorder…:`. It appears that this is due to amount of records stored, and size of data, it simply takes ages to execute: ```sql hass=# select count(*) from events; count --------- 3614166 (1 row) hass=# select count(*) from states; count --------- 3594602 (1 row) ``` I looked at indexes of the questioned tables, and they current look following: ```sql hass=# \d states; Table "public.states" Column | Type | Collation | Nullable | Default --------------+--------------------------+-----------+----------+------------------------------------------ state_id | integer | | not null | nextval('states_state_id_seq'::regclass) domain | character varying(64) | | | entity_id | character varying(255) | | | state | character varying(255) | | | attributes | text | | | event_id | integer | | | last_changed | timestamp with time zone | | | last_updated | timestamp with time zone | | | created | timestamp with time zone | | | old_state_id | integer | | | Indexes: "states_pkey" PRIMARY KEY, btree (state_id) "ix_states_entity_id_last_updated" btree (entity_id, last_updated) "ix_states_event_id" btree (event_id) "ix_states_last_updated" btree (last_updated) Foreign-key constraints: "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id) "states_old_state_id_fkey" FOREIGN KEY (old_state_id) REFERENCES states(state_id) Referenced by: TABLE "states" CONSTRAINT "states_old_state_id_fkey" FOREIGN KEY (old_state_id) REFERENCES states(state_id) hass=# \d events; Table "public.events" Column | Type | Collation | Nullable | Default -------------------+--------------------------+-----------+----------+------------------------------------------ event_id | integer | | not null | nextval('events_event_id_seq'::regclass) event_type | character varying(32) | | | event_data | text | | | origin | character varying(32) | | | time_fired | timestamp with time zone | | | created | timestamp with time zone | | | context_id | character varying(36) | | | context_user_id | character varying(36) | | | context_parent_id | character varying(36) | | | Indexes: "events_pkey" PRIMARY KEY, btree (event_id) "ix_events_context_id" btree (context_id) "ix_events_context_parent_id" btree (context_parent_id) "ix_events_context_user_id" btree (context_user_id) "ix_events_event_type_time_fired" btree (event_type, time_fired) "ix_events_time_fired" btree (time_fired) Referenced by: TABLE "states" CONSTRAINT "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id) ``` Looking at data structure it appears the following: - `states.old_state_id` has FK, but lacks an index causing the need to load the whole table on the object removal due to FK - `states.event_id` and `states.old_state_id` FK is a constraint validation, but does not have `ON DELETE SET NULL` set which results in blocking `DELETEs` from being executed The above seems to prevent timely removal of data, causing extensive time to run `DELETE`, second crashing this `DELETE` due to FK violations. I run the following commands to remedy and alter DB structure, and after doing this deletes are successfully executed. ```sql create index tmp_ix_states_old_state_id ON states (old_state_id) WHERE old_state_id IS NOT NULL; alter table states drop constraint "states_event_id_fkey"; alter table states drop constraint "states_old_state_id_fkey"; alter table states add constraint "states_event_id_fkey" FOREIGN KEY (event_id) REFERENCES events(event_id) ON DELETE SET NULL; alter table states add constraint "states_old_state_id_fkey" FOREIGN KEY (old_state_id) REFERENCES states(state_id) ON DELETE SET NULL; ``` > Note: This likely affects any other DB than PSQL as well ## Environment - Home Assistant Core release with the issue: Home Assistant 2020.12.1 - Last working Home Assistant Core release (if known): - Operating environment (OS/Container/Supervised/Core): Ubuntu 20.04 running on `armv7l` via official Docker Image - Integration causing this issue: - Link to integration documentation on our website: ## Problem-relevant `configuration.yaml`  ```yaml ``` ## Traceback/Error logs  ```txt ``` ## Additional information

Foreign-key constraints:
....
"states_old_state_id_fkey" FOREIGN KEY (old_state_id) REFERENCES states(state_id)

My connect string in configuration.yaml from day one of moving to Postgresql:
postgresql://abc:[email protected]/homeassistant

My database encodings:

baz123 · January 2, 2021, 1:38pm

Interestingly;

postgres=# \l
                               List of databases
     Name      |  Owner   | Encoding  | Collate | Ctype |   Access privileges   
---------------+----------+-----------+---------+-------+-----------------------
 homeassistant | postgres | SQL_ASCII | C       | C     | 
 postgres      | postgres | SQL_ASCII | C       | C     | 
 template0     | postgres | SQL_ASCII | C       | C     | =c/postgres          +
               |          |           |         |       | postgres=CTc/postgres
 template1     | postgres | SQL_ASCII | C       | C     | =c/postgres          +
               |          |           |         |       | postgres=CTc/postgres
(4 rows)

So it seems that installing from apt, the default is SQL_ASCII - perhaps that is an oddity of using the Docker image. Certainly the docs say this is the PostgreSQL default encoding.

I also discovered that the locale was set toC on the container so that may have something to do with it as well.

I think perhaps I missed the subtlety of the issue.

Whilst the use of the db_url

db_url: postgresql://homeassistant:[email protected]/homeassistant?client_encoding=utf8

Seemed to stop the errors, doing this is not a recommended way.

I therefore dropped the database, and recreated it with UTF8 as the encoding. Restarted HA and all seems well.

postgres@psql:~$ dropdb homeassistant
postgres@psql:~$ createdb -E utf8 homeassistant

I did also find this method to modify the database encoding, but as there was only a few days in it and I wanted to test a fresh database, I did not bother doing this.

I’m going to update the docs.

dproffer · January 2, 2021, 6:42pm

Again, happy and sad, that you are finding help and solutions. Sad for your problems, but also 'cause I am shaking in my boots seeing the changes that have occurred to remote recorder that have not flowed through to my config. And I will have to figure out how to weave these into my setup without ‘much’ pain. The whole move to unicode from our current world of many local dialects continues to be another thorn.

I admit, I am still trying to figure out how to use an emoji as a variable name that I am typing on my punch cards for my COBOL program grin.

Gotta love this quote from the Postgresql docs about SQL_ASCII !:

‘…Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding…’

‘The SQL_ASCII setting behaves considerably differently from the other settings. When the server character set is SQL_ASCII, the server interprets byte values 0–127 according to the ASCII standard, while byte values 128–255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is SQL_ASCII. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.’

baz123 · January 2, 2021, 6:46pm

Yes I read that in the docs .

The apt install pulled in 12 so that is what I’m using. I think that if I’d been directed to create the database as utf8 it would not have been an issue. That is the change I have proposed for the docs.

baz123 · January 10, 2021, 3:40pm

@dproffer, have you had any issues with the returned states being rendered?

I can see the requests to the database being returned in the HA logs, but in some cases the result is not rendered

History page will currently not render at all. It was all fine when I set it up.