Comments on your points and questions.
Good that your ‘daily driver’ is PostgreSQL and that you have the time and interest to explore Influx as well. I just find it hard to pave a whole new street when I can expand my lane, especially when the coin we are working for at home is usually the significant-other acceptance factor.
You might have a look at the time-series database layer Timescale, which sits on top of PostgreSQL. Search this forum to see folks’ experience using it with Home Assistant, and have a read of the blog post cited at the end of this post.
I would say that, for me, PostgreSQL and my history setup meet the key assumptions and benefits you cite.
The 65-day retention period for HA’s Recorder is managed by HA’s Recorder purge setting (purge_keep_days). I do nothing, and HA keeps its database size in check. I’ve never had to manually VACUUM any of my PostgreSQL databases; the automated PostgreSQL cleanup (autovacuum) has done all the work for three years.
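If you ever want to confirm that autovacuum is doing that housekeeping on your side, a quick check against the statistics view looks something like this (the table names here are HA’s defaults):

SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('states', 'events');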
I do not use SQLite for any part of this system, so I’m not sure what you are asking about vacuuming the SQLite DB.
I have posted below a basic set of instructions to create the needed table, trigger, and function for each HA table that you want to archive. Hope it is readable; if not, give a shout. I’m sure there are more optimized ways to accomplish this, but what I coded seems to perform without issue. My HA instance adds about 100,000 records in total to its events and states tables per hour, and the trigger copy functions keep up with this just fine on my Intel i7 machine, where both HA and PostgreSQL run along with about 20 other Docker containers.
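If you want to gauge the rate on your own hardware, a rough query like this against the states_archive table (created in the notes below) shows how many rows per hour are being copied:

SELECT date_trunc('hour', last_updated) AS hour, count(*) AS rows_copied
FROM states_archive
WHERE last_updated > now() - interval '1 day'
GROUP BY 1
ORDER BY 1 DESC;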
I do not use any HA integrations for this system. I just run PostgreSQL in a Docker container, point the Recorder’s db_url at it in my HA configuration.yaml, and then do the steps cited below to create the trigger copy functions.
I’m not clear on your statement ‘but first data has to be inserted to new…’. I just create an empty archive table with the columns and indexes I want, with no records in it. Then I add my functions and triggers for the two tables, events and states, and off it goes. You could do processing in the function that is called on each insert trigger, but it seems better to keep this path lean: just get the records copied over, then mess around with them afterwards.
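Just to illustrate what ‘processing in the function’ could look like, here is a sketch of a filtered variant of the states function from the notes below; the function name and the domain list are only examples, and my own function sticks to the plain copy:

create or replace function trigger_on_states_filtered()
returns trigger language plpgsql
as $function$
begin
    -- example only: archive just the domains you care about
    if new.domain in ('sensor', 'binary_sensor') then
        insert into states_archive select new.*;
    end if;
    return new;
end; $function$;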
As far as feeding data back to HA from this system, I use the Home Assistant SQL Sensor, AppDaemon, and Grafana. However, the bulk of my use so far has been outside HA, doing analytics via Jupyter notebooks and Python scripts in cron jobs.
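For example, the SQL Sensor just needs a single-value query, and something along these lines against the archive would work (the entity_id is only a placeholder):

SELECT count(*) AS value
FROM states_archive
WHERE entity_id = 'sensor.smartmeter_power'
  AND last_updated > now() - interval '1 day';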
I have not yet explored the newer summary/statistics tables that HA is adding. I get my instantaneous energy usage via a Zigbee smart meter and various energy-monitoring wall sockets and plugs. Again, even though this data is in HA, so far I mostly analyze it with Jupyter outside HA, although I do bring one-year-ago and two-year-ago data back into HA for some presentation in Lovelace.
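For those year-ago comparisons, a query of roughly this shape pulls the old value back out of the archive (the entity_id and the exact window are placeholders, not my actual setup):

SELECT state, last_updated
FROM states_archive
WHERE entity_id = 'sensor.energy_usage'
  AND last_updated BETWEEN now() - interval '1 year 1 day' AND now() - interval '1 year'
ORDER BY last_updated DESC
LIMIT 1;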
Good luck; I hope you find a solution that works as well for you as this one has for me.
postgresql home assistant archive trigger setup
202007191400
created tables, functions and triggers to save records from the HA states and events tables to archive tables
psql -h 192.168.1.100 -d homeassistant -U hau -W
--------------------
-- states Analytics Table Definition, create the archive table
CREATE TABLE "public"."states_archive" (
"state_id" int4,
"domain" varchar(64),
"entity_id" varchar(255),
"state" varchar(255),
"attributes" text,
"event_id" int4,
"last_changed" timestamptz,
"last_updated" timestamptz,
"created" timestamptz,
"old_state_id" int4
);
--------------------
-- Function to insert into the archive table the row being added to the states table
create or replace function trigger_on_states()
returns trigger language plpgsql
as $function$
begin
insert into states_archive
select new.*;
return new;
end; $function$;
--------------------
-- Trigger that is fired on each insert to states table
create trigger trigger_on_states
before insert
on states
for each row
execute procedure trigger_on_states();
--------------------
-- events Analytics Table Definition, create the archive table
CREATE TABLE "public"."events_archive" (
"event_id" int4,
"event_type" varchar(32),
"event_data" text,
"origin" varchar(32),
"time_fired" timestamptz,
"created" timestamptz,
"context_id" varchar(36),
"context_user_id" varchar(36),
"context_parent_id" varchar(36)
);
--------------------
-- Function to insert into the archive table the row being added to the events table
create or replace function trigger_on_events()
returns trigger language plpgsql
as $function$
begin
insert into events_archive
select new.*;
return new;
end; $function$;
--------------------
-- Trigger that is fired on each insert to events table
create trigger trigger_on_events
before insert
on events
for each row
execute procedure trigger_on_events();
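--------------------
-- optional sanity check: after a few minutes both archive tables should be growing
SELECT count(*) FROM states_archive;
SELECT count(*) FROM events_archive;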
--------------------
-- create a single index in each archive table on its main timestamp column ('last_updated' / 'time_fired'), seems the most useful
create index ix_states_archive_last_updated on states_archive using btree (last_updated DESC);
-- took maybe 5 minutes
create index ix_events_archive_time_fired on events_archive using btree (time_fired DESC);
-- took under 10 minutes to create
--------------------
-- create nice view of states archive
DROP VIEW IF EXISTS states_archive_01;
CREATE VIEW states_archive_01 AS
SELECT
last_updated,
entity_id,
state,
state_id,
attributes
FROM
states_archive
ORDER BY
last_updated DESC;
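--------------------
-- optional: example query against the view, the entity_id is just a placeholder
SELECT * FROM states_archive_01 WHERE entity_id = 'sensor.smartmeter_power' LIMIT 20;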
--------------------
-- To stop the triggers and the copying of records to the archive tables, just drop the two triggers
drop trigger if exists trigger_on_states on states;
drop trigger if exists trigger_on_events on events;
-- To restart the copying of the records to the archive tables, just recreate the triggers
-- see above
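--------------------
-- optional: confirm which of the two triggers currently exist
SELECT tgname FROM pg_trigger WHERE tgname IN ('trigger_on_states', 'trigger_on_events');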