Home Assistant DB vs MariaDB

I have a question regarding the database options for Home Assistant.
I read that InfluxDB speeds up Home Assistant; however, since then, the way HA uses its .db file has changed, which has made things quicker.
I want your opinions.

I have loads of stuff on my HA instance: 130+ Z-Wave nodes, RFXtrx thermometers, and the UniFi integration.

What would you use to store your data? I’m thinking about efficiency in terms of the space taken up and the resources it uses.
I would like HA to be as slick as possible, with graphs that load fairly quickly.


Are you sure it was InfluxDB and not MariaDB?

You can’t use InfluxDB as the Home Assistant recorder database; the recorder requires an SQL database, and InfluxDB is a time-series database, not an SQL one.

My strategy (there are many options):

  1. Use MariaDB to store a week’s worth of data for Home Assistant, using lots of excludes to only store information that I am interested in short term (see the query after this list for spotting the noisiest entities).

  2. Use InfluxDB to store long-term (two years) data for graphing, using includes to only store information that I am interested in long term.
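
To decide what to exclude, it helps to see which entities generate the most rows. A sketch against the recorder’s states table (works in MariaDB and PostgreSQL; note that on the newest HA schemas the entity name lives in states_meta instead):

-- Rank entities by how many state rows they produce
SELECT entity_id, COUNT(*) AS row_count
FROM states
GROUP BY entity_id
ORDER BY row_count DESC
LIMIT 30;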

Prior to the recent optimisations my DB was about 1.6GB for a week, and I had many excludes.

After the optimisations it is averaging about 350MB. However, I also systematically organised my excludes.
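
If you want to track the size yourself, MariaDB can report it from information_schema (a sketch; your schema name will differ):

-- Per-database size in MB, data plus indexes
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
GROUP BY table_schema;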

I estimate that two years of InfluxDB data that I’m interested in will run about 1.5GB.


You are correct, it is MariaDB. Apologies.

I think I’ll move back over to MariaDB, using the optimisations you have carried out yourself.

An alternative path you could consider:

I ran InfluxDB/Grafana as a long-term storage and analysis system with HA for a while. It is a powerful combo, but I found it was another thing to learn, stay current on, and keep updated. I decided I wanted to stay in the Python world as much as I could.

I migrated away from native SQLite to PostgreSQL to get a more reliable core database. Then I created two simple PostgreSQL functions and triggers that, on every insert into the events and states tables, also insert those records into parallel analytics copies of the events and states tables. It was very easy to write the code from examples, and I see no performance issues in HA. I stripped the shadow tables of all constraints and unnecessary indexes.

Now I can let HA keep the history in its database small and still have a real-time analytics database with as long a history as I want, with no effect on the performance of HA and its database.

I use Jupyter for my analytics, which allows me to stay in the Python world and use pandas, matplotlib and other Python tools. The HA team have created some nice examples of using a Jupyter Notebook for analytics, and the Jupyter team curates a set of Docker images that give you an easy-to-use Python analytics platform. You can also query the analytics tables from within HA for building sensors and dashboards, but be careful of overloading HA with very long and complex queries, since these execute within HA’s process.

Docker Jupyter : https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html
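
On the in-HA sensor idea above: the query just needs to return a single value. A minimal sketch of the kind of statement you could point an SQL sensor at, assuming the states_archive table defined below (the entity name is hypothetical):

-- 24-hour average of one sensor, as a single value
SELECT round(avg(state::numeric), 1) AS value
FROM states_archive
WHERE entity_id = 'sensor.outside_temperature'
  AND state NOT IN ('unknown', 'unavailable', '')
  AND last_updated > now() - interval '24 hours';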

You should be able to create a similar system with any database that will fire a trigger on the insert of a record to the events or states tables.

Here is an example for PostgreSQL:

-- Analytics Table Definition
CREATE TABLE "public"."states_archive" (
    "state_id" int4,
    "domain" varchar(64),
    "entity_id" varchar(255),
    "state" varchar(255),
    "attributes" text,
    "event_id" int4,
    "last_changed" timestamptz,
    "last_updated" timestamptz,
    "created" timestamptz,
    "old_state_id" int4
);

-- Function to insert the row being added to the states table into the analytics table
create or replace function trigger_on_states()
returns trigger language plpgsql
as $function$
begin
    insert into states_archive
    select new.*;
    return new;
end; $function$;

-- Trigger that is fired on each insert to states table
create trigger trigger_on_states
  before insert
  on states
  for each row
  execute procedure trigger_on_states();

-- Create useful indexes for analytics
create index ix_states_archive_last_updated on states_archive using btree (last_updated DESC);
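
Once the trigger is in place, querying the archive is plain SQL. A minimal sketch of an analytics query (the entity name is hypothetical, and the cast assumes the sensor only reports numbers apart from 'unknown'/'unavailable'):

-- Daily min/avg/max for one sensor, straight from the archive
SELECT date_trunc('day', last_updated) AS day,
       min(state::numeric) AS min_value,
       avg(state::numeric) AS avg_value,
       max(state::numeric) AS max_value
FROM states_archive
WHERE entity_id = 'sensor.outside_temperature'
  AND state NOT IN ('unknown', 'unavailable', '')
GROUP BY day
ORDER BY day;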

This is the solution I’m looking for, I think.

HA running on a Pi is sending my recorder database to MariaDB (with phpMyAdmin) on my Synology; this works fine for the moment. I keep 32 days of information… I would love to get this back to around 10 and let InfluxDB take over.

In the meantime, both Grafana and InfluxDB are installed on my Synology and are compatible.

But I’m having trouble getting my MariaDB data into InfluxDB. I’m probably overlooking something stupid… Any help would be appreciated.

Edit: I re-read your post… You are probably sending from HA to both InfluxDB and MariaDB directly. That will probably be easier.

Tom, how did you get that graph of database sizes? Can you share the query?
Thanks,
Allan

It’s three queries:

SELECT max("value") FROM "MB" WHERE ("entity_id" = 'database_size') AND $timeFilter GROUP BY time(1d) fill(null) tz('Australia/Hobart')

SELECT mean("value") FROM "MB" WHERE ("entity_id" = 'database_size') AND $timeFilter GROUP BY time(1d) fill(null) tz('Australia/Hobart')

SELECT min("value") FROM "MB" WHERE ("entity_id" = 'database_size') AND $timeFilter GROUP BY time(1d) fill(null) tz('Australia/Hobart')


Thank you. I’ll give it a go.

Ha. I was way off on that estimate. 3.75GB and no sign of slowing down.

Mainly because I’m storing long-term data for a lot more entities.


I found this post really helpful and have done the same on my setup. After the new HA schema change, which separates attributes into their own table, I needed another table to store the attributes too. In case it helps someone, I’ve copied the SQL script that I ended up with from my HA repo:

-- Copied from https://community.home-assistant.io/t/home-assistant-db-vs-mariadb/224295/4  
-- Edited to reflect new table structure 
-- Changes: 
-- 1. Reflect new states format 
-- 2. Archive attributes too

-- This SQL file will set up the required tables. It's only needed if you make a new database from
-- scratch, in which case these commands should be run manually in the database query interface.
-- Otherwise, the functions etc persist so this is just stored for reference. 

-- *** TABLES ***

-- Analytics Table Definition
CREATE TABLE IF NOT EXISTS public.states_archive (
    state_id integer NOT NULL,
    entity_id character varying(255) COLLATE pg_catalog."default",
    state character varying(255) COLLATE pg_catalog."default",
    attributes text COLLATE pg_catalog."default",
    event_id integer,
    last_changed timestamp with time zone,
    last_changed_ts float,
    last_updated timestamp with time zone,
    last_updated_ts float,
    old_state_id integer,
    attributes_id integer,
    context_id character varying(36) COLLATE pg_catalog."default",
    context_user_id character varying(36) COLLATE pg_catalog."default",
    context_parent_id character varying(36) COLLATE pg_catalog."default",
    origin_idx smallint
);

-- Analytics state attributes table
CREATE TABLE IF NOT EXISTS public.state_attributes_archive (
    attributes_id integer NOT NULL,
    hash bigint,
    shared_attrs text COLLATE pg_catalog."default"
);


-- Function to insert the row being added to the states table into the analytics table
create or replace function trigger_on_states()
returns trigger language plpgsql
as $function$
begin
    insert into states_archive
    select new.*;
    return new;
end; $function$;


-- Create useful indexes for analytics
create index if not exists ix_states_archive_last_updated
    on states_archive using btree (last_updated DESC);
create index if not exists ix_state_attributes_archive_hash
    on state_attributes_archive using btree (hash ASC NULLS LAST);


-- *** TRIGGERS *** 


-- Trigger that is fired on each insert to states table  
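-- Note: CREATE OR REPLACE TRIGGER requires PostgreSQL 14 or later;
-- on older versions, DROP TRIGGER IF EXISTS first and use plain CREATE TRIGGER.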
create or replace trigger trigger_on_states  
before insert  
on states  
for each row  
execute procedure trigger_on_states();  


-- Ditto triggers for attributes
create or replace function trigger_on_attrs()
returns trigger language plpgsql
as $function$
begin
    insert into state_attributes_archive
    select new.*;
    return new;
end; $function$;
-- Trigger that is fired on each insert to attrs table  
create or replace trigger trigger_on_attrs  
before insert  
on state_attributes  
for each row  
execute procedure trigger_on_attrs();  
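
With attributes split into their own table, reconstructing a full state row takes a join. A sketch against the two archive tables above (the entity name is hypothetical; depending on your HA version you may need last_updated_ts, a Unix timestamp, instead of last_updated):

-- Recent states for one entity, with their full attribute payloads
SELECT s.entity_id, s.state, s.last_updated, a.shared_attrs
FROM states_archive s
LEFT JOIN state_attributes_archive a ON a.attributes_id = s.attributes_id
WHERE s.entity_id = 'light.kitchen'
ORDER BY s.last_updated DESC
LIMIT 100;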