SQLite DB corruption - migration to another engine?

I was querying the SQLite DB (using an external tool - DBeaver, via an SMB share mounted in Windows) for historical data. Just selects, no updates.
Then I noticed I had lost all historical entries in the DB; the db file (home-assistant_v2.db) had been recreated - its size was very small.

I restored HA from backup and everything returned to normal; the DB file size is again ~300M.
(I have had HA for only ~2 months.)

So I tried this again, and the same thing happened again!

I would still like to do some historical entries analysis, so my questions are:

  • Is it safer to use add-ons to query the data (e.g. SQLite Web)?
  • Does HA support other / safer DB engines (e.g. Postgres/MySQL)?
    • If so, what would you recommend?

This has never happened to me using another SQLite tool, DB Browser.
I did move to MariaDB though, because of the larger DB size and because I wanted a bit more control.
There are plenty of MariaDB-related posts in this forum, so just search for it.

1 Like

Do you mean SQLite doesn't handle a larger DB size well, or that MariaDB can reduce it?

MariaDB performs better on my machine (Docker setup on a NUC); it is especially noticeable when browsing the logbook/history.

OK thanks - I have a similar setup (NUC). I will check MariaDB then.
I can't see any performance issues yet, but I hope this will also protect me against potential data loss.

I spent so much time configuring and collecting data from different entities, so it would hurt so much if this happened again.

Definitely switch to MariaDB / MySQL / Postgres / etc. By nature, SQLite is a single-client DB. If you open the .db file, that file is then locked and any other process fails to write to it. Hence HA created a new DB.

However, be very careful about using the HA DB to store long periods of data. The database structure is extremely inefficient, about as inefficient as it can possibly be, and will bloat to large sizes with surprisingly small amounts of data. This will have the knock-on effect of slowing everything in HA down, especially graphs and history.

If you really want to store long term data you need to use a secondary database such as Influx.

1 Like

Thank you for explaining this better.
What do you mean by a secondary database?

Should I leave SQLite as the primary and add Influx as a secondary?
Or migrate to MariaDB and add Influx to that?

InfluxDB is its own integration that runs alongside the recorder. You keep your primary recorder database (SQLite, MariaDB/MySQL, etc.) and set a low purge interval. Entities you specify are recorded both to Influx and the recorder, but not purged from Influx when the recorder purge runs.

Thanks - I'm happy to try this community integration later today.

What I still don't understand is:

  1. What will happen if I purge the SQLite DB - will all interfaces showing historical sensor values (e.g. the Energy Dashboard) show data automatically from InfluxDB? Or would I need to do some configuration to re-point them all to Influx?
  2. Having InfluxDB, is there any sense in migrating out of SQLite to e.g. MariaDB?

[quote=“maciey, post:9, topic:403629”]

  • What will happen if I purge the SQLite DB - will all interfaces showing historical sensor values (e.g. the Energy Dashboard) show data automatically from InfluxDB? Or would I need to do some configuration to re-point them all to Influx?[/quote]

You lose that history, because recorder/history/energy only pulls from the recorder database. However, I believe that trends are stored in a different recorder DB table than states, so you don't lose the trends or energy data with a short purge interval.
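A minimal sketch of how to check this yourself, assuming the recorder's long-term statistics tables (statistics and statistics_meta) and their metadata_id/statistic_id/start columns; these names may vary between HA versions:

-- read-only: list which long-term statistics exist and how far back they go
SELECT m.statistic_id,
       COUNT(*)     AS row_count,
       MIN(s.start) AS oldest_entry
FROM statistics s
JOIN statistics_meta m ON m.id = s.metadata_id
GROUP BY m.statistic_id
ORDER BY oldest_entry;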

MariaDB is easier to back up and manipulate, as it is a multi-user DB engine. It will also be a bit more performant than SQLite with larger databases.

Unfortunately though, due to the HA database structure, the database will always be an issue. I’d suggest keeping the purge interval for the recorder database under 30 days regardless of the engine you use.

So I don't really understand the benefits of installing InfluxDB then… :cold_face:

  • maybe apart from potentially protecting the data against what happened to me over the last few days

But it seems the same can be achieved by migrating to MariaDB (a MySQL fork).
I used MySQL databases with millions of rows in the past - it should handle HA entries…

I use Postgresql with SQL triggers to create long history tables. Works well.

I ran with the default SQLite recorder on an RPi 3 for about a year and had a couple of corruptions, most likely due to me trying to query the db in real time and not really knowing what I was doing. That and other interests led me to move HA to an Intel box with a Docker setup. I converted the database engine to Postgresql, running in Docker alongside HA in Docker. I wanted to keep years' worth of history to look at energy, enviro and other changes over time. Extending the history in the main HA recorder database made bringing up any historical query or graph within HA rather slow. So this led me to look at solutions to get the long history into another database store and keep HA leaner. I first tried InfluxDB; it is powerful and has some very good tools. However, it was another system/language for me to learn and keep up with, so I moved to making a solution in Postgresql, a standard SQL system that I had fair knowledge of. I have been very happy with this running nonstop for three years.

I keep 65 days in the recorder database, which yields HA tables with 70 million rows. The long-history tables are currently 640 million rows for 3 years of history. To fully export everything for a safety backup takes a little over an hour of offline time on an Intel i7 machine with SSD drives; I do this quarterly as a CYA. The whole database store is currently taking up about 900 MB on a 2 TB SSD. I have some other database work that accounts for about 100 MB of that.

I use Jupyter, TablePlus (highly recommend) and Grafana for most of my analysis and visualization work. I have some analytic queries that can run for up to an hour; I see no negative effect on HA when these run.

As I said, this solution requires creating two very small ‘copy on insert’ triggers for the states and events tables. These are pretty standard across most database platforms.

It performs very well, with zero problems to date. I highly recommend Postgresql even without my mod. Postgresql's ability to query JSON data is very useful with HA, as HA stores a lot of JSON data in its records.
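As an illustration, a minimal sketch of such a query, assuming the attributes column holds JSON text (as in the recorder schema used in the archive setup further down); the entity id is made up:

-- read-only: recent states plus one attribute pulled out of the JSON attributes column
SELECT last_updated,
       state,
       attributes::jsonb ->> 'unit_of_measurement' AS unit
FROM states
WHERE entity_id = 'sensor.energy_meter'   -- hypothetical entity id
ORDER BY last_updated DESC
LIMIT 20;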

The HA data is not very complex, and it's really not doing anything hard in terms of SQL functionality. Pretty much just appending records to the end of tables.

Good hunting!

1 Like

Many thanks @dproffer for your explanation / advice - I'm very keen to replicate the same in my setup.

I'm using Postgresql at my daily work and I really love it.
On the other hand, I like learning new tools / languages, so Influx is not a show-stopper for me.

The key assumptions / expected benefits for me are:

  • do not lose historical data (I plan to keep at least a few years there)
  • the option to query data with external tools (e.g. an SQL IDE / plugins)
  • HA performance
  • reduce the risk that my own (non-standard) setup will impact future HA updates
  • reduce the DB size if possible
  • keep the standard HA backup / restore mechanism working (I've bricked HA a few times already :slight_smile: )

Will Postgres address all of the above?

Sorry, but I also have a few more questions… :confused:

How do you purge that?

  • Are you just executing SQL DELETEs on the SQLite statistics tables?
  • If so, I guess you also need to execute VACUUM on the SQLite DB, which will cut HA off from the DB

Can you please briefly advise how to get this setup working (with Postgres)?
Is there any integration / add-on similar to the InfluxDB one mentioned earlier?
You have written that you are using DB triggers, but data first has to be inserted into the new (Postgres) DB in order for the table triggers to fire.

And a final :slight_smile: question - regarding the general concept of having a parallel DB (Postgres / Influx etc.).
Am I right in thinking that standard dashboard cards will not use that DB?
To present that data, we need to use other tools that can read from an external data source (e.g. Grafana)?

In regards to the “Energy Dashboard”: if you have sensors from an integration for, or from, your grid/electricity provider, these sensors will always show the history delivered by your provider, months and years back; you will never lose that … And no, HA does not by default show any data from InfluxDB.

I have the Fronius inverter integration installed. HA shows me more granular data than Fronius does.
In the free plan they show hourly data diagrams only 3 days back. In HA I can browse as many historical days as I want (for which data was collected).

When my SQLite DB was wiped by HA, I saw nothing in the “Energy Dashboard”.

Thus my guess is that the data is being kept in the SQLite database, and by losing that I lose the history.
Am I wrong?

I guessed this is not done by default.
I was asking more about the scenario where I install this InfluxDB integration / plugin.
Will this plugin then ‘remap’ HA to use Influx, or is it just a data source kept ‘on the side’, so I would need to use some other tool (e.g. Grafana) to point at that source and read the historical data?

Ok, i don’t know about Fronius, but my “integration” when installed, “picked” everything from my Electricity provider, so my “Table/Dashboard” now is filled with data way back in time, before i installed the integration, so i can see Hours/Days/Weeks/Month(+ all days in month /and Year(+all month in a year)( and it’s free plan !, so i just don’t see Current Consumption ), regardless if i purge or dump.

Yes, if your sensors pick up data locally (and not through the provider's interface), then the data is stored entirely locally (and lost locally), but a backup of the DB would get it back (in my opinion the best procedure for an SQLite DB backup is: shut down HA, then copy the DB; I haven't had any problems with that).

When you install InfluxDB, you will get configuration choices to include/exclude only the entities/domains/globs you want from the recorder (keep this to a minimum, just the entities you want to make graphs out of); if you don't configure it, it will include ALL “usable” entities. In the recorder you can also configure include/exclude so that entities are not recorded at all (reducing the size of home-assistant_v2.db this way).

I have retention/purge set to 30 days, but since the recorder now records ALL entities (if not specified), and long-term statistics were included in HA in August last year, some people have had (and still have) problems keeping control over their DB and keeping it at an acceptable size; somehow they missed reading/understanding the breaking changes in the major updates (breaking changes that have driven more and more people to an additional DB, e.g. MariaDB, internal or external).

InfluxDB + Grafana do what they are best at (awesome graphs, with minimal usage), SQLite does what it is best at (an in-app controlled DB), and MariaDB adds another “dimension” of flexibility.

Correct

Correct … PS: though you can decide to create an “additional db” in Influx (before connecting the source).

1 Like

Thank you @boheme61 for shedding better light on that!

I didn't know about the ‘recorder’ existence :slight_smile:
I think I've found an interesting topic about this.

I'll give Grafana with this InfluxDB a try later today.
I will see if this can replace the standard stuff I'm using for graphs (I'm most interested in the Energy Dashboard though).

Just one (very) last question regarding backup:
I'm using this plugin to automate backups and copy them to Google Drive.
I guess the backup / restore process implemented there will not cover my InfluxDB database, and I have to take care of that myself. Is that correct?

1 Like

I don’t know about google-drive-backup, i have Asus-RemoteDrive/webstorage , but nothing i use for HA, in HA i’ve only, so far, used manual backup of DB-file, and manual of whole Image … And tried to create backup/restore from backup, and Yes they are very “limited” , but with one of resent “Updates” , i did restore to old, install update again, restored from older, reversed to latest( so fort and back and forth) … didn’t break anything
I briefly went through your link(Influx) , seems he said “same” back-up as HA, but the “link” seems like a good “source” to get a stable environment, as well as your first link (Recorder)

About google-drive-backup there also seem to be “different opinions” in this forum, but I actually think it all depends on people, needs, “setup”, experience and “old” habits :slight_smile:

PS: so far I rely on HA's system backups (though not for the entire DB, as it's not included), and I also routinely save a manual copy of all my “views/cards/yaml etc.” on another disk … I am still “building”, so I'm sitting with the keyboard in hand daily :slight_smile:

PS2: I did once “corrupt” the Home Assistant DB (I think; I didn't bother to find out, I just replaced it) … I had opened the DB in DB Browser on Windows while the system was running :slight_smile: , though this never happened when using the SQLite Web add-on in HA; I guess the Core handles priority there.

1 Like

Thanks for your patience and advice. I think you have given me a good direction to follow :slight_smile:
I'll play with it today once I'm back from work.

I think you are right. This happened to me while using an external SQL IDE on Windows and accessing the file directly via an SMB share - so it might be some kind of lock conflict or something.
(Lesson learnt!)

1 Like

Comments on your points and questions.

Good that your ‘daily driver’ is Postgresql, and you have the time and interest to explore Influx also. I just find it hard to pave a whole new street when I can expand my lane. Especially when the coin we are working for at home is usually the significant other acceptance factor :wink:

You might have a look at this time-series database layer, Timescale, on top of Postgresql. Search this forum to see folks' experience with using it with Home Assistant, and have a read of the blog post cited at the end of the post.

I would say that for me, Postgresql and my history setup meets the key assumptions and benefits you cite.

The 65-day retention period for HA's recorder is managed by HA's history setting. I do nothing; HA keeps its database size in check. I've never had to VACUUM any of my Postgresql databases; the automated Postgresql clean-up functions have done all the work for 3 years.
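For anyone who wants to confirm the clean-up is keeping pace, a minimal sketch using the standard Postgresql statistics views (the archive table names are the ones from the setup posted below):

-- read-only: table sizes and the last automatic vacuum run per table
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('states', 'events', 'states_archive', 'events_archive')
ORDER BY pg_total_relation_size(relid) DESC;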

I do not use SQLite for any part of this system, so I'm not sure what you are asking about vacuuming the SQLite DB.

I have posted a basic set of instructions below to create the needed table, trigger and function for each HA table that you want to do this for. Hope it is readable; if not, give a shout. I'm sure there are more optimized ways to accomplish this, however what I coded seems to perform without issue. My HA instance is adding about 100,000 records in total to its events and states tables per hour. The trigger copy functions keep up with this just fine on my Intel i7 machine where both HA and Postgresql run, along with about 20 other Docker containers.

I do not use any HA integrations for this system; I just run Postgresql in a Docker container, point the db url at it in my HA configuration.yaml, and then do the steps cited below to create the trigger copy functions.

I’m not clear on your statement ‘but first data has to be inserted to new…’. I am pretty sure I just create an empty table with the columns and indexes I want. No records in the table. Then I add my functions and triggers for the two tables, events and states. And off it goes. You could do processing in the function that is called on each insert trigger, but it seems better to keep this path lean. Just get the records copied over, then mess around.

As far as feeding data back to HA from this system, I use the Home Assistant SQL Sensor, AppDaemon and Grafana. However, the bulk of my use so far has been outside HA, doing analytics via Jupyter notebooks and Python scripts in cron jobs.

I have not yet explored the newer summary/statistics tables that HA is adding. I get my instantaneous energy usage via a Zigbee smart meter and various energy-monitoring wall sockets and plugs. And again, even though this data is in HA, so far I just analyze it using Jupyter outside HA. I do bring one-year-ago and two-year-ago data back to HA for some presentation in HA's Lovelace.
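As an illustration, a minimal sketch of the kind of query an SQL Sensor or Grafana panel could run against the archive table described below; the entity id is made up:

-- read-only: the closest archived state from roughly one year ago for a single entity
SELECT state
FROM states_archive
WHERE entity_id = 'sensor.energy_total'   -- hypothetical entity id
  AND last_updated <= now() - interval '1 year'
ORDER BY last_updated DESC
LIMIT 1;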

Good luck, hope you find a solution that works as well for you as this has been for me.

postgresql home assistant archive trigger setup
202007191400

created tables, procedures and triggers to save records from HA states and events tables to archive tables

psql -h 192.168.1.100 -d homeassistant -U hau -W

--------------------
-- states Analytics Table Definition, create the archive table
CREATE TABLE "public"."states_archive" (
    "state_id" int4,
    "domain" varchar(64),
    "entity_id" varchar(255),
    "state" varchar(255),
    "attributes" text,
    "event_id" int4,
    "last_changed" timestamptz,
    "last_updated" timestamptz,
    "created" timestamptz,
    "old_state_id" int4
);

--------------------
-- Function to insert into the analytics table the row being added to the states table
create or replace function trigger_on_states()
returns trigger language plpgsql
as $function$
begin

    insert into states_archive
    select new.*;
    return new;
end; $function$;

--------------------
-- Trigger that is fired on each insert to states table
create trigger trigger_on_states
  before insert
  on states
  for each row
  execute procedure trigger_on_states();

--------------------
-- events Analytics Table Definition, create the archive table
CREATE TABLE "public"."events_archive" (
    "event_id" int4,
    "event_type" varchar(32),
    "event_data" text,
    "origin" varchar(32),
    "time_fired" timestamptz,
    "created" timestamptz,
    "context_id" varchar(36),
    "context_user_id" varchar(36),
    "context_parent_id" varchar(36)
);

--------------------
-- Function to insert into the analytics table the row being added to the events table
create or replace function trigger_on_events()
returns trigger language plpgsql
as $function$
begin

    insert into events_archive
    select new.*;
    return new;
end; $function$;

--------------------
-- Trigger that is fired on each insert to events table
create trigger trigger_on_events
  before insert
  on events
  for each row
  execute procedure trigger_on_events();

--------------------
-- create a single index in archive tables on the 'last_updated' column, seems the most useful
create index ix_states_archive_last_updated on states_archive using btree (last_updated DESC);
-- took maybe 5 minutes
create index ix_events_archive_time_fired on events_archive using btree (time_fired DESC);
-- took under 10 minutes to create

--------------------
-- create nice view of states archive
DROP VIEW IF EXISTS states_archive_01;
CREATE VIEW states_archive_01 AS
SELECT
    last_updated,
    entity_id,
    state,
    state_id,
    attributes
FROM
    states_archive
ORDER BY
    last_updated DESC;
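
--------------------
-- example use of the view (illustrative sketch only; the entity id below is made up)
-- daily sample counts for one entity over the last week
SELECT date_trunc('day', last_updated) AS day,
       count(*) AS samples
FROM states_archive_01
WHERE entity_id = 'sensor.outdoor_temperature'
  AND last_updated > now() - interval '7 days'
GROUP BY 1
ORDER BY 1;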


--------------------
-- To stop the triggers and copying of records to the archive tables, just delete the two triggers
drop trigger if exists trigger_on_states on states;
drop trigger if exists trigger_on_events on events;
-- To restart the copying of the records to the archive tables, just recreate the triggers
-- see above

1 Like