What’s the best database for HA with a lot of entities?

Hi,

Is there any guidance on what's the best database to use for Home Assistant?

My install is HUGE and has been running for close to 8 years now… I moved to MariaDB years ago, but lately the glitches have become more than an inconvenience.

With the latest update, my HA simply keeps restarting with nothing in the logs…

I was under the impression that something like Postgres would be the “most powerful”, but after reading this, it looks like SQLite is now the flavour of the month?

I am not a DB guy, but let's assume:

  1. Storage is irrelevant
  2. Processor/RAM is irrelevant
  3. I have close to 5,000 entities in my HA

What DB should I be using? I know this may draw a lot of “Well it depends…” responses, but my question is simple…

What is the best-performing DB setup for HA… TimescaleDB even, or back to SQLite?

Not really. To be able to answer, someone would have to have tried a number of different databases and run benchmarks on all of them. Very few people, if any, have done that.

Then again, those benchmarks would be for a specific use case (number of entities, retention of detailed states, …)

I rolled back to sqlite just because it was faster for my use case, which is

  • long-term statistics for graphs
  • don’t care about detailed states

My response to that would be: open your own thread rather than hijacking one your question is unrelated to.
I actually wanted to respond with fewer words, but those words would be frowned upon by moderators.

You mean is your follow-up request for a benchmark related to “So I was wondering if anyone has ever successfully migrated from MySQL/MariaDB back to sqlite ?”? It sure is related, but doesn’t answer this question, does it?
So you are (not) answering his question with an unrelated question. That's the definition of hijacking a thread.

I see that you run a HA-related business, so your question surely makes a lot of sense in that regard.
Being rude is not the best approach to getting (free) help, though, I'm afraid. You'll certainly get none from me in the future.

If you had actually read the thread, you might have noticed that I did.
But you don’t actually care about the OP question, do you?

Well, for a start, it's not really the topic of this thread either, is it?
So before being rude to others, why don't you post here only if it's directly related to the topic of how to migrate back to SQLite?

It’s not a discussion, it’s a question to those that do know.

  1. When you (@jyavenard) first discovered these performance issues, did you look at any other databases, or did you only look at MariaDB?

  2. When discussing databases and troubleshooting MariaDB on another server, did you automatically decide to go back to SQLite, or did you consider any other databases?

  3. What was the key driver that made you decide to move back to SQLite as opposed to trying out some of the other options?

I think most people decided to move back to SQLite after watching the HA developers' video for 2023.4, which mentioned lots of database improvements to SQLite. You can read plenty of other posts about this in the 2023.4 thread.

And yeah, this thread is about moving back from MariaDB to SQLite, that's all.

My SD card was getting full with a huge database (over 10 GB).
It was running super slow; every automation took seconds to start, which is annoying when you want to turn on a light.

I had an existing MySQL server with plenty of databases already running.
The HA box was in the same cabinet as my MySQL server, so I didn't think twice.

My decision to revert to SQLite was twofold:

  • it's a pain to back up: whenever I upgrade HA, I have to keep the HA backup in sync with the matching MySQL (now MariaDB) dump
  • an HA db upgrade broke everything, because the original SQLite-to-MySQL switch had introduced broken fields. I also prefer the idea of an entirely self-contained HA.

See above


I also think that the whole-HA-as-one-package is a great benefit.

If you want to avoid having too much data in HA, you can keep the recorder's retention window quite short and log the important long-term statistics somewhere else.

There you're free in your choice of tool and data format, as that DB isn't essential for HA to run…
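As a rough sketch of the short-retention idea (values and entity names here are illustrative, not from the posts above; check the recorder docs for your HA version):

```yaml
# Illustrative recorder settings: keep only ~10 days of detailed
# states/events and skip noisy entities. HA's long-term statistics
# are stored separately and survive purging.
recorder:
  purge_keep_days: 10
  auto_purge: true
  exclude:
    entity_globs:
      - sensor.*_uptime   # hypothetical example of a chatty entity
```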

I installed the MariaDB add-on, which installs the MariaDB software on the same server as Home Assistant (a Proxmox VM). So it's still “one package” and “contained”, I guess.

Hi, this is how I was running for a long time, like 5 years or so, but lately my whole system has become very, very slow, to the point of being unusable. With all the talk about SQLite being “back in fashion”, I moved over to that, but ultimately the question still remains: what is the best-performing database for a large install? From all the docs it looks like TimescaleDB is really high-performing, but before I waste any more time messing around with DBs, I wanted to post the question in the hope that someone who knows databases could advise…

As an aside, I also use InfluxDB on a separate NAS to store long-term data, and my retention on the HA DB is set to 10 days or so.
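For reference, pushing selected values to an external InfluxDB while recorder keeps only a short window looks roughly like this (host, database name, and domains are placeholders, not the poster's actual setup):

```yaml
# Sketch of the influxdb integration config: HA writes state changes
# to InfluxDB on the NAS, independently of the recorder database.
influxdb:
  host: 192.168.1.50        # placeholder NAS address
  database: home_assistant  # placeholder InfluxDB v1 database name
  include:
    domains:
      - sensor
      - binary_sensor
```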

I don't think this is accurate, as different DB options in my install show varying degrees of diabolical behaviour in the UI…

An external DB that is not used for recorder, only to push selected values to, shouldn’t be a problem. Otherwise there’s a bug in the integration or you’re pushing your hardware too hard.

The “expensive” operations are normally not inserts into the db. Complex selects in a large DB on slow machines are a different story, but HA won't read from this DB on its own when it's not used for recorder.

When it was announced that MariaDB needed to be upgraded to something newer than 10.3 (or so), I did the update and ran some tests at the same time. In my config I run the DB in Docker on a Synology NAS, but since I ran into wear problems with the SSD the old instance was running on, I decided to also try putting the DB on a traditional HDD (WD RED). The difference in performance was night and day. Loading 1 day of history for ~50 entities into a panel takes ~3 s from SSD; from HDD it was ~20 s. Loading 1 month of data takes ~30 s from SSD; on HDD it was ~5 min. All this with a gigabit connection between the HA VM and the NAS.
My install is definitely smaller (~2,400 entities), yet it is rock stable and I don't perceive any performance issues. So perhaps the problem is not the type of database or how it grows, but the media the DB is kept on, and your first assumption is wrong?


Something else is amiss here. I run it off an HDD (RAID-Z2), and I've yet to have any query take more than a couple of seconds.

Well, database performance is very much dependent on the IOPS the underlying storage can provide. I use WD RED drives for both SSD and HDD. The SSDs provide ~31,000 IOPS for reads while the HDDs manage ~150, and latency is also quite different: 0.048 ms vs 3 ms respectively. Throughput is not relevant here, as the amount of data ultimately retrieved from the DB is relatively small (and if it were the limiting factor, it would be for the SSD, not the HDD).
Here I'd rather blame the specific use case I presented: 50 entities with a very large number of datapoints over a 1-month timeframe, plus some data preloading done by the history explorer card to facilitate scrolling, plus some sensors sending lots of data (for a sensor reporting every minute, there are ~44.6k datapoints per month to be retrieved from the DB!). So how fast the underlying storage is makes a huge difference!


I am running Kingston DC500Rs on Proxmox with ZFS. I also have 12 CPUs and 22 GB of RAM, and there's a 4-gigabit connection to the box, so I doubt my issue is hardware-related.

In the past I have usually had better experiences with Postgres than with MySQL/MariaDB in terms of read and write performance, regardless of whether on HDDs or SSDs.
However, I don't have any measured values; these are personal impressions rather than benchmarks. As @mirekmal already said, it depends very much on your personal requirements.

.::edit::.
The benchmarks on this page also roughly reflect my experience:
ClickBench — a Benchmark For Analytical DBMS (clickhouse.com)

For example, Postgres here is not only used by Home Assistant, but also by my document management with paperless-ngx, as well as by various small Python and web development experiments.
I also usually connect to the database not via TCP but directly via Unix sockets, which can give better performance.
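If I recall the SQLAlchemy URL format correctly, pointing HA's recorder at a local Postgres over a Unix socket looks something like this (the database name is an assumption; with peer authentication no password appears in the URL):

```yaml
# Sketch: recorder talking to Postgres through the local Unix
# socket instead of TCP. Assumes a "homeassistant" database and
# peer authentication for the HA user.
recorder:
  db_url: postgresql://@/homeassistant
```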

Currently I’m mostly using TimescaleDB (a time series optimised version of Postgres): Time-series data simplified | Timescale.
I also tried QuestDB for a while, which had great performance. Unfortunately, at the time of my last tests it was still missing some analytical functions (e.g. simple window functions) that are indispensable for me. QuestDB also tended to use less RAM than Postgres, for example, but much more CPU, which in turn means higher energy consumption and thus higher costs when the database runs continuously.

To be really sure, you might just want to go out on a limb and do some measurements and experiments yourself to find out which DB engine suits you best.


I think (hope) I found my problem…

A long time ago, I had created an SQL sensor. I then removed this from my yaml.

It was not showing up in any log errors for some reason, but the sensor was still listed in my core storage. After an upgrade to Timescale produced a few more details and highlighted the error in my logs, I went in and removed it manually.
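For context, a YAML-defined SQL sensor of the kind described looks roughly like this (the query, name, and column are purely illustrative; the older `sensor: platform: sql` syntax also existed before the integration moved to the UI):

```yaml
# Illustrative modern "sql" integration sensor. A stale definition
# like this lingering in core storage after the YAML is removed can
# keep running queries and cause hard-to-trace errors.
sql:
  - name: "DB size"
    query: "SELECT pg_database_size('homeassistant')/1024/1024 AS value;"
    column: "value"
```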

From what I can now see, I can confirm that with Postgres, my performance does feel a little better…


SQL sensors were migrated to the UI a long time ago. That means you were ignoring the “remove the YAML, this was added in the UI” messages in your logs for months, if not years.