How to keep your recorder database size under control

Really works. Thank you @crazyasyou

Interesting, it didn’t work for me. I use the default SQLite database and the SQLite DB Browser. I get this when I enter the SQL from @crazyasyou’s post, above:

Execution finished with errors.
Result: ambiguous column name: entity_id
At line 1:

However, it appears to work when I specify the table name for entity_id in the GROUP BY statement:

SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  states_meta.entity_id
FROM states
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY states_meta.entity_id
ORDER BY cnt DESC;

I haven’t really had time to look into how accurate the results are. So far the new database schema, along with my aggressive excludes in configuration.yaml, have kept my database size reasonable.

Hello, I was wondering what’s the use case for recording events. I already excluded all frequent events from the recorder, but maybe I should exclude them all? Is there an UI or an API that can query the recorded events and show them or do something about them? Or is the only way to access them a manual sql query on the backend db? Is the history tab able to show recorded events? Thanks!

1 Like

Hello, thanks a lot for these info…
I installed SQLite Web, and succeed to view the size of each table (first SQL query)…
But whe I type the second one (Viewing events usage), (and same for states usage), I only get this message :


Am I doing something wrong?

No, you’re not. When I wrote this guide, those queries were valid for that specific Home Assistant version. Over time, the database evolved, and thus the queries on my initial post aren’t working anymore. Unfortunately, I haven’t had time to revisit this post and update the queries. Please see the other messages on this topic for newer queries, contributed by other people.

I have modified the SQL queries from the original post so they work on homeassistant 2023.4.6 and later.

Modified query for states:

SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id=states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC

Modified query for Events:

SELECT
  COUNT(*) as cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM events) AS cnt_pct,
  event_types.event_type
FROM events
INNER JOIN event_types ON events.event_type_id = event_types.event_type_id
GROUP BY event_types.event_type
ORDER BY cnt DESC
28 Likes

Very good! thanks

Thanks a lot, it was a great support, the only solution I knew was to delete the database.
playing with recorder & the services is very helpfull !

How to show the state_attributes in order?

Guys, need an advice.
2-3 years ago it was stated in some community guide (even may be this one) that a user should use either “include” or “exclude” method, not both of them to avoid possible glitches.
Is it still valid?

On 2020 (when I started working with HA) I used only “exclude” method.
So, by default ALL entities were “recorded”, and I specified exceptions.
It was working quite OK.

Specifying exceptions (entities to be NOT “recorded”) was a rather easy task:
– list all entities in Excel;
– mark entities for which you need a history;
– others are to be added into “exclude” sections;
– after, while expanding your HA setup, just add unneeded entities to “exclude”.

Some time ago I was advised to change to “include” method.
A reason was “then ONLY entities which you intentionally permitted will be recorded”.

Both methods have pros & contras.
Is there a “modern” best practice for this?

Not sure about this. And the system that prunes the DB needs more work. I guess the key word there is “about”

On its own, with no warning, no messages, my History DB would reach about 4GB and then the system would self-corrupt and crash daily. Based on my systems daily growth, 10 days should be around 200-300MB and yet without pruning the DB would easily get over 3GB. About three weeks ago I added an automation to do a nightly DB prune of any entry over 7 days old and my DB has been stable at 200MB since.

On my system “about 10 days of history” is more like “about 4 months”

This needs to be addressed in a future release.

That statement is based on the default value for purge_keep_days, which is 10. In the default configuration it should auto_purge every night at 04:12, and auto_repack every month. If these are good defaults or should be changed, that’s a different discussion. (And based on the amount of complaints regarding database sizes, these defaults may not be ideal.)

In fact, perhaps the best solution is to split the database into two:

  • one database optimized for live updates, used for the current state and for recent data;
  • another database optimized for archival, where data is stored (appended) but never modified (or the only modification is to down-sample old data).

But this is also a suggestion for another thread, and my comment over here is already quite off-topic. Meanwhile, as users we tweak what we can, after spending a lot of time with troubleshooting headache. Well, this topic is very popular with already 74k views (the 12th most viewed community guide so far), so this is clearly a common pain point.

The problem is not so much these default values. The problem is that every entity is added to the DB by default. There should be an easy UI controlled way to add/remove entities or devices to/from being recorded on the fly. And not all entities should be added by default.

That would be the current history…

… and that would be the current long term statistics.

The problem is that both are in the same DB. They should be split up into two separate databases.

2 Likes

I run HA in a docker container and thus cannot install addons such as the SQL viewer as described in opening post.

Is it possible to install a third party sql tool in debian and connect to the sqlite instance? What credentials are used?

Beware that it can happen that the moment you connect to sqlite with third-party app it will become corrupt for HA, so HA will create a new one and old one will be very hard if not impossible to restore. Been there, done that - i didn’t manage to restore old database afterwards.
The point is supposely in the fact that only one client can connect to sqlite at any moment, while multiple clients can be connected to mariadb (or similar). so, when you connect to sqlite HA can’t connect anymore, so creates a new database.

So, only option is to make a copy of current HA database to a safe place and connect to a copy.

4 Likes

Hmm ok, even if I shutdown HA? But yes, copying it should be easy enough I guess.

If you shutdown HA then mabye it would be ok… i didn’t test that, i just installed MariaDB. You can try if you dare… :wink:

Hehe no, dont have time or energy to fix it if it breaks. :slight_smile: Will just SCP a copy to my home pc and browse it there. :smiley:

What do you mean installed MariaDB, do you use instead of the built in sqlite file?

FYI, if anyone in the future finds this post:

I shutdown the container then from my host computer I ran:

scp [email protected]:/home/serverUser/docker/homeassistant/homeassistant_config/home-assistant_v2.db .

Of course change the path above to whatever you need. Then I used DB browser for SQLite to run queries.

Interesting, my top heaviest thing is my media_player, nvidia shield, seems to constantly spam. Then comes my power meter stuff which makes sense, guess a value is recorded like every 30 seconds or so.

1 Like

Yes, i use MariaDB. I experienced a couple of random unexplained crashes of sqlite database, i also manged to break it with login( as i said above), so i went to more reliable database. Until recently i’ve had HA inside VM on my Synology and MariaDB also on Synology, but now i’ve got a chance to buy (“somewhat” overpowered…) intel NUC pretty cheap, so now i have MariaDB addon in HA installed.
MariaDB is more powerfull, but also requires somewhat more resources.

I see! I use the HA docker image but the sqlite database file is outside docker, its an mounted volume. I havent seen any instability, it is about 100MB. But I wanted to increase recording time from 10 days to like 1 or 2 years. Not sure if I will start running into issues now.

I to run on a decent powerful NUC with 128GB NVME SSD.

If I wanted to run another db for the recorder, I guess it would make sense to run the db in a separate docker image and connect it to the HA image. Seems complicated. :smiley: