How to keep your recorder database size under control

CaptTom · April 1, 2024, 12:22pm

Best option: Stop HA, copy home-assistant_v2.db to another machine, restart HA. Run DB Browser for SQLite (or any similar tool) on that other machine. After figuring out what you need to purge, do so from Developer Tools / Services in the HA UI. Be sure to exclude the offending entities from the recorder so it doesn’t happen again.

Nuclear option: Stop HA, delete the home-assistant_v2.db file. HA will create a new one when it restarts. You lose all history and statistics data prior to the delete, but you have a clean DB.
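
For the first option, if you would rather script the inspection than click through a GUI, here is a rough Python sketch. It assumes the copied file uses the states / states_meta layout that the queries in this thread rely on; the function name top_entities is just something made up for illustration.

```python
import sqlite3


def top_entities(db_path, limit=15):
    """Return (entity_id, row_count) pairs for the entities with the most rows in states."""
    # mode=ro opens the copied file read-only, so nothing can be changed by accident.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(
            """
            SELECT states_meta.entity_id, COUNT(*) AS cnt
            FROM states
            INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
            GROUP BY states_meta.entity_id
            ORDER BY cnt DESC
            LIMIT ?
            """,
            (limit,),
        ).fetchall()
    finally:
        conn.close()
```

Calling top_entities("home-assistant_v2.db") on the copy should list the noisiest entities first, which are the candidates to purge and exclude.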

krossykross · April 1, 2024, 12:48pm

Thanks a lot! Appreciate it! I went nuclear (didn’t even stop HA. Just moved the database file in case of emergency, and restarted) - and it solved all my problems. Database is restarted, and I’m able to get the stats I need from “dbstat” to do some filtering.

EDIT: Also found what’s causing the database to reach 10GB. (Thanks to OP!) I must have done something wrong when reconfiguring a Raspberry Pi’s system sensors Python script, which was reporting all system stats over MQTT every second.
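
In case it helps anyone, this is roughly how the “dbstat” numbers can be pulled from Python. It is only a sketch: dbstat is an optional SQLite virtual table, so the helper (table_sizes is a made-up name) returns None when the SQLite build behind Python does not include it.

```python
import sqlite3


def table_sizes(conn):
    """Return {table_or_index_name: bytes on disk} via dbstat, or None if dbstat is unavailable."""
    try:
        rows = conn.execute(
            "SELECT name, SUM(pgsize) FROM dbstat GROUP BY name ORDER BY SUM(pgsize) DESC"
        ).fetchall()
    except sqlite3.OperationalError:
        # This SQLite build was compiled without the dbstat virtual table.
        return None
    return dict(rows)
```

The result is sorted largest first, so the first few keys show which tables and indexes hold most of the file.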

peter.vk · April 2, 2024, 3:29pm

So I’ve discovered my database is 15 GB and growing, but when I try to run the queries above to determine which entities/sensors are taking up the most room, they keep timing out. I’m assuming that is because the database is too big. Is there any way to increase the timeout limit on queries so that I can figure out what is actually going on here?

krossykross · April 5, 2024, 2:06pm

Had the same issue. I ended up just deleting the database. What I didn’t try was limiting the query. Try this:

-- Updated query Dec 2023
SELECT
COUNT(*) AS cnt,
COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id=states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
LIMIT 15
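
One caveat: LIMIT only trims the output, the whole states table still has to be grouped first. Another thing that might be worth trying is counting only a recent time window. A sketch in Python, assuming your schema has the states.last_updated_ts column (a Unix timestamp); that column and the function name are assumptions on my part, and whether it is actually faster depends on that column being indexed.

```python
import sqlite3
import time


def top_entities_recent(conn, hours=24, limit=15, now=None):
    """Count states rows per entity, looking only at the last `hours` hours."""
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    return conn.execute(
        """
        SELECT states_meta.entity_id, COUNT(*) AS cnt
        FROM states
        INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
        WHERE states.last_updated_ts >= ?
        GROUP BY states_meta.entity_id
        ORDER BY cnt DESC
        LIMIT ?
        """,
        (cutoff, limit),
    ).fetchall()
```

An entity that floods the database usually shows up at the top even in a one-day window, so the full-history count is often not needed to find the culprit.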

gieljnssns · April 8, 2024, 3:01pm

My database is 1.9GB
When I use this query

SELECT 
  COUNT(state_id) AS cnt, 
  COUNT(state_id) * 100 / (
    SELECT 
      COUNT(state_id) 
    FROM 
      states
  ) AS cnt_pct, 
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes, 
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) * 100 / (
    SELECT 
      SUM(
        LENGTH(state_attributes.shared_attrs)
      ) 
    FROM 
      states
      JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
  ) AS bytes_pct, 
  states_meta.entity_id 
FROM 
  states 
LEFT JOIN state_attributes ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY 
  states.metadata_id 
ORDER BY 
  cnt DESC

When I then sum the bytes column in a spreadsheet, I get 696.367MB.

How can I find out where all the other disk space is used?

CaptTom · April 8, 2024, 3:57pm

Presumably, in some of the other tables besides state_attributes and states_meta, where your query is looking. I’d look at the statistics table next:

SELECT
  statistics_meta.statistic_id,
  count(*) cnt
FROM
  statistics
  LEFT JOIN statistics_meta ON (
    statistics.metadata_id = statistics_meta.id
  )
GROUP BY
  statistics_meta.statistic_id
ORDER BY
  cnt DESC;

Admittedly this doesn’t give the space utilization, but it should give you an idea of the scale of the problem. I have no use for these “statistics” tables and have been known to simply purge them. I wish there were an option to not populate them in the first place.

gieljnssns · April 8, 2024, 4:15pm

I don’t think the statistics table is the problem.
When I execute that query it shows me that I have 46 results.
The highest count is 22888, but indeed I don’t know the space utilization.
But I don’t know where to search further.
Maybe someone can adapt the statistics query so it shows also the space utilization?

gieljnssns · April 8, 2024, 4:22pm

When you add this to your customize.yaml

sensor.xxxxxx:
  state_class: none

You can delete them in the statistics dashboard

PBudmark · April 15, 2024, 9:11pm

To get a proper offline copy of your HA database, the following procedure can be used:

Make a full backup of your HA system using Settings - System - Backups - CREATE BACKUP (select full). When finished, click on the created backup and select the three dots - Download backup
In the Save As dialog, choose a location where you will do the analysis.

On Windows, with 7z installed, ‘Extract files…’ creates a directory with several .tar.gz files.
The one of interest is homeassistant.tar.gz.
‘Extract Here’ produces homeassistant.tar, and then ‘Extract files…’ creates a directory homeassistant whose subdirectory data contains (among others) home-assistant_v2.db, which is the SQLite database.

The analyzer sqlite3_analyzer.exe is available for download from SQLite Download Page
and documented at The sqlite3_analyzer.exe Utility Program

Running sqlite3_analyzer creates a detailed study of the database
sqlite3_analyzer home-assistant_v2.db > analyzis
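
If you prefer to script the unpacking instead of clicking through 7z, something like the following Python sketch should do the same thing. It assumes the layout described above (an outer archive containing homeassistant.tar.gz, which contains data/home-assistant_v2.db) and an unencrypted backup; extract_db is a made-up helper name.

```python
import tarfile
from pathlib import Path

DB_NAME = "home-assistant_v2.db"


def extract_db(backup_tar, out_dir):
    """Pull home-assistant_v2.db out of a downloaded full backup and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(backup_tar) as outer:
        # Locate the inner archive that holds the configuration directory.
        inner_member = next(
            m for m in outer.getmembers() if Path(m.name).name == "homeassistant.tar.gz"
        )
        with tarfile.open(fileobj=outer.extractfile(inner_member), mode="r:gz") as inner:
            db_member = next(m for m in inner.getmembers() if Path(m.name).name == DB_NAME)
            target = out_dir / DB_NAME
            # Copy in 1 MB chunks so a multi-GB database is never held in memory.
            with inner.extractfile(db_member) as src, open(target, "wb") as dst:
                while chunk := src.read(1024 * 1024):
                    dst.write(chunk)
    return target
```

The returned path can then be handed to sqlite3_analyzer or any SQLite browser.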

SteffenDE · April 29, 2024, 1:50pm

When I try the SQL queries to get the heavy-hitter states, I often get an error in phpMyAdmin. Possibly a timeout, because the SQL runs too long on my system.

Error:
Error in Processing Request
Error Text: error (rejected)
It seems that the connection to the server has been lost. Please check your network connectivity and server status.

Does anybody know how I could extend the runtime to get a result for this SQL:

SELECT
  COUNT(state_id) AS cnt,
  COUNT(state_id) * 100 / (
    SELECT
      COUNT(state_id)
    FROM
      states
  ) AS cnt_pct,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) * 100 / (
    SELECT
      SUM(
        LENGTH(state_attributes.shared_attrs)
      )
    FROM
      states
      JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
  ) AS bytes_pct,
  states_meta.entity_id
FROM
  states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY
  states.metadata_id, states_meta.entity_id
ORDER BY
  cnt DESC;

Thanks

johndoyle · May 15, 2024, 12:34pm

This is the sensor I have created that gives the size of my MariaDB

# Sensor part
# https://www.home-assistant.io/components/sensor.sql/


sql:
  - name: "Database size"
    db_url: !secret recorder_db_url
    query: >
      SELECT table_schema "database", Round(Sum(data_length + index_length) / 1024 / 1024, 1) "value" 
      FROM information_schema.tables 
      WHERE table_schema="homeassistant" 
      GROUP BY table_schema;
    column: "value"
    unit_of_measurement: "MB"

jehy · May 17, 2024, 7:32am

For anyone struggling with database size - I developed an addon which shows you data about database usage. You can read about it here.

By the way, it looks like there is no correct query to check shared attributes size in this topic. I was struggling with it for a long time, and this is my best effort:

select attr2entity.entity_id, sum(length(a.shared_attrs))/1024.0/1024.0 sum
from (
  select distinct state_attributes.attributes_id, states_meta.entity_id
  from state_attributes, states, states_meta
  where state_attributes.attributes_id=states.attributes_id
    and states_meta.metadata_id=states.metadata_id
) attr2entity, state_attributes a
where a.attributes_id=attr2entity.attributes_id
group by attr2entity.entity_id
order by sum desc
limit 10
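
To show why the “distinct” step matters, here is a tiny self-contained Python demo on a made-up in-memory database (not real HA data; the two-table schema is cut down to the columns that matter). Many state rows can point at the same state_attributes row, so joining through states and summing counts that row once per state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_attributes(attributes_id INTEGER PRIMARY KEY, shared_attrs TEXT);
CREATE TABLE states(state_id INTEGER PRIMARY KEY, attributes_id INTEGER);
""")
conn.execute("INSERT INTO state_attributes VALUES (1, ?)", ('{"unit":"W"}',))  # 12 characters
# 1000 state rows that all point at the same attributes row
conn.executemany("INSERT INTO states(attributes_id) VALUES (?)", [(1,)] * 1000)

# Naive: join through states, so the single attributes row is counted 1000 times
naive = conn.execute("""
    SELECT SUM(LENGTH(a.shared_attrs))
    FROM states s JOIN state_attributes a ON s.attributes_id = a.attributes_id
""").fetchone()[0]

# Deduplicated: count each attributes row once
deduped = conn.execute("""
    SELECT SUM(LENGTH(a.shared_attrs))
    FROM (SELECT DISTINCT attributes_id FROM states) d
    JOIN state_attributes a ON a.attributes_id = d.attributes_id
""").fetchone()[0]

print(naive, deduped)  # 12000 12
```

The naive figure overstates the stored size by a factor of 1000 here, which is the same kind of inflation the join-based queries earlier in the topic can produce.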

hughc · June 8, 2024, 4:33pm

Is it possible to purge from recorder by event_type?
I only see purge by entity, entity glob or domain as a purge service.

BTW: my top event_type is call_service, but I don’t know if it’s safe to exclude that from recorder. What do you think? What would I lose?

AJolly · July 11, 2024, 3:38am

Yeah, that’s a heavy query. Do you really need to be calculating percentages?
For example:
Your query:
-- Result: 4631 rows returned in 627745ms

vs

SELECT 
  COUNT(state_id) AS cnt, 
   SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes, 
  states_meta.entity_id 
FROM 
  states 
LEFT JOIN state_attributes ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY 
  states.metadata_id 
ORDER BY 
  bytes DESC

-- Result: 4631 rows returned in 239724ms

DAVIZINHO · August 12, 2024, 11:33am

Hello.
My database in MariaDB is very, very big (more than 60GB).
Right now I’m looking for the entities that create the most records and excluding the unnecessary ones. It’s a slow procedure, but it works.

But I have a small problem with the state_attributes table.
This table is 6GB in size,
and I don’t know how to purge it.
Any advice?

thanks

alonjr · August 27, 2024, 8:36am

David, have you found a solution? I have a similar problem. Thanks

DAVIZINHO · August 27, 2024, 9:07am

Hello!
Yes and no.
Hehe.

I removed a lot of sensors to shrink the database (it is now 26GB).
I manually removed the events table (because it was more than 6GB).

After this, in Developer Tools under Statistics, I found a lot of problems, and whenever I pressed the button to resolve one, I chose delete.

After this I used the recorder’s purge service with both options active (apply filter and repack).

I don’t know which of these things did the magic, but now my state_attributes table is 2.8GB.
It’s not perfect, but it is about 50% of the previous size (6GB).

CaptTom · September 8, 2024, 8:52pm

Way back in February of '22 I made a few posts here about the way the statistics table was bloating my database. Now, I know some of you use statistics. If so this isn’t for you. But if you aren’t really interested in keeping those data forever, and you care about keeping your database lean, I have an update.

Some time ago, the statistics table schema was updated to stop using the start field, formatted as a date field, and use the start_ts field, defined as a timestamp format.

So, for example, if you wanted to purge the statistics table, keeping just the past 4 days of data, a new query was required:

DELETE FROM statistics WHERE start_ts < (CAST(strftime('%s', 'now', '-4 days') AS FLOAT));

It took me a while to get around to updating this query, and I found that when I ran it this time, I deleted 183,245 rows. My database shrank from 19,092KB to 3,544KB. That’s a savings of about 81%. Wow. I’m glad I keep my long-term statistics elsewhere.

Just posting here in case anyone else wants to try this.
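
For anyone who would rather run this from a script, a rough Python equivalent is below. It is a sketch under assumptions: it expects the statistics table with a start_ts Unix timestamp as described above, should only be run while HA is stopped (or against a copy), and the function name is made up. Deleting rows by itself generally does not shrink the file, so the sketch optionally runs VACUUM afterwards.

```python
import sqlite3
import time


def purge_old_statistics(conn, keep_days=4, now=None, vacuum=True):
    """Delete statistics rows older than keep_days and return how many were removed."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM statistics WHERE start_ts < ?", (cutoff,))
    if vacuum:
        # VACUUM rewrites the file so the freed pages are actually given back.
        conn.execute("VACUUM")
    return cur.rowcount
```

Example use: purge_old_statistics(sqlite3.connect("home-assistant_v2.db"), keep_days=4) returns the number of deleted rows.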

Ildar_Gabdullin · September 8, 2024, 9:58pm

I am not a DB expert.
What I am thinking is:

1. If “statistics” means “long-term statistics” (LTS), then these data are supposed to be kept forever. There are also so-called “short-term statistics” (5-minute data), which are purged like other data after the purge interval.
2. If some entity is excluded from Recorder but still has LTS in the DB, it is possible to delete this old LTS data (old because the entity is excluded from Recorder, so no LTS has been stored since it was excluded) from Dev tools - Statistics (this is not correct, see below). The same goes for LTS of removed entities.
3. It was mentioned several times that LTS do not occupy much space in the DB. For me, the only reason to delete LTS from the DB is a practical one: “I simply do not need LTS for some entity” (not to mention the cases in pt. 2 above). So, if I do not need LTS for some entity, I do not set a corresponding state_class for this entity (for template sensors), or I set “state_class: none” via “customize” (for sensors provided by other integrations). As a result, the DB contains only the needed LTS for particular entities.

I think the SQL-query posted above is very useful:

1. For educational purposes, for non-experts like me.
2. For a possible practical case: assume you have some sensor wrongly configured, and there is wrong LTS for this sensor in the DB; you then re-configure this sensor and purge the old, wrong LTS. But for this case the script would need to be rewritten to delete LTS for a particular entity.
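
As a non-expert attempt at that per-entity variant, here is a Python sketch. It relies on the statistics.metadata_id = statistics_meta.id relation used in the statistics query earlier in this topic; the function name is made up, it only touches the statistics table (the statistics_meta row is left in place), and it should be tried on a copy of the DB first.

```python
import sqlite3


def delete_statistics_for(conn, statistic_id):
    """Delete all rows in statistics that belong to one statistic_id; return the row count."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            DELETE FROM statistics
            WHERE metadata_id IN (
                SELECT id FROM statistics_meta WHERE statistic_id = ?
            )
            """,
            (statistic_id,),
        )
    return cur.rowcount
```

For example, delete_statistics_for(conn, "sensor.xxxxxx") would remove the LTS rows of that one sensor and leave every other entity untouched.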

CaptTom · September 9, 2024, 12:38am

“Supposed” by who? I never asked for these data. I got by fine for a couple of years before the statistics tables were added to the DB. I haven’t found any way to inhibit collecting the data, short of excluding the entities from recording all data, short- or long-term.

Yes, I’ve done that wherever I could. Unless I’m missing something, the “Delete” option only appears for missing entities. It would be great if we could delete all the entries there.

When I ran the DELETE query, above, I recovered 81% of the space my database was consuming. That seems like plenty to me. Again, since I don’t use these data it was 100% wasted space.