How to keep your recorder database size under control

In a word, no. For attribute-heavy entities (and there are a lot of them) the best we can do is exclude the entity, then create a template sensor that holds only the value you’d like to keep.

Be careful when defining your templates, too. Giving them things like unit_of_measurement, friendly_name or icon adds those as attributes. So you can end up with the same problem.

Remember, these attributes are stored as JSON strings with each variable name and value spelled out in ASCII text. So while you think you’re only storing a binary or numeric value, you’re actually storing a long, multi-value string with every state change.
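As a minimal sketch of that pattern (the entity and attribute names here are hypothetical): exclude the attribute-heavy entity from recorder, then record only a slim template copy of the one value you want to keep.

recorder:
  exclude:
    entities:
      - sensor.weather_station  # hypothetical attribute-heavy entity

template:
  - sensor:
      # Deliberately no icon or unit_of_measurement here: as noted above,
      # each of those would be stored as an attribute on every state change.
      - name: "Outdoor temperature"
        state: "{{ state_attr('sensor.weather_station', 'temperature') }}"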

Anyway, here’s the SQL I use to see the heavy hitters in the state_attributes table. I can’t claim authorship:

SELECT
  COUNT(state_id) AS cnt,
  COUNT(state_id) * 100 / (
    SELECT
      COUNT(state_id)
    FROM
      states
  ) AS cnt_pct,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) * 100 / (
    SELECT
      SUM(
        LENGTH(state_attributes.shared_attrs)
      )
    FROM
      states
      JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
  ) AS bytes_pct,
  states_meta.entity_id
FROM
  states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY
  states.metadata_id, states_meta.entity_id
ORDER BY
  cnt DESC;

No prob. As I said, I’m more than happy now. Just checked your query but I prefer mine tbh, as it’s in MB and sorted for size, not count.

Also noticed my DB went down once again, this time to 80 MB. Not even sure why this time; it must be that one of the previous changes hadn’t taken effect yet. 55 GB to 80 MB. Not even in my wildest dreams.

So my database is suddenly 10 GB. Trying to get some stats from dbstat seems impossible: nothing happens when I try to pull data from it via SQLite Web, and I get no errors. Is this a timeout? If so, what can I do? Can I purge some data to get my head above water again?

Best option: Stop HA, copy home-assistant_v2.db to another machine, re-start HA. Run DB Browser for SQLite (or any similar tool) on that other machine. After figuring out what you need to purge, do so from Developer Tools / Services in the HA UI. Be sure to exclude the offending entities so it doesn’t happen again.
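For reference, a sketch of such a purge call as entered in Developer Tools / Services (the entity name is hypothetical):

service: recorder.purge_entities
target:
  entity_id: sensor.noisy_camera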

Nuclear option: Stop HA, delete the home-assistant_v2.db file. HA will create a new one when it restarts. You lose all history and statistics data prior to the delete, but you have a clean DB.
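A sketch of the nuclear option from a shell, assuming the HA OS CLI and the standard /config path (moving the file aside rather than deleting keeps an emergency copy):

ha core stop
mv /config/home-assistant_v2.db /config/home-assistant_v2.db.bak
ha core start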


Thanks a lot! Appreciate it! I went nuclear (didn’t even stop HA, just moved the database file aside in case of emergency, and restarted) :sweat_smile: - and it solved all my problems. The database was recreated, and I’m able to get the stats I need from dbstat to do some filtering.

EDIT: Also found what was pushing the database to 10 GB (thanks to OP!). I must have done something wrong when reconfiguring a Raspberry Pi system-sensors Python script, which was reporting all system stats over MQTT every second.


So I’ve discovered my database is 15 GB and growing, but when I try to run the queries above to determine which entities/sensors are taking up the most room, they keep timing out; I assume that’s because the database is too big. Is there any way to increase the timeout limit on queries so that I can figure out what is actually going on here?


Had the same issue. I ended up just deleting the database. What I didn’t try was limiting the query. Try this:

-- Updated query Dec 2023
SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
LIMIT 15;
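If even the trimmed query times out in a web UI, one workaround is to run it with the sqlite3 CLI against an offline copy of the database; the CLI imposes no query timeout. A sketch, assuming shell access (heavy_hitters.sql is a hypothetical file holding one of the queries from this thread):

sqlite3 /path/to/copy/home-assistant_v2.db < heavy_hitters.sql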


My database is 1.9 GB.
When I use this query:

SELECT 
  COUNT(state_id) AS cnt, 
  COUNT(state_id) * 100 / (
    SELECT 
      COUNT(state_id) 
    FROM 
      states
  ) AS cnt_pct, 
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes, 
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) * 100 / (
    SELECT 
      SUM(
        LENGTH(state_attributes.shared_attrs)
      ) 
    FROM 
      states
  ) AS bytes_pct, 
  states_meta.entity_id 
FROM 
  states 
LEFT JOIN state_attributes ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY 
  states.metadata_id 
ORDER BY 
  cnt DESC

and then sum the bytes column in a spreadsheet, I get 696.367 MB.

How can I find out where all the other disk space is used?


Presumably, in some of the other tables besides state_attributes and states_meta, where your query is looking. I’d look at the statistics table next:

SELECT
  statistics_meta.statistic_id,
  count(*) cnt
FROM
  statistics
  LEFT JOIN statistics_meta ON (
    statistics.metadata_id = statistics_meta.id
  )
GROUP BY
  statistics_meta.statistic_id
ORDER BY
  cnt DESC;

Admittedly this doesn’t give the space utilization, but it should give you an idea of the scale of the problem. I have no use for these “statistics” tables and have been known to simply purge them. I wish there were an option to not populate them in the first place.
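For actual per-table space utilization, SQLite’s dbstat virtual table can report page usage directly. A sketch, assuming your SQLite build has dbstat enabled (SQLITE_ENABLE_DBSTAT_VTAB), as the dbstat mentions earlier in this thread suggest:

SELECT
  name,
  SUM(pgsize) / 1024.0 / 1024.0 AS size_mb
FROM dbstat
GROUP BY name
ORDER BY size_mb DESC;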

I don’t think the statistics table is the problem.
When I execute that query it shows me that I have 46 results.
The highest count is 22888, but indeed I don’t know the space utilization, and I don’t know where to search further.
Maybe someone can adapt the statistics query so that it also shows the space utilization?

When you add this to your customize.yaml:

sensor.xxxxxx:
  state_class: none

you can then delete them from the statistics dashboard.


To get a proper offline copy of your HA database, the following procedure can be used:

1. Make a full backup of your HA system using Settings - System - Backups - CREATE BACKUP (select full). When finished, click on the created backup and select the three dots - Download backup.
2. In the Save as dialog, pick a place to do the analysis.
3. On Windows, with 7-Zip installed, ‘Extract files…’ creates a directory with several .tar.gz files. The one of interest is homeassistant.tar.gz.
4. ‘Extract Here’ produces homeassistant.tar, and then ‘Extract files…’ creates a directory homeassistant whose subdirectory data contains (among others) home-assistant_v2.db, which is the SQLite database.

The analyzer sqlite3_analyzer.exe is available for download from the SQLite Download Page and is documented at The sqlite3_analyzer.exe Utility Program.

Running sqlite3_analyzer creates a detailed study of the database:

sqlite3_analyzer home-assistant_v2.db > analysis


If I try the SQL queries above to get the heavy-hitter states, I often get an error in phpMyAdmin. Possibly a timeout, because the SQL runs too long on my system.

Error:
Error in Processing Request
Error Text: error (rejected)
It seems that the connection to the server has been lost. Please check your network connectivity and server status.

Does anybody know how I could extend the runtime so I can get a result for this SQL:

SELECT
  COUNT(state_id) AS cnt,
  COUNT(state_id) * 100 / (
    SELECT
      COUNT(state_id)
    FROM
      states
  ) AS cnt_pct,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) * 100 / (
    SELECT
      SUM(
        LENGTH(state_attributes.shared_attrs)
      )
    FROM
      states
      JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
  ) AS bytes_pct,
  states_meta.entity_id
FROM
  states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY
  states.metadata_id, states_meta.entity_id
ORDER BY
  cnt DESC;

Thanks

This is the sensor I have created that gives the size of my MariaDB

# Sensor part
# https://www.home-assistant.io/components/sensor.sql/


sql:
  - name: "Database size"
    db_url: !secret recorder_db_url
    query: >
      SELECT table_schema "database", Round(Sum(data_length + index_length) / 1024 / 1024, 1) "value" 
      FROM information_schema.tables 
      WHERE table_schema="homeassistant" 
      GROUP BY table_schema;
    column: "value"
    unit_of_measurement: "MB"
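A hedged companion for anyone on the default SQLite recorder rather than MariaDB (the db_url assumes the stock /config/home-assistant_v2.db; the pragma-based size query is the pattern the SQL integration documentation suggests):

sql:
  - name: "Database size"
    db_url: sqlite:////config/home-assistant_v2.db
    query: >
      SELECT ROUND(page_count * page_size / 1024 / 1024, 1) AS "value"
      FROM pragma_page_count(), pragma_page_size();
    column: "value"
    unit_of_measurement: "MB"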

For anyone struggling with database size - I developed an add-on that shows you data about database usage. You can read about it here.

By the way, it looks like there is no fully correct query for checking shared-attribute size in this topic. I was struggling with it for a long time, and this is my best effort. It first deduplicates attribute rows per entity, since many states share one attributes row and a plain join would over-count:

SELECT
  attr2entity.entity_id,
  SUM(LENGTH(a.shared_attrs)) / 1024.0 / 1024.0 AS sum
FROM (
  SELECT DISTINCT
    state_attributes.attributes_id,
    states_meta.entity_id
  FROM state_attributes, states, states_meta
  WHERE state_attributes.attributes_id = states.attributes_id
    AND states_meta.metadata_id = states.metadata_id
) attr2entity, state_attributes a
WHERE a.attributes_id = attr2entity.attributes_id
GROUP BY attr2entity.entity_id
ORDER BY sum DESC
LIMIT 10;

Is it possible to purge from recorder by event_type?
I only see purge by entity, entity glob, or domain in the purge service.

BTW: my top event_type is call_service… but I don’t know if it’s safe to exclude that from recorder. What do you think? What would I lose?


Yeah, that’s a heavy query. Do you really need to be calculating percentages?
For example:
Your query:
-- Result: 4631 rows returned in 627745 ms

vs

SELECT 
  COUNT(state_id) AS cnt, 
   SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes, 
  states_meta.entity_id 
FROM 
  states 
LEFT JOIN state_attributes ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY 
  states.metadata_id 
ORDER BY 
  bytes DESC

-- Result: 4631 rows returned in 239724 ms


Hello.
My database in MariaDB is very, very big (more than 60 GB).
Now I’m searching for the entities that create the most records in the states table and excluding the unnecessary ones. It’s a slow procedure, but it works.

But I have a little problem with the state_attributes table.
This table is 6 GB in size, and I don’t know how to purge it.
Any advice?

Thanks


David, have you found a solution? I have a similar problem. Thanks.

Hello!
Yes and no, hehe.

I removed a lot of sensors to shrink the database (it is now 26 GB).
I manually emptied the events table (because its size was more than 6 GB).

After this, in Developer Tools / Statistics, I found a lot of problems, and when I pushed the fix button, I chose delete.

Then I used the recorder purge service with both options active (apply filter and repack).

I don’t know which of these things made the magic, but now my state_attributes table is 2.8 GB. Not perfect, but it’s 50% of the previous size (6 GB).
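For reference, a minimal sketch of the purge call described above, as entered in Developer Tools / Services (keep_days: 7 is an assumed retention window; adjust to taste):

service: recorder.purge
data:
  keep_days: 7
  apply_filter: true
  repack: true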