Best option: Stop HA, copy home-assistant_v2.db to another machine, then restart HA. Run DB Browser for SQLite (or any similar tool) on that other machine. After figuring out what you need to purge, do so from Developer Tools / Services in the HA UI. Be sure to exclude the offending entities from the recorder so it doesn’t happen again.
Nuclear option: Stop HA, delete the home-assistant_v2.db file. HA will create a new one when it restarts. You lose all history and statistics data prior to the delete, but you have a clean DB.
Thanks a lot! Appreciate it! I went nuclear (didn’t even stop HA. Just moved the database file in case of emergency, and restarted) - and it solved all my problems. Database is restarted, and I’m able to get the stats I need from “dbstat” to do some filtering.
EDIT: Also found what was causing the database to reach 10 GB. (Thanks to OP!) I must have done something wrong when reconfiguring a Raspberry Pi’s system-sensors Python script, which was reporting all system stats over MQTT every second.
So I’ve discovered my database is 15 GB and growing, but when I try to run the queries above to determine which entities/sensors are taking up the most room, they keep timing out. I’m assuming that is because the database is too big. Is there any way to increase the timeout limit on queries so that I can figure out what is actually going on here?
Had the same issue. I ended up just deleting the database. What I didn’t try was limiting the query. Try this:
-- Updated query Dec 2023
SELECT
COUNT(*) AS cnt,
COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id=states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC LIMIT 15
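If you would rather script this against an offline copy than paste it into a GUI tool, here is a minimal Python sketch. It assumes the copy is a SQLite file with the `states` and `states_meta` tables used in the query above; the function names `open_read_only` and `top_entities` are made up for illustration. Opening the file with `mode=ro` means a typo cannot modify the copy.

```python
import sqlite3
from pathlib import Path

# Same query as above, with the row limit passed in as a parameter.
TOP_ENTITIES_SQL = """
SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
LIMIT ?
"""


def open_read_only(db_path):
    """Open an offline copy of the database without allowing writes."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def top_entities(conn, limit=15):
    """Return (cnt, cnt_pct, entity_id) rows for the busiest entities."""
    return conn.execute(TOP_ENTITIES_SQL, (limit,)).fetchall()
```

Usage would be something like `top_entities(open_read_only("home-assistant_v2.db"))`, run on the machine that holds the copy, not on the live HA database.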
SELECT
COUNT(state_id) AS cnt,
COUNT(state_id) * 100 / (
SELECT
COUNT(state_id)
FROM
states
) AS cnt_pct,
SUM(
LENGTH(state_attributes.shared_attrs)
) AS bytes,
SUM(
LENGTH(state_attributes.shared_attrs)
) * 100 / (
SELECT
SUM(
LENGTH(state_attributes.shared_attrs)
)
FROM
states
JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
) AS bytes_pct,
states_meta.entity_id
FROM
states
LEFT JOIN state_attributes ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY
states.metadata_id
ORDER BY
cnt DESC
When I then sum the bytes column in a spreadsheet, I get 696.367 MB.
How can I find out where all the other disk space is used?
Presumably, in some of the other tables besides state_attributes and states_meta, where your query is looking. I’d look at the statistics table next:
SELECT
statistics_meta.statistic_id,
count(*) cnt
FROM
statistics
LEFT JOIN statistics_meta ON (
statistics.metadata_id = statistics_meta.id
)
GROUP BY
statistics_meta.statistic_id
ORDER BY
cnt DESC;
Admittedly this doesn’t give the space utilization, but it should give you an idea of the scale of the problem. I have no use for these “statistics” tables and have been known to simply purge them. I wish there were an option to not populate them in the first place.
I don’t think the statistics table is the problem.
When I execute that query it shows me that I have 46 results.
The highest count is 22888, but indeed I don’t know the space utilization.
But I don’t know where to search further.
Maybe someone can adapt the statistics query so that it also shows the space utilization?
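One way to see where the space goes per table, rather than per entity, is SQLite's `dbstat` virtual table, which was mentioned earlier in the thread. Below is a minimal Python sketch under two assumptions: you are working on an offline SQLite copy, and the SQLite library behind your Python was compiled with `dbstat` support (many builds are not, in which case the function returns `None`). The name `table_sizes` is made up for illustration.

```python
import sqlite3


def table_sizes(conn):
    """Return [(name, bytes), ...] for every table and index, largest first.

    Returns None when the query fails, which is what happens on SQLite
    builds that do not include the dbstat virtual table.
    """
    try:
        return conn.execute(
            "SELECT name, SUM(pgsize) AS bytes "
            "FROM dbstat GROUP BY name ORDER BY bytes DESC"
        ).fetchall()
    except sqlite3.OperationalError:
        return None
```

If it returns `None`, DB Browser for SQLite or the `sqlite3` command-line shell on another machine may have `dbstat` available, and the same SELECT can be run there.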
To get a proper offline copy of your HA database, the following procedure can be used:
Make a full backup of your HA system using Settings - System - Backups - CREATE BACKUP (select full). When it has finished, click on the created backup and select the three dots - Download backup.
In the Save As dialog, choose a location where you will do the analysis.
On Windows with 7-Zip installed, ‘Extract files…’ creates a directory with several .tar.gz files.
The one of interest is homeassistant.tar.gz.
‘Extract Here’ produces homeassistant.tar, and then ‘Extract files…’ creates a directory homeassistant whose subdirectory data contains (among others) home-assistant_v2.db, which is the SQLite database.
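The same unpacking can be scripted, which helps if you are not on Windows. This is a minimal Python sketch of the steps above, assuming the downloaded backup is an unencrypted tar archive that contains homeassistant.tar.gz, which in turn contains home-assistant_v2.db somewhere inside it. The function name `extract_db` is made up for illustration.

```python
import shutil
import tarfile
from pathlib import Path

DB_NAME = "home-assistant_v2.db"
INNER_NAME = "homeassistant.tar.gz"


def extract_db(backup_tar, out_dir):
    """Copy home-assistant_v2.db out of a downloaded backup into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(backup_tar, "r:*") as outer:
        # Step 1: locate the nested homeassistant.tar.gz archive.
        inner_member = next(
            (m for m in outer.getmembers() if Path(m.name).name == INNER_NAME),
            None,
        )
        if inner_member is None:
            raise FileNotFoundError(f"{INNER_NAME} not found in {backup_tar}")
        # Step 2: open the nested archive directly from the outer one.
        with tarfile.open(fileobj=outer.extractfile(inner_member), mode="r:gz") as inner:
            for member in inner:
                if member.isfile() and Path(member.name).name == DB_NAME:
                    target = out_dir / DB_NAME
                    # Step 3: stream the database file to disk.
                    with inner.extractfile(member) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    return target
    raise FileNotFoundError(f"{DB_NAME} not found inside {INNER_NAME}")
```

Only the database file is written to disk; everything else in the backup is skipped.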
When I run the SQL queries to find the heavy-hitter states, I often get an error in phpMyAdmin. It is possibly a timeout, because the query runs too long on my system.
Error:
Error in Processing Request
Error Text: error (rejected)
It seems that the connection to the server has been lost. Please check your network connectivity and server status.
Does anybody know how I could extend the runtime limit to get a result for this SQL:
SELECT
COUNT(state_id) AS cnt,
COUNT(state_id) * 100 / (
SELECT
COUNT(state_id)
FROM
states
) AS cnt_pct,
SUM(
LENGTH(state_attributes.shared_attrs)
) AS bytes,
SUM(
LENGTH(state_attributes.shared_attrs)
) * 100 / (
SELECT
SUM(
LENGTH(state_attributes.shared_attrs)
)
FROM
states
JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
) AS bytes_pct,
states_meta.entity_id
FROM
states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY
states.metadata_id, states_meta.entity_id
ORDER BY
cnt DESC;
For anyone struggling with database size - I developed an addon which shows you data about database usage. You can read about it here.
By the way, it looks like there is no correct query to check shared attributes size in this topic. I was struggling with it for a long time, and this is my best effort:
SELECT
attr2entity.entity_id,
SUM(LENGTH(a.shared_attrs)) / 1024.0 / 1024.0 AS sum
FROM
(
SELECT DISTINCT
state_attributes.attributes_id,
states_meta.entity_id
FROM
state_attributes,
states,
states_meta
WHERE
state_attributes.attributes_id = states.attributes_id
AND states_meta.metadata_id = states.metadata_id
) attr2entity,
state_attributes a
WHERE
a.attributes_id = attr2entity.attributes_id
GROUP BY
attr2entity.entity_id
ORDER BY
sum DESC
LIMIT 10
Yeah, that’s a heavy query. Do you really need to be calculating percentages?
For example:
Your query:
-- Result: 4631 rows returned in 627745ms
vs
SELECT
COUNT(state_id) AS cnt,
SUM(
LENGTH(state_attributes.shared_attrs)
) AS bytes,
states_meta.entity_id
FROM
states
LEFT JOIN state_attributes ON (states.attributes_id=state_attributes.attributes_id)
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY
states.metadata_id
ORDER BY
bytes DESC
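If you still want the percentages, they can be computed on the client from the result of the lean query, so the database never has to run the extra subqueries. Here is a minimal Python sketch, assuming a SQLite copy with the same `states`, `state_attributes` and `states_meta` tables; the function name `usage_with_percentages` is made up for illustration. Keep in mind the point raised earlier in the thread: attribute rows are shared between states, so `bytes` here is closer to an upper bound than to real disk usage.

```python
import sqlite3

# The lean query from above: counts and attribute bytes, no percentages.
LEAN_SQL = """
SELECT
  COUNT(state_id) AS cnt,
  SUM(LENGTH(state_attributes.shared_attrs)) AS bytes,
  states_meta.entity_id
FROM states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY states.metadata_id
ORDER BY bytes DESC
"""


def usage_with_percentages(conn):
    """Run the lean query once and add cnt_pct / bytes_pct in Python."""
    rows = conn.execute(LEAN_SQL).fetchall()
    total_cnt = sum(cnt for cnt, _, _ in rows)
    # SUM() yields NULL (None) for entities without attributes.
    total_bytes = sum(size or 0 for _, size, _ in rows)
    result = []
    for cnt, size, entity_id in rows:
        size = size or 0
        result.append({
            "entity_id": entity_id,
            "cnt": cnt,
            "cnt_pct": 100.0 * cnt / total_cnt if total_cnt else 0.0,
            "bytes": size,
            "bytes_pct": 100.0 * size / total_bytes if total_bytes else 0.0,
        })
    return result
```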
Hello.
My MariaDB database is very big (more than 60 GB).
I am now searching for the entities that create the most records in the entities table and excluding the unnecessary ones. It is a slow procedure, but it works.
But I have a small problem with the state_attributes table.
This table is 6 GB in size,
and I don’t know how to purge it.
Any advice?
I removed a lot of sensors to shrink the database (it is now 26 GB).
I manually removed the events table (because it was larger than 6 GB).
After this, in Developer Tools under Statistics, I found a lot of problems, and when I pressed the fix button for each one, I chose delete.
Then I ran the recorder’s purge service with both options enabled (apply filter and repack).
I don’t know which of these things did the magic, but my state_attributes table is now 2.8 GB.
It is not perfect, but it is about 50% of the previous size (6 GB).
Way back in February of '22 I made a few posts here about the way the statistics table was bloating my database. Now, I know some of you use statistics. If so this isn’t for you. But if you aren’t really interested in keeping those data forever, and you care about keeping your database lean, I have an update.
Some time ago, the statistics table schema was updated to stop using the start field, formatted as a date field, and use the start_ts field, defined as a timestamp format.
So, for example, if you wanted to purge the statistics table, keeping just the past 4 days of data, a new query was required:
DELETE FROM statistics WHERE start_ts < (CAST(strftime('%s', 'now', '-4 days') AS FLOAT));
It took me a while to get around to updating this query, and I found that when I ran it this time, I deleted 183,245 rows. My database shrank from 19,092KB to 3,544KB. That’s a savings of about 81%. Wow. I’m glad I keep my long-term statistics elsewhere.
Just posting here in case anyone else wants to try this.
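For anyone who wants to check what would be removed before committing to it, here is a minimal Python sketch of the same purge with a dry-run mode. It assumes a SQLite database with HA stopped (or an offline copy) and the `statistics.start_ts` column used in the query above; the function name `purge_old_statistics` and its parameters are made up for illustration. The cutoff is computed in Python instead of with `strftime`, and the file only shrinks after the `VACUUM` step.

```python
import sqlite3
import time


def purge_old_statistics(conn, keep_days=4, dry_run=True, now=None):
    """Count, and optionally delete, statistics rows older than keep_days.

    Returns the number of rows older than the cutoff. With dry_run=True
    nothing is changed; with dry_run=False the rows are deleted and the
    database is vacuumed so the file actually gets smaller.
    """
    cutoff = (time.time() if now is None else now) - keep_days * 86400
    count = conn.execute(
        "SELECT COUNT(*) FROM statistics WHERE start_ts < ?", (cutoff,)
    ).fetchall()[0][0]
    if not dry_run:
        conn.execute("DELETE FROM statistics WHERE start_ts < ?", (cutoff,))
        conn.commit()  # VACUUM cannot run inside an open transaction
        conn.execute("VACUUM")
    return count
```

Calling it first with the default `dry_run=True` shows how many rows are affected, and a second call with `dry_run=False` does the actual delete.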
1. If “statistics” means “long-term statistics” (LTS), then these data are supposed to be kept forever. There are also so-called “short-term statistics” (5-minute data), which are purged like other data after the purge interval.
2. If an entity is excluded from Recorder but still has LTS in the DB, it is possible to delete this old LTS data (old because the entity is excluded from Recorder, so no LTS has been stored since it was excluded) from Dev tools - Statistics (this is not correct, see below). The same applies to LTS for removed entities.
3. It was mentioned several times that LTS do not occupy much space in the DB. For me, the only reason to delete LTS from the DB is a practical one: “I simply do not need LTS for some entity” (not to mention the cases in pt. 2 above). So, if I do not need LTS for some entity, I do not set a corresponding state_class for this entity (for template sensors), or I set “state_class: none” via “customize” (for sensors provided by other integrations). As a result, the DB contains only the needed LTS for particular entities.
4. I think the SQL query posted above is very useful:
- For educational purposes, for non-experts like me.
- For a possible practical case: assume you have a wrongly configured sensor, and there is wrong LTS for this sensor in the DB; you have now reconfigured the sensor and want to purge the old, wrong LTS. But for this case the script should be rewritten to delete LTS for a particular entity.
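A per-entity variant might look like the following minimal Python sketch. It assumes a SQLite database with HA stopped (or an offline copy) and relies only on the `statistics.metadata_id = statistics_meta.id` relationship used in the statistics query earlier in the thread. The function name `delete_statistics_for` is made up for illustration, and the sketch deliberately leaves the `statistics_meta` row and every other table untouched, so any short-term statistics would need their own cleanup.

```python
import sqlite3


def delete_statistics_for(conn, statistic_id):
    """Delete all rows in `statistics` that belong to one statistic_id.

    Returns the number of deleted rows (0 if the statistic_id is unknown).
    """
    found = conn.execute(
        "SELECT id FROM statistics_meta WHERE statistic_id = ?",
        (statistic_id,),
    ).fetchall()
    if not found:
        return 0
    cursor = conn.execute(
        "DELETE FROM statistics WHERE metadata_id = ?", (found[0][0],)
    )
    conn.commit()
    return cursor.rowcount
```

For example, `delete_statistics_for(conn, "sensor.wrongly_configured")` would remove the rows for that one (hypothetical) sensor and keep everything else.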
“Supposed” by who? I never asked for these data. I got by fine for a couple of years before the statistics tables were added to the DB. I haven’t found any way to inhibit collecting the data, short of excluding the entities from recording all data, short- or long-term.
Yes, I’ve done that wherever I could. Unless I’m missing something, the “Delete” option only appears for missing entities. It would be great if we could delete all the entries there.
When I ran the DELETE query, above, I recovered 81% of the space my database was consuming. That seems like plenty to me. Again, since I don’t use these data it was 100% wasted space.