I’m very glad to see the database getting some developer attention lately, and I remain hopeful that we’ll be seeing great things in the future. That said, I personally wouldn’t consider the HA database a good repository for anything “precious.” If it were up to me, I’d have put the long-term energy statistics in their own database, or maybe even export them to simple csv files. There are lots of other tools which are good at manipulating and analyzing statistics. Getting the data out of HA would be the first step.
I updated the query, because I got unexpected results from rct’s query (like percentages over 100%).
SELECT
    entity_id,
    total_count,
    100.0 * total_count / SUM(total_count) OVER () AS perc_count,
    total_bytes,
    100.0 * total_bytes / SUM(total_bytes) OVER () AS perc_bytes,
    (total_count * 60 * 60) / (unixepoch() - least_update) AS count_per_hour,
    (total_bytes * 60 * 60) / (unixepoch() - least_update) AS bytes_per_hour,
    datetime(least_update, 'unixepoch', 'localtime') AS available_since
FROM
    (
        SELECT
            S.entity_id,
            COUNT(1) AS total_count,
            SUM(LENGTH(shared_attrs)) AS total_bytes,
            MIN(last_updated_ts) AS least_update
        FROM
            state_attributes AS SA
            INNER JOIN states AS S
                ON (SA.attributes_id = S.attributes_id)
        GROUP BY
            S.entity_id
    ) AS A
ORDER BY
    bytes_per_hour DESC
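As a quick sanity check, the windowed-percentage logic in the query can be exercised against a toy copy of the schema using Python's built-in sqlite3 module (a sketch, not the real HA database; the table layouts below only contain the columns the query touches, and window functions require SQLite >= 3.25, which ships with any recent Python):

```python
import sqlite3

# Toy version of the pre-2023.4 schema (only the columns the query uses).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE state_attributes (attributes_id INTEGER PRIMARY KEY, shared_attrs TEXT);
CREATE TABLE states (state_id INTEGER PRIMARY KEY, entity_id TEXT,
                     attributes_id INTEGER, last_updated_ts REAL);
INSERT INTO state_attributes VALUES (1, '{"unit":"W"}'), (2, '{"unit":"C"}');
INSERT INTO states (entity_id, attributes_id, last_updated_ts) VALUES
  ('sensor.power', 1, 1676200000), ('sensor.power', 1, 1676200100),
  ('sensor.power', 1, 1676200200), ('sensor.temp',  2, 1676200300);
""")

# Per-entity counts plus their share of the total, via a window function.
rows = con.execute("""
SELECT entity_id, total_count,
       100.0 * total_count / SUM(total_count) OVER () AS perc_count
FROM (
  SELECT S.entity_id,
         COUNT(1) AS total_count
  FROM state_attributes AS SA
  INNER JOIN states AS S ON (SA.attributes_id = S.attributes_id)
  GROUP BY S.entity_id
) AS A
ORDER BY total_count DESC
""").fetchall()
print(rows)  # sensor.power has 3 of 4 rows, i.e. 75.0%
```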
You should always have 125% of the database size in temporary space available for temp tables, schema changes, rebuilds, repairs, optimize runs, etc. Anything less and you risk running out of space or the system failing in some way.
If you are running MariaDB < 10.6.9, the query can take so long that it doesn't purge (see this MariaDB bug report: https://jira.mariadb.org/browse/MDEV-25020). There has also been an optimization in 2023.2.3 to reduce the purge time.
Finally, if you are running MariaDB < 10.5.17, there are other known issues with purging and performance that will likely never get fixed in the 10.5-and-below series.
Thanks for the clear explanation!
I am on SQLite, btw. I'll copy the DB over to a proper PC and see if it cleans it overnight (after manually triggering a purge run).
SQLite makes it a lot easier. If you get really stuck, I can look through the DB and see if something is wrong with it.
Contact details are in Endless purge process with MariaDB 10.6.10 · Issue #87352 · home-assistant/core · GitHub
If you copy it over, be sure to shut down HA first or you’ll probably end up with a partial copy.
- Shut down HA
- Make sure the WAL and SHM files are cleaned up, to verify it's not still accessing the DB
# ls -l home-assistant_v2.db*
-rwxr-xr-x 1 root root 876187648 Feb 13 23:00 home-assistant_v2.db
-rwxr-xr-x 1 root root 65536 Feb 13 23:00 home-assistant_v2.db-shm
-rwxr-xr-x 1 root root 21980232 Feb 13 23:00 home-assistant_v2.db-wal
- Copy the DB to your local machine
- Vacuum it to recover space.
SQLite version 3.38.5 2022-05-06 15:25:27
Enter ".help" for usage hints.
sqlite> vacuum;
- Copy it back to your instance
- Start HA
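The checklist above can be sketched with Python's built-in sqlite3 module. The scratch file here is a stand-in; in practice you would point `db` at the copy of home-assistant_v2.db you made after shutting HA down:

```python
import os
import sqlite3
import tempfile

# Stand-in for the copied database; replace with the path to your copy
# of home-assistant_v2.db in real use.
db = os.path.join(tempfile.mkdtemp(), "home-assistant_v2.db")
con = sqlite3.connect(db)
con.execute("CREATE TABLE demo (x INTEGER)")  # just so the file is non-empty
con.commit()
con.close()

# With HA shut down cleanly, the -wal and -shm sidecars should be gone; if
# either is still present, something probably still has the database open.
for sidecar in (db + "-wal", db + "-shm"):
    if os.path.exists(sidecar):
        raise SystemExit(f"{sidecar} still exists - was HA really stopped?")

before = os.path.getsize(db)
con = sqlite3.connect(db)
con.execute("VACUUM")  # rewrites the file, releasing free pages to the OS
con.close()
print(before, "->", os.path.getsize(db))
```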
That helped, thanks bdraco!
It trimmed 1/3 of the size; I'm at 44 GB now.
I think it won't be enough. I have 30 GB free now, so I'll need to find a more aggressive way. Is there anything I can do to signal the DB to mark more as trashed, and then repeat the vacuum trick on a second computer?
If you are looking for a quick fix, you could stop, download again, delete the contents of the states table, vacuum again, put it back up, then restart. That will allow you to keep the stats, and you'll only lose shorter-term history.
delete from states;
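A minimal sketch of why the delete-then-vacuum trick recovers space, using Python's built-in sqlite3 on a throwaway file (the `states` table here is a toy stand-in, not the real HA schema):

```python
import os
import sqlite3
import tempfile

# Deleting rows only marks pages as free inside the file;
# VACUUM is what actually shrinks the file on disk.
db = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(db)
con.execute("CREATE TABLE states (state_id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO states (payload) VALUES (?)",
                [("x" * 1000,) for _ in range(5000)])
con.commit()

con.execute("DELETE FROM states")  # same idea as `delete from states;`
con.commit()
before = os.path.getsize(db)       # still big: pages are free, not released

con.execute("VACUUM")              # rewrite the file without the dead pages
con.close()
after = os.path.getsize(db)
print(before, "->", after)
```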
Oh, that’s exactly what I need. I’m considering moving to MySQL to avoid this requirement of having twice the space available for the purge step. (Purge is vacuum, right?)
The vacuum only happens monthly: Recorder - Home Assistant
Any database engine you pick should have 125% of the database size available for temp tables, rebuilds, etc. to be safe. MySQL will not save you from that, and it's usually not worth the price of admission unless you have a specific need, as you are signing up to be the DBA for your instance.
Here is the states query updated to 2023.4.x for SQLite.
The attributes and entity_id are deduplicated so they are only stored once now, which makes the attributes bytes calculation misleading.
SELECT
    COUNT(state_id) AS cnt,
    COUNT(state_id) * 100.0 / (
        SELECT COUNT(state_id) FROM states
    ) AS cnt_pct,
    SUM(LENGTH(state_attributes.shared_attrs)) AS bytes,
    SUM(LENGTH(state_attributes.shared_attrs)) * 100.0 / (
        SELECT SUM(LENGTH(state_attributes.shared_attrs))
        FROM states
        LEFT JOIN state_attributes
            ON (states.attributes_id = state_attributes.attributes_id)
    ) AS bytes_pct,
    states_meta.entity_id
FROM
    states
    LEFT JOIN state_attributes
        ON (states.attributes_id = state_attributes.attributes_id)
    LEFT JOIN states_meta
        ON (states.metadata_id = states_meta.metadata_id)
GROUP BY
    states.metadata_id
ORDER BY
    cnt DESC
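Here is a self-contained sanity check of the per-entity count and bytes logic against a toy version of the new schema, via Python's built-in sqlite3 (a sketch; the table layouts only contain the columns the query touches, and the data is made up):

```python
import sqlite3

# Toy version of the post-2023.4 schema: entity_id lives in states_meta
# and attributes are deduplicated into state_attributes.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE states_meta (metadata_id INTEGER PRIMARY KEY, entity_id TEXT);
CREATE TABLE state_attributes (attributes_id INTEGER PRIMARY KEY, shared_attrs TEXT);
CREATE TABLE states (state_id INTEGER PRIMARY KEY, metadata_id INTEGER,
                     attributes_id INTEGER);
INSERT INTO states_meta VALUES (1, 'sensor.power'), (2, 'sensor.temp');
INSERT INTO state_attributes VALUES (1, '{"unit":"W"}'), (2, '{"unit":"C"}');
INSERT INTO states (metadata_id, attributes_id)
  VALUES (1, 1), (1, 1), (1, 1), (2, 2);
""")

# Rows and attribute bytes per entity, resolved through both joins.
rows = con.execute("""
SELECT COUNT(state_id) AS cnt,
       SUM(LENGTH(state_attributes.shared_attrs)) AS bytes,
       states_meta.entity_id
FROM states
LEFT JOIN state_attributes ON (states.attributes_id = state_attributes.attributes_id)
LEFT JOIN states_meta ON (states.metadata_id = states_meta.metadata_id)
GROUP BY states.metadata_id
ORDER BY cnt DESC
""").fetchall()
print(rows)  # sensor.power: 3 rows sharing one 12-byte attributes blob
```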
This didn’t work for me, I got the following error:
ERROR 1242 (21000): Subquery returns more than 1 row
same issue…
same issue
It looks like we have a solution here:
Just the SQL
SELECT
COUNT(*) AS cnt,
COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
states_meta.entity_id
FROM states
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY entity_id
ORDER BY cnt DESC;
Interesting, it didn’t work for me. I use the default SQLite database and the SQLite DB Browser. I get this when I enter the SQL from @crazyasyou’s post, above:
Execution finished with errors.
Result: ambiguous column name: entity_id
At line 1:
However, it appears to work when I specify the table name for entity_id in the GROUP BY statement:
SELECT
COUNT(*) AS cnt,
COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
states_meta.entity_id
FROM states
LEFT JOIN states_meta ON (states.metadata_id=states_meta.metadata_id)
GROUP BY states_meta.entity_id
ORDER BY cnt DESC;
I haven’t really had time to look into how accurate the results are. So far the new database schema, along with my aggressive excludes in configuration.yaml, have kept my database size reasonable.
Hello, I was wondering what the use case is for recording events. I already excluded all frequent events from the recorder, but maybe I should exclude them all? Is there a UI or an API that can query the recorded events and show them or do something with them? Or is the only way to access them a manual SQL query on the backend DB? Is the history tab able to show recorded events? Thanks!