How to reduce your database size and extend the life of your SD card

In my case, I was able to find the culprit of my rapid DB size bloat, and it was a media_player device responsible for more than 90% of the table entries… I excluded those, took a backup, and then decided to purge those records from the states table:

DELETE QUICK FROM `states` WHERE metadata_id IN (SELECT metadata_id FROM states_meta WHERE entity_id LIKE 'media_player%')
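
For anyone else hunting for a similar culprit, a grouping query along these lines (a sketch written for the post-2023.4 schema, where entity IDs live in the states_meta table) shows which entities dominate the states table:

-- Count rows per entity to find the noisiest ones
SELECT
  states_meta.entity_id,
  COUNT(*) AS cnt
FROM states
INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
LIMIT 25;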

That DELETE works with the default SQLite database too, if you drop the QUICK keyword (it's a MySQL/MariaDB extension).


I’ve put together a similar guide based on this post, including the correct SQL query for grouping and sorting the entities in the “states” table. You can check it out here: https://localsharespace.com/improve-home-assistants-database-performance/

However, I haven’t been able to figure out the query for the “events” table. If anyone knows what we should use for “events”, please let me know and I’ll update my guide. I’m also open to any other suggestions to my guide. Thanks!
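
For what it's worth, a similar grouping for the events table might look like this (a sketch assuming the post-2023.4 schema, where event types were moved into a separate event_types table; verify the column names against your own database):

-- Count events per event type, with a rough percentage of the whole table
SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM events) AS cnt_pct,
  event_types.event_type
FROM events
INNER JOIN event_types ON events.event_type_id = event_types.event_type_id
GROUP BY event_types.event_type
ORDER BY cnt DESC;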


Nice. You should drop the YAML configuration for history, since that's no longer needed.

I cannot access it. What do I need?

Yeah, me too.

Sorry, I think I had some of my security settings set too high. Please try again and see if it works now.


Thanks, it works now.

Any chance you can update the SQL queries in the first post (anyone can edit it)?

I get notified about this post so I just had a re-read of the OP.

I was wondering if there might be a way to compare the list of entities produced by the script print_entities_to_file with those in the file notify.entity_log?

That would make it less onerous if you get ‘slack’.

Just a thought, that I haven’t had time to think through properly myself yet.

My 2 cents on “extend the life of your SD card”:

  • You can use eMMC with an adapter in place of an SD card: SOLUTION : Hassio HA Database in Memory - #10 by lmagyar1973
  • I’ve made an add-on that runs MariaDB on tmpfs, so it is an in-memory database. See: In-memory MariaDB (MySQL) add-on for recorder/history integration
    • This is a fork of the official add-on!
    • This version uses tmpfs to store MariaDB databases in-memory.
    • Even though this is an in-memory database, it can automatically export (from memory to SD card) the database's content during backup, update, restart or even periodically, and can automatically import (from SD card to memory) the content when the add-on starts again. The database dump is gzip-ed before being written to storage to minimize SD card wear.
    • Though it won't protect you from power failures completely. After a power failure, when the add-on is restarted, it will import the last known exported database content. So when e.g. daily periodic export (from memory to SD card) is enabled, you will lose the latest sensor data within that day, but your long-term statistics will remain mostly intact.

But first you have to really minimize the amount of data you want to store in the in-memory database!

I store only the entities necessary for graphs/charts, I turned on the regular daily export (memory -> SD) to protect against power failures, and I enabled deletion of the old statistics that I don't want to display. It's best to keep the DB size to only a few tens of MB.
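
To keep an eye on that, a quick per-table size check (a sketch for MariaDB/MySQL using information_schema; adjust the schema name if your database isn't called homeassistant) can be run from any SQL client:

-- Approximate on-disk size per table, largest first
SELECT
  table_name,
  ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'homeassistant'
ORDER BY (data_length + index_length) DESC;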

Is there a faster way of removing errant records from the database (MySQL)? I've got a series of sensors that are incredibly chatty (more than 3 million records in a week), and removing them via the purge entities service is glacially slow.

Is there a reason you need to keep those 3 million records per week? If my math is correct, that’s five per second. I don’t think the HA database is optimized for that sort of volume. And if you’re using an SD card (as in the title of this thread) then you’re really asking for trouble.

Best practice would be to exclude them. If you need to keep historical data, create templates which sample them less frequently. Keep those and exclude the source data.
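
If the rows really do have to go and the purge service is too slow, a direct delete like the one earlier in this thread is usually much faster (a sketch for MySQL/MariaDB on the post-2023.4 schema; the entity pattern is just a placeholder, and you should stop Home Assistant and take a backup before running it):

-- Bulk-delete states for the chatty entities; 'sensor.chatty_%' is a hypothetical pattern
DELETE FROM states
WHERE metadata_id IN (
  SELECT metadata_id FROM states_meta
  WHERE entity_id LIKE 'sensor.chatty_%'
);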


Hell no, this is the result of reconfiguring a Zigbee sensor, after which I missed a number of entities when populating the recorder exclude list.

Doing so gives me a black page. And an error in the log:

Logger: homeassistant.components.http.security_filter
Source: components/http/security_filter.py:66
Integration: HTTP (documentation, issues)
First occurred: 15:19:15 (5 occurrences)
Last logged: 15:22:58

Filtered a request with unsafe byte query string: /api/hassio_ingress/4na91HttrXMhiO4gszMnRJBZUmR7gruQGT1Dl1UWCws/states/query/?ordering=&export_ordering=&sql=--+Updated+query+Dec+2023%0D%0ASELECT%0D%0A++COUNT%28*%29+AS+cnt%2C%0D%0A++COUNT%28*%29+*+100+%2F+%28SELECT+COUNT%28*%29+FROM+states%29+AS+cnt_pct%2C%0D%0A++states_meta.entity_id%0D%0AFROM+states%0D%0AINNER+JOIN+states_meta+ON+states.metadata_id%3Dstates_meta.metadata_id%0D%0AGROUP+BY+states_meta.entity_id%0D%0AORDER+BY+cnt+ASC
Filtered a request with unsafe byte query string: /api/hassio_ingress/4na91HttrXMhiO4gszMnRJBZUmR7gruQGT1Dl1UWCws/states/query/?ordering=&export_ordering=&sql=SELECT%0D%0A++COUNT%28*%29+AS+cnt%2C%0D%0A++COUNT%28*%29+*+100+%2F+%28SELECT+COUNT%28*%29+FROM+states%29+AS+cnt_pct%2C%0D%0A++states_meta.entity_id%0D%0AFROM+states%0D%0AINNER+JOIN+states_meta+ON+states.metadata_id%3Dstates_meta.metadata_id%0D%0AGROUP+BY+states_meta.entity_id%0D%0AORDER+BY+cnt+ASC
Filtered a request with unsafe byte query string: /api/hassio_ingress/4na91HttrXMhiO4gszMnRJBZUmR7gruQGT1Dl1UWCws/states/query/?ordering=&export_ordering=&sql=SELECT+%0D%0A++++entity_id%2C%0D%0A++++COUNT%28*%29+AS+cnt%0D%0AFROM+states%0D%0AGROUP+BY%0D%0A++++entity_id%0D%0AORDER+BY%0D%0A++++COUNT%28*%29+DESC%3B

What's wrong?

I think the database has changed since then. Check out this post.

Same error. Black screen…

ERROR: column "id" does not exist

I’m testing out compression and wondering where the ID column has gone?

Anyone using compression?

Testing with this:

ALTER TABLE ltss SET (
	timescaledb.compress,
	timescaledb.compress_orderby = 'time DESC',
	timescaledb.compress_segmentby = 'entity_id'
);

Changed chunks to 12 hours.

I know that this thread hasn't been touched for a long time, but as it's still very valid, I wanted to add a second (my) way for the Analytical approach > Template Editor Method, as I found the way with Excel and scripts too tedious, especially when one wants to update it regularly.

I’m doing an all-out approach with this, as I first exclude everything and include only specific entities. This is done by getting an ordered list of all domains to exclude, and a list with all entity_ids to include.

Enter this in the template editor (developer tools)

logbook:
  exclude:
    domains:
{% for domain in states | map(attribute="domain") | unique | sort -%}
{{ '      ' }}- {{ domain }}
{% endfor -%}
{{ '  ' }}include:
    entities:
{% for entityid in states | map(attribute="entity_id") | sort -%}
{{ '      ' }}- {{ entityid }}
{% endfor %}
This is the version for when logbook is configured directly in `configuration.yaml`.

If you include a file for logbook and/or recorder, you need different indentation. In this case, use this in the template editor:

exclude:
  domains:
{% for domain in states | map(attribute="domain") | unique | sort -%}
{{ '    ' }}- {{ domain }}
{% endfor -%}
include:
  entities:
{% for entityid in states | map(attribute="entity_id") | sort -%}
{{ '    ' }}- {{ entityid }}
{% endfor %}

It’s nothing fancy, just indentation, but you know how much that matters in YAML.

Now you should get an already formatted list for your recorder and logbook, where you only have to go through the entities you’d like to include.

Copy the list from the right side of the template editor and paste it into your editor/add-on/Visual Studio Code (whatever you use to work on your YAML). It should be correctly indented, and the only thing you need to do is delete the lines for the entities you don't want to include. :slight_smile:

Hope this helps some, for me it's easier than working with Excel or another spreadsheet program. :slight_smile:

PS: @tom_l If you want me to add this to the first post, please let me know. :slight_smile:


I’m using TimescaleDB with recorder and LTSS and would like to optimize my DB. I’m not a DB expert so seeking some advice.

SELECT pg_size_pretty(pg_database_size('homeassistant'));

"37 GB"
SELECT
   relname  as table_name,
   pg_size_pretty(pg_total_relation_size(relid)) As "Total Size",
   pg_size_pretty(pg_indexes_size(relid)) as "Index Size",
   pg_size_pretty(pg_relation_size(relid)) as "Actual Size"
   FROM pg_catalog.pg_statio_user_tables 
ORDER BY pg_total_relation_size(relid) DESC;

|table_name|Total Size|Index Size|Actual Size|
|states|7742 MB|5664 MB|2077 MB|
|state_attributes|3663 MB|457 MB|3205 MB|
|statistics|1862 MB|1028 MB|834 MB|
|_hyper_1_26_chunk|1341 MB|729 MB|612 MB|
|_hyper_1_28_chunk|1333 MB|723 MB|609 MB|
|_hyper_1_27_chunk|1328 MB|722 MB|606 MB|
|_hyper_1_19_chunk|1224 MB|682 MB|542 MB|
|_hyper_1_20_chunk|1215 MB|677 MB|538 MB|
|_hyper_1_21_chunk|1128 MB|628 MB|500 MB|
|_hyper_1_15_chunk|1119 MB|621 MB|497 MB|
|_hyper_1_25_chunk|1118 MB|607 MB|511 MB|
|_hyper_1_23_chunk|1117 MB|620 MB|497 MB|
|_hyper_1_22_chunk|1115 MB|620 MB|494 MB|
|_hyper_1_18_chunk|1114 MB|615 MB|500 MB|
|_hyper_1_16_chunk|1100 MB|606 MB|493 MB|
|_hyper_1_17_chunk|1063 MB|585 MB|478 MB|
|_hyper_1_24_chunk|1027 MB|564 MB|463 MB|
|_hyper_1_14_chunk|1001 MB|563 MB|438 MB|

If I am correct, a big gain can be made by enabling compression on the chunks?
But this is totally unclear to me…
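
Compression only applies to hypertables, so it would shrink the _hyper_ chunks (the LTSS data) but not the plain recorder tables like states and state_attributes. A sketch of what enabling it could look like (assuming TimescaleDB 2.x and that the hypertable is named ltss, as in the earlier post):

-- Enable compression on the LTSS hypertable (same settings as shown above)
ALTER TABLE ltss SET (
  timescaledb.compress,
  timescaledb.compress_orderby = 'time DESC',
  timescaledb.compress_segmentby = 'entity_id'
);

-- Add a policy that automatically compresses chunks older than 7 days
SELECT add_compression_policy('ltss', INTERVAL '7 days');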