Simple way to reduce your DB size

Though I’ve been fighting with DB size it was still large. Yes, I had configured recorder and logbook sections as suggested, excluding things that appear in history or logs, but still, it was growing large which in turn increased reboot time and I suspect was also cause why my installation could become irresponsible after some time or after unlucky hard reboot (doing it sometimes).

But…In fact, it was super easy to identify what makes it grow large, just by inspecting DB itself by doing the following:

  1. open up a terminal (SSH or via web, I use this extension for this purpose https://github.com/hassio-addons/addon-ssh)
  2. change directory to config cd ~/config
  3. open sqlite shell sqlite3 home-assistant_v2.db
  4. enter the following commands in the shell:
    .header on
    .mode column
    .width 50, 10,
    SELECT entity_id, COUNT(*) as count FROM states GROUP BY entity_id ORDER BY count DESC LIMIT 20;

You will see smth like:

entity_id                       count
------------------------------  ----------
sensor.cpu                      28874
sun.sun                         21238
sensor.time                     18415
sensor.new_york                 18393
cover.kitchen_cover             17811
switch.mystrom_switch           14101
sensor.internet_time            12963
sensor.solar_angle1             11397
sensor.solar_angle              10440
...

NOTE: I’ve used data from https://www.home-assistant.io/docs/backend/database (and “surprisingly” commands too)

All records are ordered by count, so you will easily find out what is bloating your DB.

BTW I had all the same records as in this example, so added them into the exclude list (I don’t know what is good for me from sun/solar/cpu history so I’ve just wiped it out from recordings). FYI here is part of my config to keep DB small

recorder:
  exclude:
    domains:
      - updater
      - media_player
      - device_tracker
      - weather
      - sun
    entities:
      - sun.sun
      - sensor.cpu
      - sensor.cpu_temp
      - sensor.date
      - sensor.time
      - sensor.new_york
      - cover.kitchen_cover
      - switch.mystrom_switch
      - sensor.internet_time
      - sensor.solar_angle1
      - sensor.solar_angle
      - automation.lights_watchdog
logbook:
  exclude:
    domains:
      - media_player
      - device_tracker
      - weather
      - sun
    entities:
      - automation.lights_watchdog
      - sensor.date

NOTE: even though I already had by that moment sun in exclude.domains but records sun.sun were still there in DB. I had cleaned up DB after applying thin config so it’s no doubt about it. So I had to add sun.sun into entities exclude list too

Hope it would help you!

14 Likes

That’s great

Is there anyway to pull that directly into HA for display? My understanding of the SQL sensor is that it pulls a single value, not a table or list of data.

Not sure this will work?

  - platform: sql
    db_url: !secret mariadb_connection_url
    scan_interval: 1800
    queries:
    - name: Top Entities Database
      query: 'SELECT entity_id, COUNT(*) as count FROM homeassistant.states GROUP BY entity_id ORDER BY count DESC LIMIT 20;'
1 Like

Very interesting post! Hoping to see this evolving to an advisory sensor what to exclude. Following…

1 Like

Sorry for necroposting, but since a while these commands don’t work anymore. The docs is outdated too. I believe Home Assistant has changed the DB scheme.

I have found the updated commands, but I can’t remember where, I think was some discussion on Github:

SELECT states_meta.entity_id as Entity, count(states.metadata_id) as Count FROM states INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id GROUP BY states.metadata_id ORDER BY count(states.metadata_id) DESC LIMIT 10;

10 Likes

This SELECT command still works in 2024. Thank you !

1 Like

Just FYI, this is the query that works on the current version of HA 2025.01

SELECT m.entity_id, COUNT(*) as count FROM states AS S
INNER JOIN states_meta AS M ON M.metadata_id = s.metadata_id
GROUP BY m.entity_id ORDER BY count DESC LIMIT 20;

Cheers

2 Likes

One more thing, I found this to be the simplest way to run queries. it also lets you save your queries so you can come back and run them whenever.