How I tackled a bloated home-assistant_v2.db file

So, I recently ran out of space on my Ubuntu server and found that home-assistant_v2.db was the culprit, eating up a massive 95 GB. I had set my Home Assistant recorder to keep state data for 70 days (purge_keep_days), which should be manageable — but 95 GB? Way too big.
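For reference, that retention is set in the recorder section of `configuration.yaml` (a sketch with only my retention value; all other recorder options are omitted):

```yaml
# configuration.yaml (sketch; other recorder options omitted)
recorder:
  purge_keep_days: 70
```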

Step 1: Database Analysis

I started with some database analysis using the sqlite3_analyzer tool: `sqlite3_analyzer home-assistant_v2.db`. It took a few minutes to complete due to the database size and produced a lot of output, but the most interesting part was this:

/** Disk-Space Utilization Report For home-assistant_v2.db

...
Size of the file in bytes......................... 98372806656
Bytes of user payload stored...................... 51006611082  51.9% 

*** Page counts for all tables with their indices *****************************

STATE_ATTRIBUTES.................................. 79849678    83.1% 
STATES............................................ 15744915    16.4% 
...

*** All tables and indices ****************************************************

... 
Unused bytes on all pages......................... 34480337591  35.1% 
...

*** Table STATE_ATTRIBUTES and all its indices ********************************

Percentage of total database......................  83.1%    
Number of entries................................. 173759998 
Bytes of storage consumed......................... 81766070272
...
Unused bytes on index pages....................... 106686636   11.6% 
Unused bytes on primary pages..................... 34106211104  42.2% 
Unused bytes on overflow pages.................... 59110        1.5% 
Unused bytes on all pages......................... 34212956850  41.8% 
...

Turns out, 83% of the space was taken up by the STATE_ATTRIBUTES table, and over 40% of its page bytes were unused! I tried running VACUUM to free up space, but it had no effect. This suggests that the STATE_ATTRIBUTES table simply stores data inefficiently; maybe a future HA update will address this.

Step 2: Identify Data Bloat Sources

To dig deeper, I checked the HA docs on data storage and realized that the real issue might be in the STATES table: STATE_ATTRIBUTES likely bloats because some entities generate an extreme number of state changes. I connected to the database with `sqlite3 home-assistant_v2.db` and used this query to find the top 100 entities with the most state changes:

SELECT
    states_meta.entity_id,
    COUNT(*) AS record_count
FROM
    states
    LEFT JOIN states_meta ON (states.metadata_id = states_meta.metadata_id)
GROUP BY states_meta.entity_id
ORDER BY record_count DESC
LIMIT 100;

The result showed that most of the records came from power meters and device trackers:

entity_id                                     record_count
number.smart_meter_update_frequency           5222612
sensor.smart_meter_power_b                    5222610
sensor.smart_meter_power_a                    5222610
sensor.smart_meter_energy_produced_b          5222610
sensor.smart_meter_energy_flow_b              5222610
sensor.smart_meter_energy_b                   5222610
sensor.smart_meter_energy_a                   5222610
sensor.smart_meter_current_b                  5222610
sensor.smart_meter_current_a                  5222610
sensor.smart_meter_power_factor_b             5222609
sensor.smart_meter_power_factor_a             5222609
sensor.smart_meter_power_ab                   5222609
sensor.smart_meter_energy_produced_a          5222609
sensor.smart_meter_energy_flow_a              5222609
switch.relay_entrance                         900775
sensor.relay_entrance_energy                  900774
sensor.relay_entrance_voltage                 900773
sensor.relay_entrance_power                   900773
sensor.relay_entrance_linkquality             900773
sensor.relay_entrance_current                 900773
device_tracker.ae_34_12_cd_78_45              596372
device_tracker.91_b8_23_ac_09_ef              595466
device_tracker.4f_76_01_aa_92_cd              594797
device_tracker.bc_3d_10_e4_17_a9              560428
device_tracker.33_e7_54_b1_8c_60              548377
device_tracker.1d_c9_02_da_9e_fb              513255
device_tracker.9e_54_48_20_f3_ea              487929
device_tracker.72_a8_3d_11_6c_0f              452719
media_player.tv                               350620
switch.aqara_power                            164512
switch.aqara_power_power_outage_memory        164508
sensor.aqara_power_power                      164506
sensor.aqara_power_energy                     164506
sensor.aqara_power_device_temperature         164506
...

To get the output in the same format, execute `.mode column` and `.headers on` before running the query. The query itself also took a long time, so be prepared if you have a huge DB. There is also a DbStats addon which will do the same and show you nice charts.
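If you'd rather script the analysis, the same query runs fine from Python's built-in sqlite3 module. A minimal sketch against a toy two-table schema (the real recorder schema has many more columns, but the join and grouping are the same):

```python
import sqlite3

# Toy in-memory stand-in for the recorder schema: just the two columns
# the query needs from states and states_meta.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE states_meta (metadata_id INTEGER PRIMARY KEY, entity_id TEXT);
CREATE TABLE states (state_id INTEGER PRIMARY KEY, metadata_id INTEGER);
""")
con.executemany("INSERT INTO states_meta VALUES (?, ?)",
                [(1, "sensor.smart_meter_power_a"),
                 (2, "switch.relay_entrance")])
# The sensor gets 5 state rows, the switch gets 2.
con.executemany("INSERT INTO states (metadata_id) VALUES (?)",
                [(1,)] * 5 + [(2,)] * 2)

rows = con.execute("""
    SELECT states_meta.entity_id, COUNT(*) AS record_count
    FROM states
    LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
    GROUP BY states_meta.entity_id
    ORDER BY record_count DESC
    LIMIT 100
""").fetchall()
print(rows)  # [('sensor.smart_meter_power_a', 5), ('switch.relay_entrance', 2)]
```

Point the `connect()` call at your real home-assistant_v2.db (with HA stopped, or a copy of the file) to get the actual counts.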

I decided that while I want to keep 70 days of state data generally, these specific entities only need a few days of history.

Step 3: Automate the Purge

Luckily, the Recorder docs provide examples for exactly this: purging high-frequency data, like power meters. I set up two automations:

alias: Purge noisy device_tracker entities
description: Reduce DB size by purging state changes for specific trackers
triggers:
  - trigger: time
    at: "05:00:00"
conditions: []
actions:
  - action: recorder.purge_entities
    metadata: {}
    data:
      keep_days: 1
      entity_globs:
        - device_tracker.??_??_*
mode: single

and

alias: Purge noisy power sensors
description: Reduce DB size by purging state changes for power sensors
triggers:
  - trigger: time
    at: "04:00:00"
conditions: []
actions:
  - action: recorder.purge_entities
    metadata: {}
    data:
      keep_days: 5
      entity_globs:
        - "*.smart_meter_*"
        - "*.relay_entrance*"
mode: single
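Before trusting an automation with deletions, it's worth sanity-checking the globs. This sketch approximates the matching with Python's fnmatch (HA uses its own glob-to-regex translation internally, but `*` and `?` behave the same way for patterns like these):

```python
from fnmatch import fnmatchcase

# The entity_globs from the two automations above.
globs = ["device_tracker.??_??_*", "*.smart_meter_*", "*.relay_entrance*"]

def purged(entity_id: str) -> bool:
    """Would any of the purge globs match this entity?"""
    return any(fnmatchcase(entity_id, g) for g in globs)

print(purged("device_tracker.ae_34_12_cd_78_45"))  # True
print(purged("sensor.smart_meter_power_a"))        # True
print(purged("sensor.kitchen_temperature"))        # False (keeps full history)
```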

I also used the File Size integration to monitor the DB size, which is perfect for tracking storage impact.

Step 4: Final Cleanup

The first purge run took a day, since it seems to delete records one by one. After purging, I checked the DB size — and it hadn’t shrunk. In SQLite, deleted data doesn’t free up file space automatically, so I needed to run VACUUM to reclaim it. Recorder also repacks the database every second Sunday (unless you disabled `auto_repack`) or when you call the `recorder.purge` action with the `repack` option.
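This behaviour is easy to reproduce with a throwaway database: deleting rows moves their pages onto SQLite's freelist but leaves the file the same size, and only VACUUM actually shrinks the file. A sketch using Python's sqlite3:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit, so VACUUM can run

con.execute("CREATE TABLE t (payload BLOB)")
con.executemany("INSERT INTO t VALUES (?)", [(b"x" * 4096,)] * 1000)
before = os.path.getsize(path)

con.execute("DELETE FROM t")          # pages go to the freelist...
after_delete = os.path.getsize(path)  # ...but the file does not shrink

con.execute("VACUUM")                 # rebuild the file, reclaiming the space
after_vacuum = os.path.getsize(path)

print(after_delete == before, after_vacuum < before)  # True True
```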

Since a plain VACUUM needs temporary space roughly the size of the database (which I didn’t have), I connected an external drive, stopped HA, and ran VACUUM INTO [external drive path] in the sqlite3 CLI. This reduced my DB to 12 GB! After replacing the original DB with the vacuumed copy, I also deleted the home-assistant_v2.db-wal and home-assistant_v2.db-shm files (that’s important!). Then I started Home Assistant, and everything worked perfectly!
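`VACUUM INTO` (SQLite 3.27+) writes the compacted copy to a path you choose instead of rebuilding in place, which is what makes the external-drive trick work. A sketch with stand-in paths:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "home-assistant_v2.db")      # stand-in for the real DB
dst = os.path.join(tmp, "external", "compacted.db")  # "external drive" path
os.makedirs(os.path.dirname(dst))

con = sqlite3.connect(src, isolation_level=None)
con.execute("CREATE TABLE t (payload BLOB)")
con.executemany("INSERT INTO t VALUES (?)", [(b"x" * 4096,)] * 500)
con.execute("DELETE FROM t")           # leave plenty of free pages behind
con.execute("VACUUM INTO ?", (dst,))   # compacted copy; original is untouched
con.close()

print(os.path.getsize(dst) < os.path.getsize(src))  # True
```

The copy at `dst` is a complete, standalone database, so after verifying it you can move it back over the original (and delete the stale -wal/-shm files, as above).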

Results

With this cleanup, I reduced the DB size by 85%, and I’m still tracking 70 days of data for most entities. If you’re struggling with a bloated home-assistant_v2.db, I hope these steps help!


You don’t make it clear why you needed to keep 70 days of data. Have you analysed how often you use it? :grin:

With long-term statistics available, even the default 10 days of state history seems generous. My home-assistant_v2.db is 635 MB.


That’s a good point. Long-term statistics are a big deal. I set 70 days a long time ago, before the long-term statistics feature existed.
Currently I use the state history mostly for debugging, to compare previous sensor behaviour with the current incorrect one.

Anyway, your DB may bloat even if you use the defaults, and this tutorial will help you trace the problem.
