So, I recently ran out of space on my Ubuntu server and found that home-assistant_v2.db
was the culprit, eating up a massive 95 GB. I had set my Home Assistant recorder to keep state data for 70 days (purge_keep_days
), which should be manageable — but 95 GB? Way too big.
Step 1: Database Analysis
I started with some database analysis using the sqlite3_analyzer
tool: sqlite3_analyzer home-assistant_v2.db
. It took a few minutes to complete due to the database size, and generated a large output but the most interesting was this:
/** Disk-Space Utilization Report For home-assistant_v2.db
...
Size of the file in bytes......................... 98372806656
Bytes of user payload stored...................... 51006611082 51.9%
*** Page counts for all tables with their indices *****************************
STATE_ATTRIBUTES.................................. 79849678 83.1%
STATES............................................ 15744915 16.4%
...
*** All tables and indices ****************************************************
...
Unused bytes on all pages......................... 34480337591 35.1%
...
*** Table STATE_ATTRIBUTES and all its indices ********************************
Percentage of total database...................... 83.1%
Number of entries................................. 173759998
Bytes of storage consumed......................... 81766070272
...
Unused bytes on index pages....................... 106686636 11.6%
Unused bytes on primary pages..................... 34106211104 42.2%
Unused bytes on overflow pages.................... 59110 1.5%
Unused bytes on all pages......................... 34212956850 41.8%
...
Turns out, 83% of the space was taken up by the STATE_ATTRIBUTES
table, and over 40% of page bytes are unused! I tried using VACUUM to free up space, but it had no effect. This suggests that the STATE_ATTRIBUTES
table just stores data inefficiently. Maybe a future HA update will address this.
Step 2: Identify Data Bloat Sources
To dig deeper, I checked HA Docs on data storage and realized that the real issue might be in the STATES
table. STATE_ATTRIBUTES
likely bloats because of some entities generating an extreme amount of state changes. I connected to database using sqlite3 home-assistant_v2.db
and used this query to find the top 100 entities with the most state changes:
SELECT
states_meta.entity_id,
COUNT(*) AS record_count
FROM
states
LEFT JOIN states_meta ON (states.metadata_id = states_meta.metadata_id)
GROUP BY states_meta.entity_id
ORDER BY record_count DESC
LIMIT 100;
The result showed that most bloated records were from power meters and device trackers:
entity_id record_count
number.smart_meter_update_frequency 5222612
sensor.smart_meter_power_b 5222610
sensor.smart_meter_power_a 5222610
sensor.smart_meter_energy_produced_b 5222610
sensor.smart_meter_energy_flow_b 5222610
sensor.smart_meter_energy_b 5222610
sensor.smart_meter_energy_a 5222610
sensor.smart_meter_current_b 5222610
sensor.smart_meter_current_a 5222610
sensor.smart_meter_power_factor_b 5222609
sensor.smart_meter_power_factor_a 5222609
sensor.smart_meter_power_ab 5222609
sensor.smart_meter_energy_produced_a 5222609
sensor.smart_meter_energy_flow_a 5222609
switch.relay_entrance 900775
sensor.relay_entrance_energy 900774
sensor.relay_entrance_voltage 900773
sensor.relay_entrance_power 900773
sensor.relay_entrance_linkquality 900773
sensor.relay_entrance_current 900773
device_tracker.ae_34_12_cd_78_45 596372
device_tracker.91_b8_23_ac_09_ef 595466
device_tracker.4f_76_01_aa_92_cd 594797
device_tracker.bc_3d_10_e4_17_a9 560428
device_tracker.33_e7_54_b1_8c_60 548377
device_tracker.1d_c9_02_da_9e_fb 513255
device_tracker.9e_54_48_20_f3_ea 487929
device_tracker.72_a8_3d_11_6c_0f 452719
media_player.tv 350620
switch.aqara_power 164512
switch.aqara_power_power_outage_memory 164508
sensor.aqara_power_power 164506
sensor.aqara_power_energy 164506
sensor.aqara_power_device_temperature 164506
...
If you want to get the output in the same format, you need to execute .mode column
and .headers on
before the query. Execution of the query took a long time too. Be prepared if you have a huge DB. There is a DbStats addon which will do the same and show you nice charts.
I decided that while I want to keep 70 days of state data generally, these specific entities only need a few days of history.
Step 3: Automate the Purge
Luckily, Recorder docs provided examples for exactly this: purging high-frequency data, like power meters. I set up two automations:
alias: Purge noisy device_tracker entities
description: Reduce DB size by purging state changes for specific trackers
triggers:
- trigger: time
at: "05:00:00"
conditions: []
actions:
- action: recorder.purge_entities
metadata: {}
data:
keep_days: 1
entity_globs:
- device_tracker.??_??_*
mode: single
and
alias: Purge noisy power sensors
description: Reduce DB size by purging state changes for power sensors
triggers:
- trigger: time
at: "04:00:00"
conditions: []
actions:
- action: recorder.purge_entities
metadata: {}
data:
keep_days: 5
entity_globs:
- "*.smart_meter_*"
- "*.relay_entrance*"
mode: single
I also used the File Size integration to monitor the DB size, which is perfect for tracking storage impact.
Step 4: Final Cleanup
The first purge run took a day since it seems to delete records one by one. After purging, I checked the DB size — and it hadn’t shrunk. In SQLite, deleted data doesn’t usually free up space automatically, so I needed to run VACUUM
to reclaim it. Recorder also calls VACUUM
every second sunday (if you didn’t unset `auto_repack`) or when you call `recorder.purge` action with `repack` attribute.
Since VACUUM
creates a temp file for a shrinked database (which I didn’t have space for), I connected an external drive, stopped HA and ran VACUUM INTO [external drive path]
in the sqlite3
CLI. This reduced my DB to 12 GB! After replacing the original DB with the vacuumed one, I also deleted home-assistant_v2.db-wal
and home-assistant_v2.db-shm
files (that’s important!). Started Home Assistant, and everything worked perfectly!
Results
With this cleanup, I reduced the DB size by 85%, and I’m still tracking 70 days of data for most entities. If you’re struggling with a bloated home-assistant_v2.db
, I hope these steps help!