Hass database growing huge

I just noticed the v2 hass database has grown to a gigantic 8 GB and my little 21 GB SSD is now full. I need to reduce this size a lot and also prevent it from happening again, as the server behaves very strangely with a full disk. What can I do? I read a bit about the recorder service, but all the default values for it seem fine, which still doesn’t explain why it is this huge…

recorder:
  purge_keep_days: 5
  purge_interval: 1

Works for me. I’m not concerned about when my dining room lights were turned on last month. :slight_smile:

I will test, but I still don’t understand how it can grow that much with the default values of 10 days and an interval of 1. And I don’t have a crazy number of sensors.

Sigh, I saw it create the temporary files home-assistant_v2.db-shm and home-assistant_v2.db-wal and they filled up for a while, then they disappeared and my home-assistant_v2.db is still just as huge.

I would exclude what you don’t need history on. Things that update frequently (time, sun) cause a lot of entries in the database.

I looked at the history page (left sidebar of the frontend) and wrote an exclude for everything I didn’t want a history for. I excluded some domains outright too. My database used to get to a few GB and now stays at a few hundred MB.

Example config:
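Something along these lines; the domain and entity names below are just placeholders for the kind of chatty things to filter out, not a literal copy of my config:

recorder:
  exclude:
    domains:
      # whole domains I never need history for
      - sun
      - updater
      - group
      - automation
    entities:
      # individual noisy entities spotted on the history page
      - sensor.time
      - sensor.date
      - sensor.last_boot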


It depends a lot on the frequency of state changes of the sensors you monitor. E.g. I have an ISS tracker just for giggles, and its longitude and latitude change by the minute… I don’t record those, nor the sun.sun stuff.

I ran the purge service manually now and see that it fills the whole disk trying to rebuild the database. Can I just delete it manually this time? And can I avoid this space-hungry purge by switching to a real database?

If you are up for personal opinions… I exclude everything and then, line by line, include only what I am either testing or want a history of. Recording everything by default is fine when you start, but as your setup grows, so does the database, and quickly:

purge_keep_days: 5
purge_interval: 1
include:
  entities:
    - sensor.smithville_apparent_temperature
    - sensor.smithville_cloud_coverage
    - sensor.smithville_daytime_high_temperature
    - sensor.smithville_humidity
    - sensor.smithville_nearest_storm_dir
    - sensor.smithville_nearest_storm_distance
    - sensor.smithville_overnight_low_temperature
    - sensor.smithville_ozone
    - sensor.smithville_precip
    - sensor.smithville_pressure
    - sensor.smithville_temperature
    - sensor.smithville_uv_index
    - sensor.smithville_visibility
    - sensor.smithville_wind_dir
    - sensor.smithville_wind_speed
    - sensor.galveston_apparent_temperature
    - sensor.galveston_cloud_coverage
    - sensor.galveston_daytime_high_temperature
    - sensor.galveston_humidity
    - sensor.galveston_nearest_storm_dir
    - sensor.galveston_nearest_storm_distance
    - sensor.galveston_overnight_low_temperature
    - sensor.galveston_ozone
    - sensor.galveston_precip
    - sensor.galveston_pressure
    - sensor.galveston_temperature
    - sensor.galveston_uv_index
    - sensor.galveston_visibility
    - sensor.galveston_wind_dir
    - sensor.galveston_wind_speed
    - binary_sensor.kitchen_motion_opened
    - sensor.kitchen_motion_humidity
    - sensor.kitchen_motion_temperature
    - binary_sensor.laundry_motion_opened
    - sensor.laundry_motion_humidity
    - sensor.laundry_motion_temperature
    - sensor.406_roaming_motion_detected
    - sensor.condo_humidity
    - sensor.condo_temp
    - binary_sensor.code_1
    - binary_sensor.tool_shed_opened
exclude:
  domains:
    - updater
    - camera
    - media_player
    - sun
    - zone
    - group
    - sensor
    - binary_sensor
    - automation
    - calendar
  entities:
    - weather.smithville

In other words, other than the weather history and some activity at my locations, I couldn’t care less about what DID happen.


I like this approach. I wonder if my include list would be shorter than my exclude list.

While you have HA NOT running, you can just delete the database. When you fire up HA again, if a database isn’t found, a new one will be created.


I ran a bunch of queries to see which domains and entities have the most state changes:

I was surprised to see the majority of state changes were to do with the power measurement component of my SmartThings outlets, something I never use!

entities:
select count(state_id) as count, entity_id from states group by entity_id order by count asc;

domains:
select count(state_id) as count, domain from states group by domain order by count asc;
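The same idea applies to the events table, which the recorder fills alongside states and which can be just as big (this assumes the default recorder schema, where every event row has an event_type):

event types:
select count(event_id) as count, event_type from events group by event_type order by count asc;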


I recently started a second instance of HA, and all I do is move all the sensors I’ve got into that second HA.
Okay, I’ve got about 80 or 90 sensors, but it gives me a DB of 100 MB a day!

Moving just the state changes from the same sensors to a text file doesn’t even give me 100 MB a month.
In my main HA instance I shut down the history a long time ago because of this issue (about 60 HA updates ago).
I guess it hasn’t really changed; there is too much saved for every event.
You can change which entities are saved in history, but you can’t change what is saved for each entity.

So if you want history data, find a way to save it yourself (I use AD), and if it isn’t important at all, just shut down history and recorder.
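For what it’s worth, a rough sketch of what that can look like as an AppDaemon app. The class name, file path, and entity list are placeholders, and it assumes AppDaemon’s hass plugin API; the entities to watch come from apps.yaml.

import appdaemon.plugins.hass.hassapi as hass
from datetime import datetime


class StateLogger(hass.Hass):
    # append state changes of selected entities to a plain text file

    def initialize(self):
        # logfile and entities are supplied via apps.yaml (placeholders here)
        self.logfile = self.args.get("logfile", "/config/state_log.txt")
        for entity in self.args.get("entities", []):
            self.listen_state(self.log_change, entity)

    def log_change(self, entity, attribute, old, new, kwargs):
        # one line per state change: timestamp, entity, old -> new
        with open(self.logfile, "a") as f:
            f.write(f"{datetime.now().isoformat()} {entity} {old} -> {new}\n")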

@ReneTode I know exactly how you feel. I get 500 MB databases all the time.


After this configuration change, I get about 5 MB/day:

recorder:
  purge_keep_days: 14
  purge_interval: 1
  exclude:
    domains:
      - automation
      - group
      - input_boolean
      - zha
      - zwave
      - script
    entities:
      - sun.sun
      - sensor.kitchen_lamp_power
      - sensor.front_wall_lights_power

It would have been a nice feature if, instead of a number of days to keep, you could set a maximum database size to purge down to, imo…
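In the meantime, a rough workaround sketch (not size-based): call the purge service more aggressively from an automation. This assumes the recorder.purge service and its keep_days parameter; the trigger time and keep_days value are just examples.

automation:
  - alias: Nightly aggressive recorder purge
    trigger:
      - platform: time
        at: "03:00:00"
    action:
      - service: recorder.purge
        data:
          keep_days: 2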


Facing this issue too with binary motion sensors. Ideally I would like the history of these… or some compacted form, but people walking by way too often just causes the database to freak out and grow a few hundred MB per day. Not that PostgreSQL has an issue returning that much data, but the UI chugs on the query.
Additionally, after the purge failed for a couple of months, I had 23 GB of data.

Options are:

  1. Exclude it altogether from the recorder
  2. Limit everything to a day and hope that’s enough (but the UI can still chug rendering lots of small events)
  3. Add a custom PostgreSQL maintenance script to drop excessive records (e.g. keep the last 100 states or 6 hours, whichever is greater); see the sketch after this list. Unsure if this can cause issues with HA.
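For option 3, roughly the kind of query I have in mind. This is an untested sketch; it assumes the states table still has its last_updated timestamp column and that nothing else is writing or purging at the same time:

-- keep everything from the last 6 hours plus the newest 100 rows per entity,
-- delete the rest (placeholder numbers, adjust to taste)
delete from states
where last_updated < now() - interval '6 hours'
  and state_id not in (
    select state_id from (
      select state_id,
             row_number() over (partition by entity_id order by last_updated desc) as rn
      from states
    ) ranked
    where rn <= 100
  );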

I’m unsure if dropping state records would affect just the logbook/history UI pages… or would it impact using these entities in the Lovelace UI?
I’m particularly keen on hearing back about this last question; I could test it, but that’s messy on a working system if someone already knows the answer.

The last thing is, InfluxDB is VERY efficient at both storing and querying this data, but that’s mostly for external things like Grafana… but I COULD pull the data from there if I drop these entities from the recorder.
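A rough sketch of that split (host, database, and entity names are placeholders; it assumes the influxdb integration’s include filter):

influxdb:
  host: 192.168.1.10
  database: home_assistant
  include:
    entities:
      - binary_sensor.hallway_motion
      - binary_sensor.kitchen_motion

with the same entities then added to the recorder’s exclude list.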

Here’s about three days’ worth of event data… worst offenders at the bottom