It took me a while to figure it all out because I was only a noob…so I borrowed from lots of different places…but this is how I have mine running now:
In the configuration.yaml I put this line:
recorder: !include recorder.yaml
I pull the recorder settings into another file (recorder.yaml obviously) because I control what I keep to ensure my DB doesn’t get crazy large…so the config gets very long…
In the recorder.yaml start with this:
purge_keep_days: 2
db_url: !secret mariadb_url
That tells it to store everything in MariaDB, and tells it that you only want to keep two days of data. Periodically the purge service will sweep through your DB and drop everything older than two days…it is a bit of mystery as to when it actually runs (e.g. on boot, after 24 hours of uptime, etc etc) but I haven’t bothered to dig into it because it really doesn’t matter…it just works and timing is not critical.
Anyway, even with the 2 day setting my DB was large…which is a major pain because the DB is backed up when your backups run…and I was running out of room on my Google Drive…and I didn’t want to pay for additional storage because I have 1TB on OneDrive…and didn’t want to pay extra just for backups (and because I’m lazy and want to minimise moving parts I also didn’t want to set up something else to copy backups from HA to somewhere else so it could export).
I then used DBeaver to show me what was in the DB:
select entity_id,count(*) from states group by entity_id order by 2 desc;
This returns a list of all the state data that it stored sorted by count…for example:
sensor.conservatory_motion_detector_temperature 3766
sensor.services_cost_per_hour 2964
sensor.laundry_motion_detector_temperature 2871
sensor.white_phase 2704
sensor.energy_white_filtered 2704
sensor.white_phase_energy 2701
sensor.energy_white_filtered_cost 2692
sensor.garage_motion_detector_front_temperature 1354
sensor.garage_motion_detector_rear_temperature 1351
sensor.red_phase 1204
sensor.energy_red_filtered 1204
sensor.red_phase_energy 1200
sensor.energy_red_filtered_cost 1192
sensor.hallway_cupboard_motion_detector_temperature 894
sensor.myups_input_line_frequency 665
sensor.weatherflow_air_density 650
This is my current result (of 500 rows) and there’s nothing extreme though still plenty to filter if I wanted to do so to save even more space. In my first attempt I had entities with 10,000+ results…and my DB was pushing 800-1200MB.
I went through the list and started excluding entities where I didn’t need to know the history - for example for a specific temperature measurement (e.g. in the living room) I want the history because I like my nice graph card that shows 24 hours of data, but for the battery level of a door contact sensor I certainly don’t need to have a detailed history because I really only care about the now value. So I set to work and started excluding things by adding content to the recorder.yaml:
exclude:
domains:
- camera
- group
- device_tracker
- media_player
- input_text
- input_number
- input_boolean
- input_number
- input_select
- weblink
- updater
- sun
- timer
- weather
- person
- automation
entity_globs:
- sensor.rpi*
- sensor.pi_hole*
- sensor.processor*
- sensor.memory*
- sensor.disk*
- sensor.load*
- sensor.swap*
- sensor.last_boot*
- sensor.local_ip*
- sensor.home_alarm*
- sensor.*_sensitivity
- sensor.*_illuminance
- sensor.*_lux
- sensor.*_timeout
- sensor.*_price
- sensor.measured_outdoor_air_temperature_*
- sensor.*stairway*temperature
- sensor.*wardrobe*temperature
- sensor.bedroom_bed*
- sensor.hallway_motion_*_temperature
- sensor.esp*
- sensor.*_current
- sensor.*lights*energy*
- sensor.*rail*energy*
- sensor.*extractor*energy*
- sensor.*device_temperature
- sensor.energy_consumption_*_raw
- sensor.*power*
- sensor.*last_seen*
- sensor.myups_load
- sensor.cpu_temperature
- sensor.weatherflow_wind_*
- sensor.weatherflow_beaufort*
- sensor.weatherflow_feels*
- sensor.weatherflow_brightness
- sensor.weatherflow_solar_radiation
- sensor.weatherflow_dewpoint
- sensor.weatherflow_rain*
- sensor.weatherflow_*humidity*
- sensor.weatherflow_*pressure*
- sensor.weatherflow_uv*
- sensor.weatherflow_precipitation*
- sensor.weatherflow_heat_index
- sensor.weatherflow_*lightning*
- sensor.weatherflow_wet_bulb*
- sensor.weatherflow_*voltage*
- sensor.weatherflow_visibility
- sensor.weatherflow_*delta*
- sensor.*_ble
- binary_sensor.espresense_*
- binary_sensor.*_mqtt_room
- binary_sensor.weatherflow_is*
- binary_sensor.*_overpowering
- binary_sensor.*_overheating
- binary_sensor.motion_exterior*
- binary_sensor.zigbee_router*
- binary_sensor.*vibration*
- switch.schedule_*
- switch.*_led_indication
- number.*_occupancy_timeout
entities:
- sensor.hallway_heat_pump_outside_temperature
- sensor.lounge_heat_pump_outside_temperature
- sensor.shelly_laundry_energy
- sensor.shelly_garagebay_energy
My actual listing is considerably longer and probably overdue for a tidy up as I’m sure some entities have moved on…but this gives you the idea. If you list a domain then everything from that area is excluded from history - for example I don’t want anything from ‘weather’. But sometimes you want something, just not everything so then you can use entity_globs which allow you to wildcard an entity string…if you wanted to have no history for when something was last seen you can set a entity_glob of
sensor.*last_seen*
which will exclude anything that matches that pattern. Lastly if you need to be more surgical you can list specific entities to exclude in the entities section. This is useful where you have, say, two motion detectors at opposite ends of a hallway - you actually might want to use their temperature and lux data, but you don’t really need two measurements for the space so you specifically exclude one of the two sets of entities.
Set your config, restart HA, then using Developer Tools call the Recorder: Purge service. You can set the flags as follows:
Days to keep: 2 (or to whatever duration you want to keep)
Repack = Yes
Apply Filter = Yes
That will purge the DB to the meet your new include/exclude and duration filter. It can actually be fun to run this before you make changes so you can see progress:
Now: 800MB
Drop retention to 2 days: 300MB
Recorder.yaml update #1: 200MB
Recorder.yaml update #2: 150MB
Recorder.yaml update #3: 85MB
After that it is all pretty much automatic. I run the DBeaver query every month or so to see if anything needs a tune up because I’m always tinkering so things get added, renamed, or moved…and that’s about it…though now I do rather feel like I should be cleaning out my recorder.yaml again to keep it tidy!
The statistics function will track your long term data (such as energy) automatically and that doesn’t get impacted by keeping your other entity history in check.
Oh…and currently my DB is now floating somewhere between 90MB - 140MB depending on when the purge last ran. So for me I’m running at about 10% of what it would have been without any management…
Hope that helps!
CP.