How to keep your recorder database size under control

gon · March 2, 2022, 7:49am

since the documentation somewhere (or someone) says only sql-select is possible:
i suspect the only sensible solution is to use a cronjob in the underlying linux to execute the delete’s.

denilsonsa · March 2, 2022, 1:12pm

Since there is the SQLite Web add-on which has access to the database, someone with enough knowledge and time could create an add-on to run SQL queries periodically on the HA database.

Alternatively, someone could try using the SQL integration to build a pseudo-sensor that runs the DELETE query (instead of the usual SELECT query for a sensor). The scan_interval configuration is relevant too. I have no idea if this idea would work, but worth trying.

Even if the DELETE pseudo-sensor doesn’t work, it’s possible to create sensors for the size of each table in the recorder database.

CaptTom · March 2, 2022, 3:37pm

Interesting idea. I confirmed that the SQL Integration sensor runs every thirty seconds. Unfortunately, I got the following error trying to add a scan_interval to the definition:

Invalid config for [sensor.sql]: [scan_interval] is an invalid option for [sensor.sql]. Check: sensor.sql->queries->0->scan_interval. (See ?, line ?).

I don’t really want to do a DELETE every 30 seconds. That would defeat the purpose of minimizing I/O and database thrashing.

nicohirsch · March 9, 2022, 8:45pm

@CaptTom You can add the scan_interval to the platform:

- platform: sql
  scan_interval: 3600
  queries:
    - name: count_sensors_1
      query: "SELECT entity_id, count(*) as NUM FROM states GROUP BY entity_id ORDER BY NUM DESC LIMIT 1;"
      column: 'entity_id'

If you need different intervals, you can can add the platform sql multiple times.

CaptTom · March 9, 2022, 9:05pm

@nicohirsch, I hadn’t thought of that.

Have you tried that? Does a DELETE actually work in the sql platform?

nicohirsch · March 12, 2022, 6:03pm

@CaptTom Just now I’ve tested the DELETE query, but it doesn’t work:

homeassistant.exceptions.HomeAssistantError: 
The system cannot restart because the configuration is not valid: 
Invalid config for [sensor.sql]: 
Only SELECT queries allowed for dictionary value

bdraco · March 23, 2022, 10:23am

Please take note of your database size before upgrading to 2022.4 and check it again after it’s been running for the full purge interval. It would be nice to see how much some of the database optimizations that are forthcoming in the upcoming release impact the size of the database.

CaptTom · March 23, 2022, 12:04pm

Will do! I won’t touch the database between now and when I update again (probably toward the end of April) so I’ll have a good base line value for the database size.

I didn’t know there was any database work going on, thank you. Is there any place I can look which describes the optimizations?

jacla · March 23, 2022, 8:58pm

When you have influxdb running do all of this still applies?
Only using HA for about 2 weeks and DB is already 1gb.
Changed the retention policy in influxdb config to 7 days from default but not sure if this will make a difference reading some articles you need to play around with continuous queries??

Would purge option or exclusions in config. Yaml make a difference when using influxdb?

HeyImAlex · March 23, 2022, 9:41pm

Yes. Influxdb is an entirely separate and additional DB that doesn’t replace the build-in HA database used to retrieve the history.

Not on the HA history DB.

Yes, they will affect the built-in history database. They will not influence your influxdb though.

jacla · March 23, 2022, 9:50pm

Confusing… In config folder i just see 1 big DB file
Home-assistant_v2. Db (1gb)

Is there article describing what files/dbs are where?

I thought (newbie thought) influx simply replaced the native db completely and all goes to this influx db which i assume is that big. Db file in config folder?

Thanks a lot

HeyImAlex · March 23, 2022, 10:11pm

The native HA recorder component, which stores and retrieves the history you can see from within HA in the history panel or when clicking on a sensor in Lovelace, or over custom cards such as the mini graph card, is an SQL based DB. By default HA uses the SQLite engine, which produces the home-assistant_v2.db file you’re seeing. HA support a number of alternative DB engines that can replace SQLite, they’re all SQL based (MariaDB, MySQL and PostgreSQL). They all share the same config for the recorder component in the YAML and they’re what this OP is about. While you could remove the recorder integration entirely, this would probably break a lot of fundamental things in HA.

See here for more info.

The Influxdb integration is a completely separate optional integration that also stores your state data (only from sensors) into an additional unrelated DB using influx. This neither replaces nor interferes in any way with the above native SQL DB. In fact, it will store your data twice, once in each DB, unless you have it excluded from either. The data stored in influxdb is usually used for long term storage of some sensors and is usually viewed using an external tool such as Grafana. It is not what you see when you open the native HA history panel or sensor history.

bdraco · March 25, 2022, 9:15pm

Rough draft preview of the notes for 2022.4

Database Performance

This release features write reductions intended to preserve SD card lifetimes, performance improvements designed to get data to the frontend faster, and database size reduction ranging from 10-35% for most installs.

When the same set of state attributes already exists in the database, newly recorded state changes now only store a reference to the state attributes instead of saving the whole set of state attributes.

As data stored in the database before 2022.4 does not get updated, the reduction in database size will occur over time with the recording of new data and the purging of old data.

Most entity history graphs no longer need to fetch the attributes from the backend, and as a result, they load much faster after 24 hours of running on 2022.4. This change is especially noticeable for energy sensors.

For a few a complete set of changes to the underlying database, check out the Data Science Portal

sender · March 26, 2022, 6:40am

Hm, this counts for the inbuilt db I presume. Or also for influx in any way?

bdraco · March 26, 2022, 8:43am

Nothing to do with influxdb

AaronCake · March 26, 2022, 2:43pm

YES! This is awesome!

HeyImAlex · March 26, 2022, 9:13pm

Great news !

I assume this is done by using the new no_attributes option to the API ?

bdraco · March 26, 2022, 9:14pm

Yes, that is correct

HeyImAlex · March 26, 2022, 9:15pm

Perfect, thanks, will update my history explorer card code in anticipation of the update

bdraco · March 26, 2022, 9:19pm

The following domains still need attributes:

github.com

home-assistant/frontend/blob/ddf1cc0733df07e91821e626909e874817b488b8/src/data/history.ts#L11

    
      
          import { HassEntity } from "home-assistant-js-websocket";
          import { computeDomain } from "../common/entity/compute_domain";
          import { computeStateDisplay } from "../common/entity/compute_state_display";
          import { computeStateDomain } from "../common/entity/compute_state_domain";
          import { computeStateName } from "../common/entity/compute_state_name";
          import { LocalizeFunc } from "../common/translations/localize";
          import { HomeAssistant } from "../types";
          import { FrontendLocaleData } from "./translation";
          
          
const DOMAINS_USE_LAST_UPDATED = ["climate", "humidifier", "water_heater"];
          const NEED_ATTRIBUTE_DOMAINS = [
            "climate",
            "humidifier",
            "input_datetime",
            "thermostat",
            "water_heater",
          ];
          const LINE_ATTRIBUTES_TO_KEEP = [
            "temperature",
            "current_temperature",
            "target_temp_low",

The sensor entities do not though which are where most state changes come from.