since the documentation somewhere (or someone) says only sql-select is possible:
i suspect the only sensible solution is to use a cronjob in the underlying linux to execute the delete’s.
Since there is the SQLite Web add-on which has access to the database, someone with enough knowledge and time could create an add-on to run SQL queries periodically on the HA database.
Alternatively, someone could try using the SQL integration to build a pseudo-sensor that runs the DELETE
query (instead of the usual SELECT
query for a sensor). The scan_interval
configuration is relevant too. I have no idea if this idea would work, but worth trying.
Even if the DELETE
pseudo-sensor doesn’t work, it’s possible to create sensors for the size of each table in the recorder database.
Interesting idea. I confirmed that the SQL Integration sensor runs every thirty seconds. Unfortunately, I got the following error trying to add a scan_interval
to the definition:
Invalid config for [sensor.sql]: [scan_interval] is an invalid option for [sensor.sql]. Check: sensor.sql->queries->0->scan_interval. (See ?, line ?).
I don’t really want to do a DELETE
every 30 seconds. That would defeat the purpose of minimizing I/O and database thrashing.
@CaptTom You can add the scan_interval
to the platform:
- platform: sql
scan_interval: 3600
queries:
- name: count_sensors_1
query: "SELECT entity_id, count(*) as NUM FROM states GROUP BY entity_id ORDER BY NUM DESC LIMIT 1;"
column: 'entity_id'
If you need different intervals, you can can add the platform sql multiple times.
@nicohirsch, I hadn’t thought of that.
Have you tried that? Does a DELETE
actually work in the sql
platform?
@CaptTom Just now I’ve tested the DELETE
query, but it doesn’t work:
homeassistant.exceptions.HomeAssistantError:
The system cannot restart because the configuration is not valid:
Invalid config for [sensor.sql]:
Only SELECT queries allowed for dictionary value
Please take note of your database size before upgrading to 2022.4 and check it again after it’s been running for the full purge interval. It would be nice to see how much some of the database optimizations that are forthcoming in the upcoming release impact the size of the database.
Will do! I won’t touch the database between now and when I update again (probably toward the end of April) so I’ll have a good base line value for the database size.
I didn’t know there was any database work going on, thank you. Is there any place I can look which describes the optimizations?
When you have influxdb running do all of this still applies?
Only using HA for about 2 weeks and DB is already 1gb.
Changed the retention policy in influxdb config to 7 days from default but not sure if this will make a difference reading some articles you need to play around with continuous queries??
Would purge option or exclusions in config. Yaml make a difference when using influxdb?
Yes. Influxdb is an entirely separate and additional DB that doesn’t replace the build-in HA database used to retrieve the history.
Not on the HA history DB.
Yes, they will affect the built-in history database. They will not influence your influxdb though.
Confusing… In config folder i just see 1 big DB file
Home-assistant_v2. Db (1gb)
Is there article describing what files/dbs are where?
I thought (newbie thought) influx simply replaced the native db completely and all goes to this influx db which i assume is that big. Db file in config folder?
Thanks a lot
The native HA recorder component, which stores and retrieves the history you can see from within HA in the history panel or when clicking on a sensor in Lovelace, or over custom cards such as the mini graph card, is an SQL based DB. By default HA uses the SQLite engine, which produces the home-assistant_v2.db file you’re seeing. HA support a number of alternative DB engines that can replace SQLite, they’re all SQL based (MariaDB, MySQL and PostgreSQL). They all share the same config for the recorder component in the YAML and they’re what this OP is about. While you could remove the recorder integration entirely, this would probably break a lot of fundamental things in HA.
The Influxdb integration is a completely separate optional integration that also stores your state data (only from sensors) into an additional unrelated DB using influx. This neither replaces nor interferes in any way with the above native SQL DB. In fact, it will store your data twice, once in each DB, unless you have it excluded from either. The data stored in influxdb is usually used for long term storage of some sensors and is usually viewed using an external tool such as Grafana. It is not what you see when you open the native HA history panel or sensor history.
Rough draft preview of the notes for 2022.4
Database Performance
This release features write reductions intended to preserve SD card lifetimes, performance improvements designed to get data to the frontend faster, and database size reduction ranging from 10-35% for most installs.
When the same set of state attributes already exists in the database, newly recorded state changes now only store a reference to the state attributes instead of saving the whole set of state attributes.
As data stored in the database before 2022.4 does not get updated, the reduction in database size will occur over time with the recording of new data and the purging of old data.
Most entity history graphs no longer need to fetch the attributes from the backend, and as a result, they load much faster after 24 hours of running on 2022.4. This change is especially noticeable for energy sensors.
For a few a complete set of changes to the underlying database, check out the Data Science Portal
Hm, this counts for the inbuilt db I presume. Or also for influx in any way?
Nothing to do with influxdb
YES! This is awesome!
Great news !
I assume this is done by using the new no_attributes
option to the API ?
Yes, that is correct
Perfect, thanks, will update my history explorer card code in anticipation of the update
The following domains still need attributes:
The sensor entities do not though which are where most state changes come from.