Cleaning up HA Database (with some old_state_id problems)

I’ve had HA running for some weeks now and with trial and error & help from this community it has started to work rather well.

After a while I noticed was much increased home-assistant_v2.db (as was mentioned in docs as well) and I also managed to corrupt it few times. So I went with postgresql as recorder backend.

This was rather painless and there was a step by step guide to it in Migrate Home Assistant from SQLite to PostgreSQL | Sigfried Seldeslachts and with few mods it was up and running.

Next I wanted to make sure that I’d have a way to keep database bit smaller and janitor it regularly (myself)

First step would be to remove all entries where state just does not change.

some example data would be

 state_id |            entity_id            | state | attributes | event_id | last_changed |  last_changed_ts  | last_updated |  last_updated_ts  | old_state_id | attributes_id |         context_id         | context_user_id | context_parent_id | origin_idx
----------+---------------------------------+-------+------------+----------+--------------+-------------------+--------------+-------------------+--------------+---------------+----------------------------+-----------------+-------------------+------------
   308719 | sensor.ilmasensori1_temperature | 23.6  |            |          |              |                   |              | 1675588279.821811 |       308713 |        293077 | 01GRGDG7GD54CB3933XGWDRYYJ |                 |                   |          0
   308725 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588279.861426 |       308719 |        293083 | 01GRGDG7HNWN1Q49E3KGENFHRE |                 |                   |          0
   308731 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588280.722815 |       308725 |        293089 | 01GRGDG8CJ7KVZMRPA79S37S0D |                 |                   |          0
   308737 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588280.762024 |       308731 |        293095 | 01GRGDG8DTC0D738CFK7S4QCK8 |                 |                   |          0
   308743 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588280.800759 |       308737 |        293101 | 01GRGDG8F0Y6BAQ5MCPENH9MWX |                 |                   |          0
   308749 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588280.838888 |       308743 |        293107 | 01GRGDG8G6QK2ARH8H2T2PTM1D |                 |                   |          0
   308755 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588287.316702 |       308749 |        293113 | 01GRGDGETM8RMD0Y2KF17NQGJ6 |                 |                   |          0
   308761 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588287.356455 |       308755 |        293119 | 01GRGDGEVW4CEY76N66NSE7PGJ |                 |                   |          0
   308767 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              |  1675588287.71043 |       308761 |        293125 | 01GRGDGF6Y56RYG6MCDXQFVKDZ |                 |                   |          0
   308773 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588287.748196 |       308767 |        293131 | 01GRGDGF8450Y2YVYEREQ285TJ |                 |                   |          0
   308779 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588287.786542 |       308773 |        293137 | 01GRGDGF9A7W9GTFRECWZV7KYZ |                 |                   |          0
   308785 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588287.826042 |       308779 |        293143 | 01GRGDGFAJJ9ZEHMBVYP08EBAS |                 |                   |          0
   308791 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588291.204033 |       308785 |        293149 | 01GRGDGJM45JB25P9QDRSP5VJN |                 |                   |          0
   308797 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588291.242631 |       308791 |        293155 | 01GRGDGJNAAV4H5TYK2ZD4KYHH |                 |                   |          0
   308803 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              | 1675588291.281162 |       308797 |        293161 | 01GRGDGJPHB8XNJ0K0RM1Q4B3Y |                 |                   |          0
   308809 | sensor.ilmasensori1_temperature | 23.6  |            |          |              | 1675588279.821811 |              |  1675588291.31921 |       308803 |        293167 | 01GRGDGJQQ49XBSEQ58PV4716P |                 |                   |          0
   308815 | sensor.ilmasensori1_temperature | 23.9  |            |          |              |                   |              | 1675588292.313652 |       308809 |        293173 | 01GRGDGKPS5NACP9SYT28RZDX5 |                 |                   |          0

In some cases timestamp is updated fairly often (perhaps sensor should be adjusted in zigbee2mqtt to report less often?)

Here I would not have any use for the entries 308725 → 308809 and hence could drop them out.

I did however come across the problem of ‘states_old_state_id_fkey’ CONSTRAINT.
It would not let me remove any entries where there was reference from another entry to here and
it appears that all entries in database are ‘chained’ together through state_ids.

So I dug into homeassistant_core and found from recorder purge.py and db_schema.py and queries.py

I noticed that ‘purge’ sets old_state_id to NULL so it can proceed with purging rows.
I tried that in a way that I set old_state_id to NULL for all entries that had last_changed_ts (WHERE last_changed_ts IS NOT NULL;) since those seem to be rows that I have no use for.

However removal did not work after that since still there was a row that I wanted to keep which was still referring to one of those removable rows.

So I looked more to the code and it seemed that old_state_id reference is not used in anything (important?) I removed the foreign key CONSTRAINT and hence was able to remove whatever rows I wanted.

Things seem to work as before. Nothing ‘seems’ to be broken (yet).

While I understand that this constraint could be important the removal of it and cleaning data does not seem to affect my usecases (at least yet)

So I am asking

  1. Is the constraint just something that is set up my sqlachemy?
  2. Is it used by any functionality that I did not notice
  3. If no current, then what future uses might break because of this?

Another goal of this would be to make historical data more sparse.

ie. have one entry per minute for things that are older than 1 month and one entry per 10 seconds for entries that are older than 1 week.

Perhaps influxdb would be better choice?

Over a period of time it seems DB had grown up to 5GB in size which is not that much, but gave me opportunity to see what bigger cleanup might do.

I ran
DELETE FROM states WHERE last_changed_ts IS NOT NULL;
to remove all states which were identical to previous state. This reduced tables by several millions of rows.

After that I removed all state_attributes which were no longer references into

DELETE FROM state_attributes
WHERE attributes_id IN (
   SELECT attrs.attributes_id
     FROM state_attributes attrs
   LEFT    JOIN    states  sts
     ON sts.attributes_id = attrs.attributes_id
   WHERE sts.attributes_id IS NULL
);

after VACUUM; and VACUUM FULL; the size of the database on disk dropped down to 1.5G

After that I tried to all entries except every 100th for my temperature sensor which goes up by 0.1 and down by 0.1 almost all the time.
It would be useful to be able to set range in which changes would be ignored so changes like 22.1, 22.2, 22.1, 22.2, 22.1 would not result in logged entry only after going to 22.0 or 22.3 (if 0.1 range is specified) something would be logged. (maybe this feature would belong into zigbee2mqtt)

So running

DELETE FROM states
WHERE state_id IN (
  SELECT state_id
  FROM (
    SELECT state_id, ROW_NUMBER() OVER (ORDER BY state_id ASC) AS rown
    FROM states
    WHERE entity_id LIKE 'sensor.airsensor1_temperature'
    ) s
  where s.rown%100 != 0);

Cleaned up some 120k entries more left some 1k entries and resulted in historical data page opening up way faster

I understand that this is most likely NOT the right way to keep database small(ish) but so far it does not seem to have ill effects on the usability of the data in ‘history’ page.

1 Like

Thank you, you save me so much space!