I was analyzing the size of my Sqlite database, and trying to figure out what was the main consumer of space.
SELECT sum(pgsize - unused) / 1024 / 1024 MB_USED,
sum(pgsize) / 1024 / 1024 MB_ALLOC,
name
FROM dbstat
GROUP BY name
ORDER BY 1 DESC;
MB_USED MB_ALLOC name
419 447 state_attributes
273 275 states
105 105 ix_states_context_id_bin
82 82 ix_states_metadata_id_last_updated_ts
72 73 ix_states_last_updated_ts
56 56 ix_states_old_state_id
53 53 ix_states_attributes_id
49 49 statistics
40 40 ix_states_event_id
29 29 statistics_short_term
16 16 ix_statistics_statistic_id_start_ts
16 16 ix_state_attributes_hash
13 13 ix_statistics_start_ts
10 10 ix_statistics_short_term_statistic_id_start_ts
8 8 ix_statistics_short_term_start_ts
0 0 statistics_runs
In my case I have a bunch of sensors that update very frequent, so the size of states is normal.
But the size of state_attributes is not.
I drilled down, and found that the culprit was all my utility meter sensors:
SELECT count( * ),
'COLLECTING_ENERGY' type
FROM state_attributes
WHERE shared_attrs LIKE '%"status":"collecting"%"device_class":"energy"%'
UNION ALL
SELECT count( * ),
'OTHER' type
FROM state_attributes
WHERE shared_attrs NOT LIKE '%"status":"collecting"%"device_class":"energy"%';
count(*) | type |
---|---|
1112722 | COLLECTING_ENERGY |
21486 | OTHER |
SELECT entity_id
FROM states_meta
WHERE metadata_id IN (
SELECT DISTINCT metadata_id
FROM states
WHERE attributes_id IN (
SELECT DISTINCT attributes_id
FROM state_attributes
WHERE shared_attrs LIKE '%"status":"collecting"%"device_class":"energy"%'
)
);
select * from state_attributes where shared_attrs like '%"status":"collecting"%"device_class":"energy"%';
For example, a utility meter of my hot water boiler, I have 3 for this appliance (day, month, forever), is generating a lot of records, 20k records in the period of 7 days (I have purge_keep_days: 7 in recorder configuration)
select * from state_attributes where shared_attrs like '%"friendly_name":"Cilindro diário"}%'
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"8.015380","last_valid_state":"0.005664","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-18T23:00:00.006326+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"6.785448","last_valid_state":"0.024666","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-24T23:00:00.019142+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"9.267504","last_valid_state":"9.235814","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-23T23:00:00.014354+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"8.401274","last_valid_state":"0.009313","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-21T23:00:00.018660+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"6.573446","last_valid_state":"6.517286","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-22T23:00:00.087744+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"9.267504","last_valid_state":"0.014591","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-23T23:00:00.014354+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"9.267504","last_valid_state":"9.237438","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-23T23:00:00.014354+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"6.785448","last_valid_state":"0.02052","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-24T23:00:00.019142+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"6.573446","last_valid_state":"6.518963","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-22T23:00:00.087744+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"8.401274","last_valid_state":"2.720321","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-21T23:00:00.018660+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
{"state_class":"total_increasing","source":"sensor.cilindro_switch_0_energy","status":"collecting","last_period":"8.748101","last_valid_state":"10.211658","meter_period":"daily","cron pattern":"0 0 * * *","last_reset":"2023-05-19T23:00:00.043037+00:00","unit_of_measurement":"kWh","device_class":"energy","icon":"mdi:counter","friendly_name":"Cilindro diário"}
Is it really necessary to save all this redundant information, I think, we only need the most recent value in this table, since the value itself is in the STATES table.
Instead of an insert maybe an update, if not for every type, at least for utility meters? Or am I missing something from workflow / architecture of Home Assistant.