Large statistics database (1,9G, 40% of total DB)

And I do not remember what year LTS was incorporated. Assume it was 2021, and LTS is kept for 1000 days.
Then we have 0,5Kbyte * 1000 = 0.5 Mbyte approx. For 1000 days.
How many entities are needed to occupy 1GB - it is 2000 sensors approx.
Well, may be someone really have such a number of sensors with state_class))))

There will be other factors like database table indexing as overhead but it won’t be much.

Here’s my database for the last two years:

It’s a bit difficult to actually plot a trend due to me adding and removing things all the time and the optimisations that have occurred. Also this includes event data. I keep 7 days and exclude a lot of stuff I don’t need from the recorder and have about 750 sensors.

alarm_control_panel: 1
automation: 299
binary_sensor: 231
button: 168
calendar: 4
camera: 14
climate: 3
conversation: 1
device_tracker: 2
event: 2
fan: 3
group: 1
image: 5
input_boolean: 18
input_datetime: 12
input_number: 30
input_select: 19
input_text: 1
light: 62
lock: 11
media_player: 15
number: 86
person: 1
remote: 4
schedule: 7
script: 246
select: 92
sensor: 756
stt: 1
sun: 1
switch: 112
tag: 7
todo: 5
tts: 1
update: 96
vacuum: 2
weather: 2
zone: 6

My DB is about 1.5 GB with about 3500 entities with 30 days.
It used to be larger - up to 10GB - and it happened several times only because of “sys_error_log”. Then I excluded this event.

I have about 1190 entities covered by the LTS.
(Or at least unique metadata_id’s)
1.9GB (for only the statistics table) is a fact, so that is about 1,5MB per item. (doesn’t make sense)

(Total DB size = 4,5. No issues with auto purging “states” and “short_term stats”, understand that part. Cleaning up entities will reduce size eventually)

SELECT 
  statistics.metadata_id,
  statistics_meta.id,
  statistics_meta.statistic_id, 
  COUNT(*)
FROM statistics, statistics_meta
WHERE statistics.metadata_id = statistics_meta.id
GROUP BY
    statistics_meta.statistic_id
ORDER BY
    COUNT(*) DESC;

The ones with highest number of entries have 24478 records, about 1020 days.
(So, nothing strange, right?).

About data calculations: There are are a created_ts and start_ts for every record (2 more double;s).
Still, even if all where 300kB/per year, (some will be less, total device class), and for sure not all exist for the full three years.
I should have 1200 (entities) X 300kB x 3 yeas = 1 GB (Still not nothing, but okay) How is it still 1.9GB in size then?

I expect many old records that are never removed.
For example, I’m now started cleaning, disabling entities like a temperature signal reported by a door sensor (which is not accurate, so useless). Currently this value is also in LTS.
(Checked and I’m able to retrieve value from the whole year).
When I disable the entity, it’s immediately gone from the list. Developer Tools > Statistics >
There is no repair/fix, it’s just gone.
However, even after reboot, when I search directly in the statistics tables, I still find all ¬20K+ records.

On my quest to reduce DB size, another question:

I excluded some useless values from the recorder.
(I did not disable them, as the current value is still useful)

entity_globs:
  - sensor.*_battery
  - sensor.*_signal_strength

However, these lead to two different results…???

This entity is excluded from being recorded:
image

Which is true, no “fix”. Will be here as a reminder.

However, for the battery:
image

Now, there is a “fix” to remove the old values.

Why is this different???

Not sure, just a speculation:
Perhaps the 1st entity was excluded BEFORE any LTS data were stored?

Not sure… you might be right.
After LTS was introduced more and more device classes where added (afaik) to be compatible.
But both signal signal_strength and battery I excluded recently.

I first believed it had anything to do with different entity types (like the ones are marked as “diagnostic”. But that’s not the case. I removed what I can by “fix”.

But okay, now I’m stuck with many “This entity is excluded from being recorded.”, most of the are _battery.

Success!

I now have ZERO records in my statistics, statistics_short_term and statistics_meta tables. So far I’ve seen no negative consequences to any of my dashboards, displays, automations or anything else I can think of.

The way I did this was to add each entity to customize.yaml with a state_class: none line. For example:

sensor.visonic_mct_340_e_battery:
  state_class: none

After restarting HA, I had to go to Developer Tools / Statistics, click on each “Fix” link, then select “Delete.”

Oh, and for the record, my database is now just 35% of the size it was before I did all this.

To me, the idea of forcing “forever” LTS data retention on everyone, and for every entity associated with certain state_class values, is poor design. Retention requirements can vary greatly between different kinds of entities, and different use cases.

1 Like

Other than you can only see back as far as your purge_keep_days setting.

There’s multi-select for that now.

No one is forcing you. As advised 18 days ago:

…Which is exactly what I wanted.

I saw that in the release notes. Thank you for the reminder. Some people might want to update to 2024.10.x before trying this.

If you were to actually read my thread, you’ll see that’s exactly what I did.

Please understand, this thread is about different ways to deal with a large amount of LTS data in the database. Mine isn’t the only way. But neither is yours. I only offered my experience for those who might be helped by it.

I never asked for LTS, and never asked that every single entity be retained. This was indeed “forced” on users, and this work-around is hardly intuitive. The great thing about HA is that we can all use it in different ways. That shouldn’t be discouraged.

2 Likes

I did read it. You presented it as something you discovered rather than something you were told about over two weeks ago.

Project decisions like this are for the majority. They will keep happening. Fortunately in this case there was something you could do but do not be surprised if they keep happening. The developers have said many times they are not interested in supporting making everything configurable.

1 Like

FYI that work around will likely produce repairs for every entity in 2024.10+ 1 to 4 hours after each restart.

1 Like

I observe these repairs only for a few entities (belonging to Traccar Server integration which is broken) out of ~30…40 entities with customized “state-class: none”.

1 Like

Thank you for the heads-up. I’ll be watching for that when I update in a few days.

If anyone has a better way to disable LTS recording, either by entity or altogether, feel free to chime in!

Hi, please check this feature request for more options to trim LTS.
(and the issue of different options). Especially disabled entities are an issue as those will remain in LTS forever. (Pls vote :wink: )

1 Like

Just to follow up on this, it’s been about 8 hours now since I restarted, and so far nothing in Repairs. I’ll keep looking, but at this point the workaround seems to be exactly what I wanted.

See my post above, a only HAD “repairs” with sensors with state_class=none of some particular integration (out of many other sensors with customized state_class=none). Since some update there are no repairs for statistics at all.

The 2024.11 update also came with the “fix”/feature to easy remove statistics for disabled entities. (or every entity without states :slight_smile: This finally provides an option to clear up old/removed/changed and disabled entities. I had MANY, hundreds and hundreds… but after some cleaning if paid off :slight_smile:

From 13M+ records to ~6M, size reduces to 900MB.
Still twice as much as the states table and more than 50% of the total 1,7GiB DB size.
Significant, but okay, this is manageable.

1 Like

Thank you for sharing that. It is consistent with what I’ve seen. And inconsistent with the comments dismissing the size of the statistics tables as irrelevant.

And, it’s great to see the ability to at least clean up the disabled or orphaned records.

Yeah, you’re right. It is definitely not irrelevant and not sustainable on the long run. Especially not for larger installations. An opt-in would maybe a better approach.

1 Like