Large statistics database (1,9G, 40% of total DB)

You can probably use customize to set the state_class to none.

I cannot believe that LTS can be 1.9 GB.
How many sensors with state_class do you have?
And how many months does your LTS include?
Having these two values, we can estimate a possible size.

Once the 5 minute data is purged there is only hourly data. That’s 24 samples per day. The samples contain one each of average, min and max. Assuming double precision floating point (64bit):

24 x 3 x 64 = 4.608kb per day per entity, or 0.576kB per day per entity, kept forever. This equates to 205kB per year per entity i.e. bugger all.

For the 5 minute data (only kept for seven days) it is 288 times this amount, or ~166kB per entity per day. Kept for 7 days that’s about 1MB per entity for the whole week. Still not a lot.

That’s for the measurement state class. For the total device classes only a count is kept. So 3 times less.

EDIT: fixed math

1 Like

Are you sure?
64 bit = 8 byte, so it is 24 * 3 * 8=576 byte. About 0,5 Kbyte. Per entity. Per day.

Hah yeah. Still drinking my coffee. :man_facepalming:

Will update.

And I do not remember what year LTS was incorporated. Assume it was 2021, and LTS is kept for 1000 days.
Then we have 0,5Kbyte * 1000 = 0.5 Mbyte approx. For 1000 days.
How many entities are needed to occupy 1GB - it is 2000 sensors approx.
Well, may be someone really have such a number of sensors with state_class))))

There will be other factors like database table indexing as overhead but it won’t be much.

Here’s my database for the last two years:

It’s a bit difficult to actually plot a trend due to me adding and removing things all the time and the optimisations that have occurred. Also this includes event data. I keep 7 days and exclude a lot of stuff I don’t need from the recorder and have about 750 sensors.

alarm_control_panel: 1
automation: 299
binary_sensor: 231
button: 168
calendar: 4
camera: 14
climate: 3
conversation: 1
device_tracker: 2
event: 2
fan: 3
group: 1
image: 5
input_boolean: 18
input_datetime: 12
input_number: 30
input_select: 19
input_text: 1
light: 62
lock: 11
media_player: 15
number: 86
person: 1
remote: 4
schedule: 7
script: 246
select: 92
sensor: 756
stt: 1
sun: 1
switch: 112
tag: 7
todo: 5
tts: 1
update: 96
vacuum: 2
weather: 2
zone: 6

My DB is about 1.5 GB with about 3500 entities with 30 days.
It used to be larger - up to 10GB - and it happened several times only because of “sys_error_log”. Then I excluded this event.

I have about 1190 entities covered by the LTS.
(Or at least unique metadata_id’s)
1.9GB (for only the statistics table) is a fact, so that is about 1,5MB per item. (doesn’t make sense)

(Total DB size = 4,5. No issues with auto purging “states” and “short_term stats”, understand that part. Cleaning up entities will reduce size eventually)

SELECT 
  statistics.metadata_id,
  statistics_meta.id,
  statistics_meta.statistic_id, 
  COUNT(*)
FROM statistics, statistics_meta
WHERE statistics.metadata_id = statistics_meta.id
GROUP BY
    statistics_meta.statistic_id
ORDER BY
    COUNT(*) DESC;

The ones with highest number of entries have 24478 records, about 1020 days.
(So, nothing strange, right?).

About data calculations: There are are a created_ts and start_ts for every record (2 more double;s).
Still, even if all where 300kB/per year, (some will be less, total device class), and for sure not all exist for the full three years.
I should have 1200 (entities) X 300kB x 3 yeas = 1 GB (Still not nothing, but okay) How is it still 1.9GB in size then?

I expect many old records that are never removed.
For example, I’m now started cleaning, disabling entities like a temperature signal reported by a door sensor (which is not accurate, so useless). Currently this value is also in LTS.
(Checked and I’m able to retrieve value from the whole year).
When I disable the entity, it’s immediately gone from the list. Developer Tools > Statistics >
There is no repair/fix, it’s just gone.
However, even after reboot, when I search directly in the statistics tables, I still find all ¬20K+ records.

On my quest to reduce DB size, another question:

I excluded some useless values from the recorder.
(I did not disable them, as the current value is still useful)

entity_globs:
  - sensor.*_battery
  - sensor.*_signal_strength

However, these lead to two different results…???

This entity is excluded from being recorded:
image

Which is true, no “fix”. Will be here as a reminder.

However, for the battery:
image

Now, there is a “fix” to remove the old values.

Why is this different???

Not sure, just a speculation:
Perhaps the 1st entity was excluded BEFORE any LTS data were stored?

Not sure… you might be right.
After LTS was introduced more and more device classes where added (afaik) to be compatible.
But both signal signal_strength and battery I excluded recently.

I first believed it had anything to do with different entity types (like the ones are marked as “diagnostic”. But that’s not the case. I removed what I can by “fix”.

But okay, now I’m stuck with many “This entity is excluded from being recorded.”, most of the are _battery.

Success!

I now have ZERO records in my statistics, statistics_short_term and statistics_meta tables. So far I’ve seen no negative consequences to any of my dashboards, displays, automations or anything else I can think of.

The way I did this was to add each entity to customize.yaml with a state_class: none line. For example:

sensor.visonic_mct_340_e_battery:
  state_class: none

After restarting HA, I had to go to Developer Tools / Statistics, click on each “Fix” link, then select “Delete.”

Oh, and for the record, my database is now just 35% of the size it was before I did all this.

To me, the idea of forcing “forever” LTS data retention on everyone, and for every entity associated with certain state_class values, is poor design. Retention requirements can vary greatly between different kinds of entities, and different use cases.

Other than you can only see back as far as your purge_keep_days setting.

There’s multi-select for that now.

No one is forcing you. As advised 18 days ago:

…Which is exactly what I wanted.

I saw that in the release notes. Thank you for the reminder. Some people might want to update to 2024.10.x before trying this.

If you were to actually read my thread, you’ll see that’s exactly what I did.

Please understand, this thread is about different ways to deal with a large amount of LTS data in the database. Mine isn’t the only way. But neither is yours. I only offered my experience for those who might be helped by it.

I never asked for LTS, and never asked that every single entity be retained. This was indeed “forced” on users, and this work-around is hardly intuitive. The great thing about HA is that we can all use it in different ways. That shouldn’t be discouraged.

1 Like

I did read it. You presented it as something you discovered rather than something you were told about over two weeks ago.

Project decisions like this are for the majority. They will keep happening. Fortunately in this case there was something you could do but do not be surprised if they keep happening. The developers have said many times they are not interested in supporting making everything configurable.

1 Like

FYI that work around will likely produce repairs for every entity in 2024.10+ 1 to 4 hours after each restart.

1 Like

I observe these repairs only for a few entities (belonging to Traccar Server integration which is broken) out of ~30…40 entities with customized “state-class: none”.

1 Like

Thank you for the heads-up. I’ll be watching for that when I update in a few days.

If anyone has a better way to disable LTS recording, either by entity or altogether, feel free to chime in!

Hi, please check this feature request for more options to trim LTS.
(and the issue of different options). Especially disabled entities are an issue as those will remain in LTS forever. (Pls vote :wink: )