Database size sensor?

First question should be: are you using default SQLite database ot MariaDB plugin?

I wonder what would be the point (usability) of this sensor, since size of the database is always the same after it reaches defined time (default 10 days with auto-purge). I have db on mariaDB, set to 30 days and it’s always something over 4GB.

I’m not seeing that. I use the default SQLite database. I posted a graph here showing how it keeps increasing. I believe this is due to data continually being added to the statistics table, which are kept forever.

It keeps growing only if you defined it that way. As said, by default recorder is set to 10 days with auto purge, which keeps it pretty much constant. When i’ve had DB on sqlite (default location) and set to default it’s size was around 2GB (almost one year) and it grew a tiny bit only when i added sensors.
(Unless i’m missing something…)

Long term statistics is very efficient, only a few permanent table entries for each sensor that get updated (rather than adding new time stamped entries). It will not continue to make the database grow in size.

I don’t recall ever defining it that way. My purge_keep_days is set to 4. Can you explain what I can do to un-define whatever I did?

That’s not what I saw. I deleted around 70,000 records dating back from last July. There’s some discussion here and here and over on GitHub which seem to point to the fact that quite a lot of data are stored and never purged.

I would love to be proven wrong, this is just what I’m reading and seeing in my own database.

I see a very flat database size once the purge interval is reached (7 days in my case):

The dip to zero was me moving to new host hardware.You can sea the DB size increases over 7 days then remains fairly constant. Any slight increase can easily be accounted for by me adding new sensors.

Thank you for posting that graph. It’s certainly different from what I’m seeing. I’ll keep poking around.

Can you verifiy whether or not you have a statistics table, and if so, can you do a quick record count, then check again in a few days? I can’t seem to track down exactly how this got created, and whether or not it’s something I installed inadvertently or everyone has it. It’s definitely growing daily on my installation.

I do.

If you give me the SQL query. It’s not my area of expertise.

EDIT:


Showing rows 0 - 24 (190356 total, Query took 0.0006 seconds.)

`SELECT * FROM `statistics``

Thank you! You could also do SELECT COUNT(*) FROM statistics; but either way works.

I just ran that command on a copy of the database from two days ago, and again from today’s.

The old one had 2,660 records, today’s has 4,080 records. So yes, mine is still growing.

This isn’t huge, and I have no problem manually purging them every so often. But I’m not the typical new user. If they take the defaults, on the recommended hardware, and start adding entities, they’ll eventually kill their system.

For anyone who’s interested, here’s the SQL I use to purge the statistics table of anything older than four days ago (the same purge_keep_days I have set for Recorder):
DELETE FROM statistics WHERE start < DATE('now','-4 day');

I’ve yet to see any negative impact to this. As far as I know, I don’t use Statistics data anywhere. I’m pretty sure they are needed by the energy management stuff, which I don’t currently use. The usual disclaimer applies: Don’t delete any data unless you know how it will impact you!

11 hours later:

Showing rows 0 - 24 (192826 total, Query took 0.0003 seconds.)

`SELECT * FROM `statistics``

That’s an increase of 2470.

So ~244 rows per hour.

Here are my findings:
as said, my recorder is set to 35 days. So, size of whole DB is 4.4GB. Biggest two tables are “states”: 2.6GB and “events”: 1.7GB. Others are minor sizes.
Interesting is that only in “statistics” entries are still from the date i created my DB: december 28th 2021. In all other tables last entry is January 11th (35 days ago). But statistics table is only 16MB in size, so it’s really not worth even mentioning.

And, my 4.4GB is pretty constant in size, it doesn’t increase (much).
Since in picture some text is in Slovenian: “vrstic” means “lines”, “Velikost” means size:


I’ll check again tomorrow.

You can try to understand the content of the database using this queries:

Select Count(*) As cnt, event_type From events Group By event_type Order by cnt Desc Limit 10;

→ it will give you a table with number of datasets of each event type in events table

Select  Count(statistics.id) As cnt, statistics_meta.statistic_id
From  statistics Inner Join  statistics_meta On statistics.metadata_id = statistics_meta.id
Group By  statistics_meta.statistic_id
Order By cnt Desc Limit 20;

→ this will show you the number of entries in statistics database per entity (top 20)

Select statistics_meta.statistic_id As cnt,  Count(statistics_short_term.id) As Count_id
From  statistics_meta Inner Join statistics_short_term On statistics_short_term.metadata_id = statistics_meta.id
Group By statistics_meta.statistic_id
Order By cnt DESC  Limit 20;

→ this will show you the number of entries in statistics_short_term database per entity (top 20)

Select Count(*) As cnt, entity_id  From states Group By entity_id Order by cnt Desc Limit 20;

→ this will show you the number of entries in main states database per entity.

So - maybe you can find out what entities you could exclude from recorder (to save storage)
Not every thing must be in statistics :wink:

2 Likes

Thank you both for confirming what I’m seeing. Obviously no-one with robust hardware and a non-standard database needs to worry about this relatively minor increase.

Still, anyone running HA on a RPi with SD card should probably take notice.

Additional for checking the used table space:

For SQLite:

Select name, sum(pgsize)/1024/1024 as size From dbstat Group by name Order By size Desc Limit 10;

For MariaDB (all databases and tables):

Select table_schema As DB_NAME, TABLE_NAME, (DATA_LENGTH+INDEX_LENGTH)/1024/1024 As TABLE_SIZE_in_MB From information_schema.TABLES;;
1 Like

Correct. No entities I’ve excluded from Recorder (to save storage) show up in the statistics_meta table.

True. But, there are some I haven’t excluded, because I want those in the events and states tables, but only for the duration I set in purge_keep_days.

Unfortunately, data from all these non-excluded entities are automatically saved, apparently forever, in the statistics table.

If you see (running the queries above) any “problem” - I mean entities with a high volume in statistics and statistics_short_term you could make a job and delete them from these tables, maybe once in a day.

But for statistics it should not be a problem (once per day one value only) - in statistics_short_term you have every 5 minutes a value - but also only for the purge_keep_days time…

how you make sensor for mariadb thx

@milandzuris
You can try to use sql_json via HACS.

Here you can build sensors using a SQL query.

im using maria