MariaDB growing too big... Can't get it smaller

TheStigh · March 8, 2020, 8:26pm

Using MariaDB and the database has now grown passing 9 Gb.
I’ve added some exludes to recoder:

recorder:
  db_url: !secret urlMariaDB
  purge_keep_days: 14
  exclude:
    domains:
      - weblink
      - updater
      - input_boolean
      - input_number
      - input_select
      - input_text
      - light
      - media_player
      - sun
      - timer
      - weather
      - camera
    entities:
... a bunch of entities also...

Also, I’ve got an automation to purge the database, but still it doesn’t reduce it’s size.

- alias: Purge Database
  id: Purge Database
  trigger:
  - platform: time
    at: '03:00:00'
  action:
  - service: recorder.purge
    data:
      keep_days: 14

What am I missing?

anon43302295 · March 8, 2020, 9:15pm

Maybe try a lower number of keep_days - 14 is a lot!

If you need long term data for certain things, then export them to a separate database (influx or some such). In general 3 - 5 days is enough for logbook and history usage.

BertrumUK · March 8, 2020, 9:23pm

Also try adding repack: true to the action

  alias: Purge Database
  initial_state: True
  trigger:
    platform: numeric_state
    entity_id: sensor.mariadb_pi4_size
    above: 1000
  action:
    service: recorder.purge
    data:
      keep_days: 2
      repack: true

VDRainer · March 8, 2020, 9:50pm

There’s no repack option for mysql db.
I’m not sure, but i think mysql doesn’t release the space, even after purging the db.

If you have a sql client you can try to

optimize table events

and

optimize table states

This will recreate the tables and release space.

lmamakos · March 8, 2020, 10:05pm

Fire up a MySQL/MariaDB client and try something like

select entity_id,count(*) cnt from states group by entity_id order by cnt desc limit 20;

to see what entities are generating lots of events.

squirtbrnr · March 8, 2020, 10:08pm

the automatic purge function will not run until after a HA uptime of over 24 hours. If you restart HA before this point, it also restarts the purge timer. I don’t have specifics or exactly when, I just know it doesn’t purge until after the 24 hours mark.

I use MariaDB in a separate docker container install and I link HA Core to it. I have no issues and my DB hovers around 2.8GB. Loading times for Logbook and History are less than 20-30 seconds. My config is:

recorder:
  db_url: !secret mysql_db
  purge_interval: 1
  purge_keep_days: 28
  exclude:
#    <long list of domains and entities>

TheStigh · March 8, 2020, 10:26pm

Repack is only available for SQLite

TheStigh · March 8, 2020, 10:27pm

Thanx bud, I will wait for 24 hrs now and see what happens.
May I ask how many sensors you do have logged?

DavidFW1960 · March 8, 2020, 10:49pm

Yep. I have an automation that runs the purge every day at midnight because of that.

squirtbrnr · March 9, 2020, 6:18pm

quite a few, but I don’t have an official count. Some of the biggest ones I have that I know of that log to the database are:

Dark Sky weather forecast info
Person/Device Trackers
about 30 ZWave devices and all of their attributes. 4 of them are smart switches that log power, current, voltage, etc (about 6 sensors total) once every 1-2 minutes (on a motion sensor or door sensor, that’s about 8-10 individual sensors, on a switch or light maybe 1-2)

TheStigh · March 9, 2020, 6:28pm

HAPPY ME

Thank you @squirtbrnr !!!

TheStigh · March 9, 2020, 6:29pm

I wonder what is actually the best database to use…
What is the fastest and most effective Db to use?

squirtbrnr · March 9, 2020, 11:14pm

That’s a very subjective question. Honestly it depends on your use case and what features or performance you are looking for. Personally I already had MariaDB running for my Kodi media player library so naturally, I just used MariaDB instead of trying to set something else up. It works for me.

amelchio · March 20, 2020, 8:26pm

Where did this rumor come from? The recorder should purge after 30 minutes if it is due at startup.

The database is rarely the problem. Most of the delay with logbook and history is in HA code, either the Python core or the frontend. A faster CPU in those places helps a lot.

truglodite · March 20, 2020, 9:16pm

If it is a rumor, some documentation would clear it up… state that 30min after boot thing, and maybe add some logs to look for to verify it ran? Rumor or not, I use an automation to purge so I can control exactly when it happens… which is not while I’m at work and the process could delay alarm stuff.

VDRainer · March 20, 2020, 10:24pm

logger:
  default: warn
  logs:
    homeassistant.components.recorder: debug

squirtbrnr · March 21, 2020, 2:32am

Not a rumor. Doesn’t the next line say the latest the purge function will run is the purge interval set in the configuration? In my case it’s set to 1. I use docker and blow away the container to upgrade, wouldn’t this reset things such that HA doesn’t think it’s due yet? Therefore, it would run at the latest 24 hours after starting HA.

Documentation explaining the purge function, its timing, and other relevant info coming from those who understand it best would help clear up any confusion. Pointing to a single line of code is not documentation.

EDIT: Also, if it’s supposed to run at the earliest 30 minutes after start, then why does the size of my DB not change until after 24 hours? As a test I have restarted the container and my DB did not change size until 24 hours later. Also if I keep restarting HA before the 24 hours mark for uptime, my DB grows and looking inside the DB reveals entities that are older then the 28 day keep days I have set. As soon as HA reaches the 24 hour uptime mark, shortly thereafter my DB shrinks and those entries disappear.

DavidFW1960 · March 21, 2020, 3:24am

I agree. I have always read on the forums that if you set it to purge everyday that HA has to be up for a day before it will trigger. If the purge interval is 2 days then 2 days. This is why I have an automation that runs at midnight every day to run the purge.

EDIT: There’s actually all kinds of shit that breaks when HA restarts… delays in automations etc are zeroed out… if HA is aware and can continue on after a restart then not doing this is just lazy programming…

squirtbrnr · March 21, 2020, 4:39am

Not to mention in the case of the OP, they did not have purge interval as a Config option in their setup. The DB size did not go down until after 24 hours as indicated by their response in post #10.

amelchio · March 21, 2020, 9:59am

Sorry if I came across as snide, I was actually genuinely curious.

The code tries to get back into sync after a restart. So if it purged last night at 9PM and you restart today at 6PM, it should purge in about 3 hours’ time (with a lower limit of 30 minutes).

So I guess the fully correct way to state it is that it can take up to 24 hours. Now I see how that can lead to a rule of thumb that one must wait for 24 hours. Even if one does realize the bit about synchronization, there is no simple way to tell the time that it tries to synchronize to.

It can also be confusing that purging will not shrink the database but just stop it from growing because parts get marked for reuse.

Incidentally, I think this should be simplified to what several posters are already doing: remove the purge_interval setting and just purge every night at 4 AM if there has been no purge in the past 24 hours.

This would be easier to document and it would still allow users to move the purge time to an automation if 4 AM is for some reason inconvenient. However, purging should actually be quite fast these days and not cause an interruption like it did in older releases (years back).