MariaDB database doubled in size under 2024.4 (update: actually due to Z2M "last seen")

Since upgrading to 2024.4.x, my MariaDB database (via the official addon) has roughly doubled in size. I’m pretty sure that the only change is to upgrade HA core at the beginning of the month and an upgrade of the MariaDB addon a couple of days ago.

I don’t remember seeing anything in the 2024.4 release notes about the database or recorder. Did I miss something?

This is the sensor I’m using to monitor the DB size.

sql:
  - # MariaDB Database Sensor
    db_url: !secret mariadb_url
    name: MariaDB DB Size
    query: 'SELECT table_schema "homeassistant", Round(Sum(data_length + index_length) / POWER(1024,2), 1) "value" FROM information_schema.tables WHERE table_schema="homeassistant" GROUP BY table_schema;'
    column: "value"
    device_class: data_size
    unit_of_measurement: MiB
    state_class: measurement

No increase for me:

Essentially the same query:

SELECT table_schema "database", Round(Sum(data_length + index_length) / 1048576, 2) "value" FROM information_schema.tables WHERE table_schema="homeassistant" GROUP BY table_schema

Possible there are some sensors that are updating rapidly and filling database up.

Look at the states table and see what’s in there and see if there are a lot of updates for a specific meta data id. Alternatively if you’re good at sql, select the last month of data group it by meta data id and count the records

Thanks.

Without a previous baseline, it’s pinpoint the issue. The biggest table is the state_attributes (via phpmyadmin addon), but it’s impossible for me to know if anything has changed there.

Using the following query, doesn’t really highlight anything unexpected. But it would be helpful to link the size of the attributes to each row, which I don’t know how to do.

SELECT states.metadata_id, COUNT(*) as Count, states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id=states_meta.metadata_id
GROUP BY states.metadata_id
ORDER BY Count DESC;

I wonder if something has changed in an integration that now means that some of the attribute updates that were identical between state updates for multiple entities are now different for each, and so need to be stored separately.

Here’s one someone posted which has helped me. It doesn’t sort by size, but by number of records, but it does display the size:

SELECT
  COUNT(state_id) AS cnt,
  COUNT(state_id) * 100 / (
    SELECT
      COUNT(state_id)
    FROM
      states
  ) AS cnt_pct,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) AS bytes,
  SUM(
    LENGTH(state_attributes.shared_attrs)
  ) * 100 / (
    SELECT
      SUM(
        LENGTH(state_attributes.shared_attrs)
      )
    FROM
      states
      JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
  ) AS bytes_pct,
  states_meta.entity_id
FROM
  states
LEFT JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY
  states.metadata_id, states_meta.entity_id
ORDER BY
  cnt DESC;
1 Like

Actually I wonder whether it coincides with when I enabled “last seen” on zigbee2mqtt. I’ll investigate some more, which means that every message from every zigbee sensor now has a unique attributes JSON.

I’ve confirmed that this is due to enabling “last seen” in Zigbee2MQTT.