Taming the Recorder

Thanks for the help man! I’m going to post my journey for those who are also SQL-limited as I am.

I use the MariaDB plugin for Home Assistant. Since I didn’t really know how to connect to my database, I wanted to find a nice GUI-capable tool. For this, I have used dbForge.

Stepwise:

  • download and install dbForge.
  • Go to the MariaDB config page, and in the Network section, enter your port number. In my case the Container column reads 3306/tcp and I have entered 3306 below Host.
  • Open dbForge and use this tutorial to enter your database details
    • Use type TCP/IP
    • As Host enter your HA IP address and as port enter the port you used (3306 in my case)
    • As User and Password, use the entries in your MariaDB plugin config under logins
    • As Database, use the database name set in your MariaDB plugin config (default is homeassistant)
    • Press Test Connection or Connect if you’re feeling lucky!
  • Press ctrl+n to start a new SQL
  • Enter the commands stated above by @heinrich

Some useful commands I have used:
To find out which entities use the most data (updated 2025-01-06 based on this post by @jeedewee )

SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id=states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC

To remove entities from the database directly using regular expressions:

-- first test if the regular expression works. I'm looking for e.g.: sensor.blitzwolf_NUC_energy_voltage
SELECT entity_id, COUNT(*) as count FROM states WHERE entity_id LIKE 'sensor.blitzwolf%status' GROUP BY entity_id ORDER BY count DESC LIMIT 10;
-- then remove the entities. This is final!
DELETE FROM states WHERE entity_id LIKE 'sensor.blitzwolf%energy\_voltage';

To find out how much data each table (I think it’s called a table) uses (credit goes to mbuscher)

SELECT
    table_name AS `Table Name`,
	table_rows AS `Row Count`,
	ROUND(SUM(data_length)/(1024*1024*1024), 3) AS `Table Size [GB]`,
	ROUND(SUM(index_length)/(1024*1024*1024), 3) AS `Index Size [GB]`,
	ROUND(SUM(data_length+index_length)/(1024*1024*1024), 3) `Total Size [GB]`
FROM information_schema.TABLES
WHERE table_schema = 'homeassistant'
GROUP BY table_name
ORDER BY table_name 

Another thing I found useful was to plot the first 1000 entities of the first query using Excel and then calculate the sum of all counts up until that entity. That way I found out I could reduce the size of my database by a factor of 10, simply by removing the first 100 entities from the database.

EDIT 2025-01-06:
The above query has not been working for a long time, and many have already found workarounds. However, if you end up googling this later, I want to refer to the correct queries, which can be found here: How to keep your recorder database size under control - #126 by jeedewee

1 Like