Thanks for the help man! I’m going to post my journey for those who are also SQL-limited as I am.
I use the MariaDB plugin for Home Assistant. Since I didn’t really know how to connect to my database, I wanted to find a nice GUI-capable tool. For this, I have used dbForge.
Stepwise:
- download and install dbForge.
- Go to the MariaDB config page, and in the Network section, enter your port number. In my case the Container column reads 3306/tcp and I have entered 3306 below Host.
- Open dbForge and use this tutorial to enter your database details
- Use type TCP/IP
- As Host enter your HA IP address and as port enter the port you used (3306 in my case)
- As User and Password, use the entries in your MariaDB plugin config under logins
- As Database, use the database name set in your MariaDB plugin config (default is homeassistant)
- Press Test Connection or Connect if you’re feeling lucky!
- Press ctrl+n to start a new SQL
- Enter the commands stated above by @heinrich
Some useful commands I have used:
To find out which entities use the most data (updated 2025-01-06 based on this post by @jeedewee )
SELECT
COUNT(*) AS cnt,
COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id=states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
To remove entities from the database directly using regular expressions:
-- first test if the regular expression works. I'm looking for e.g.: sensor.blitzwolf_NUC_energy_voltage
SELECT entity_id, COUNT(*) as count FROM states WHERE entity_id LIKE 'sensor.blitzwolf%status' GROUP BY entity_id ORDER BY count DESC LIMIT 10;
-- then remove the entities. This is final!
DELETE FROM states WHERE entity_id LIKE 'sensor.blitzwolf%energy\_voltage';
To find out how much data each table (I think it’s called a table) uses (credit goes to mbuscher)
SELECT
table_name AS `Table Name`,
table_rows AS `Row Count`,
ROUND(SUM(data_length)/(1024*1024*1024), 3) AS `Table Size [GB]`,
ROUND(SUM(index_length)/(1024*1024*1024), 3) AS `Index Size [GB]`,
ROUND(SUM(data_length+index_length)/(1024*1024*1024), 3) `Total Size [GB]`
FROM information_schema.TABLES
WHERE table_schema = 'homeassistant'
GROUP BY table_name
ORDER BY table_name
Another thing I found useful was to plot the first 1000 entities of the first query using Excel and then calculate the sum of all counts up until that entity. That way I found out I could reduce the size of my database by a factor of 10, simply by removing the first 100 entities from the database.
EDIT 2025-01-06:
The above query has not been working for a long time, and many have already found workarounds. However, if you end up googling this later, I want to refer to the correct queries, which can be found here: How to keep your recorder database size under control - #126 by jeedewee