We are looking to improve database performance. Since Home Assistant does not collect usage data, we do not know how large the average database is. Please complete the poll and post the size of your database below.
For SQLite:
Using a shell:
# ls -lh /config/home-assistant_v2.db
-rwxr-xr-x 1 root root 244.7M Mar 25 16:55 /config/home-assistant_v2.db
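If you'd rather check from inside SQLite (for example via the sqlite3 CLI), a rough equivalent is the query below; it reports the allocated page space, so it can differ slightly from the file size on disk.
-- Run with: sqlite3 /config/home-assistant_v2.db
-- Reports the size of the opened SQLite database in MB (pages x page size).
SELECT ROUND(page_count * page_size / 1024.0 / 1024.0, 1) AS "DB Size in MB"
FROM pragma_page_count(), pragma_page_size();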
For MySQL/MariaDB:
Using phpMyAdmin: click on the homeassistant database and look at the Sum row of the Size column
Using the CLI:
SELECT table_schema "DB Name",
ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB"
FROM information_schema.tables
GROUP BY table_schema;
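If you also want to see which tables take up the space, a per-table variant of the same query is below (it assumes your database is named homeassistant, the add-on default; adjust the name if yours differs). The states and events tables are usually the bulk of it.
SELECT table_name "Table",
       ROUND((data_length + index_length) / 1024 / 1024, 1) "Size in MB"
FROM information_schema.tables
WHERE table_schema = 'homeassistant'
ORDER BY (data_length + index_length) DESC;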
Might want to add a separate poll regarding how many days you store. I store 30, so my database is huge. I would assume many people store less than 30 days.
ATM my DB is only 1.6 GB (with 10 days of data, for some tests). I started a new MariaDB about 2 weeks ago after some issues with the old one. The old DB was +/- 20 GB with 20 days of data.
The MariaDB Hass.io add-on on the same machine as HA (a Celeron J4005 NUC), with 7 days of recording for 500+ entities (recording is enabled for roughly a quarter of all HA entities), varies between 3.5 and 4 GB. The largest part of the entities with recording enabled is SNMP and MQTT sensors.
I keep 30 days of history, which amounts to ~36 GB of data in MariaDB on a dedicated SSD volume of a Synology NAS… a setup not for everyone. Quite a lot of sensors are excluded from the recorder because they generate frequently changing data (every second).
If I can suggest something, it would be the ability to configure recorder history per entity… There are entities I’d like to keep history for over an extended period, but for others it is worthless and a day or three would be more than sufficient… Currently it is only possible to exclude them entirely or keep their history for as long as everything else… So some sort of storage ‘tiers’ would be nice…
About 600 MB for about 200 entities with a 7-day keep. I use InfluxDB for long-term storage… about 1.5 GB for 1 year so far (infinite retention), and that includes all entities (probably over 500). I think InfluxDB is better suited for large HA histories. Honestly, I only use MariaDB for history charts in Lovelace (1- to 3-day charts) and to handle reboots, so I could purge at 4 days and be fine, but 600 MB isn’t much either way for an SSD.
Do beware that a lot of DB size issues are caused by the high reporting rates found in many power sensors (like those Z-Wave outlets that also read current). I had to get rid of some of those for that reason; the stupid things would report about 10 times per second even when not in use (and I never succeeded in changing that with Z-Wave commands). I replaced them with ESPHome devices, which make the reporting frequency very easy to control.
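To spot that kind of chatty device in an existing database, a query along these lines will list the worst offenders. This is a sketch that assumes a recorder schema where the states table still carries an entity_id column; newer schemas may differ.
-- Counts recorded states per entity to find high-frequency reporters.
SELECT entity_id, COUNT(*) AS state_rows
FROM states
GROUP BY entity_id
ORDER BY state_rows DESC
LIMIT 20;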
The database components need to be rewritten to store data relationally, instead of using the database as essentially a flat file storing JSON strings. That in itself would result in roughly a 100x decrease in database size and on the order of a 10,000x increase in performance.
Once that is done, all the issues with the database go away. A user no longer needs to worry about excluding certain entities, history purges, recorder settings, etc. Storing data as the native DB data types solves all of these issues.
As an example, storing power monitoring data for a month (10 integer values, every 2 seconds) bloated my HA database by 4 GB and severely reduced history performance. Storing those same values in a MySQL DB using native data types over the same period used about 25 MB.
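To illustrate what “native data types” means here, a minimal sketch of such a table is below. The table and column names are made up for the example and are not anything HA actually uses; the point is fixed-width typed columns instead of one JSON blob per state change.
-- Hypothetical normalized table for power readings.
CREATE TABLE power_readings (
    reading_time DATETIME NOT NULL,
    channel      TINYINT UNSIGNED NOT NULL,  -- which of the monitored circuits
    watts        INT NOT NULL,
    PRIMARY KEY (reading_time, channel)
);

-- One row per reading instead of a serialized JSON state string.
INSERT INTO power_readings (reading_time, channel, watts)
VALUES (NOW(), 1, 1432);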