Large homeassistant database files

:+1: Works great, thanks.

1 Like

Is there anyway to specify an alternate DB location in case the current path isn’t available?

Ideally, specify two db_url’s, one as primary and one as a backup.

Not sure - I think they would need to be on separate servers with a cluster address?

Hi all. I have been trying to setup and use MySQL for my database following the instructions within this thread.
When I try and install mysqlclient in the home assistant virtual environment

pip3 install --upgrade mysqlclient

I get the errors below. Something about Cython not being installed?

(homeassistant_venv) homeassistant@raspberrypi:/home/pi$ pip3 install --upgrade mysqlclient
Collecting mysqlclient
  Using cached mysqlclient-1.3.12.tar.gz
Building wheels for collected packages: mysqlclient
  Running setup.py bdist_wheel for mysqlclient ... error
  Complete output from command /srv/homeassistant/homeassistant_venv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-6p6lm0hs/mysqlclient/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpylduaybxpip-wheel- --python-tag cp34:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-armv7l-3.4
  copying _mysql_exceptions.py -> build/lib.linux-armv7l-3.4
  creating build/lib.linux-armv7l-3.4/MySQLdb
  copying MySQLdb/__init__.py -> build/lib.linux-armv7l-3.4/MySQLdb
  copying MySQLdb/compat.py -> build/lib.linux-armv7l-3.4/MySQLdb
  copying MySQLdb/connections.py -> build/lib.linux-armv7l-3.4/MySQLdb
  copying MySQLdb/converters.py -> build/lib.linux-armv7l-3.4/MySQLdb
  copying MySQLdb/cursors.py -> build/lib.linux-armv7l-3.4/MySQLdb
  copying MySQLdb/release.py -> build/lib.linux-armv7l-3.4/MySQLdb
  copying MySQLdb/times.py -> build/lib.linux-armv7l-3.4/MySQLdb
  creating build/lib.linux-armv7l-3.4/MySQLdb/constants
  copying MySQLdb/constants/__init__.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
  copying MySQLdb/constants/CLIENT.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
  copying MySQLdb/constants/CR.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
  copying MySQLdb/constants/ER.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
  copying MySQLdb/constants/FIELD_TYPE.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
  copying MySQLdb/constants/FLAG.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
  copying MySQLdb/constants/REFRESH.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
  running build_ext
  failed to import Cython: /srv/homeassistant/homeassistant_venv/lib/python3.4/site-packages/Cython/Compiler/Lexicon.cpython-34m.so: cannot open shared object file: Value too large for defined data type
  error: Cython does not appear to be installed
  
  ----------------------------------------
  Failed building wheel for mysqlclient
  Running setup.py clean for mysqlclient
Failed to build mysqlclient
Installing collected packages: mysqlclient
  Running setup.py install for mysqlclient ... error
    Complete output from command /srv/homeassistant/homeassistant_venv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-6p6lm0hs/mysqlclient/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ilaxc5vz-record/install-record.txt --single-version-externally-managed --compile --install-headers /srv/homeassistant/homeassistant_venv/include/site/python3.4/mysqlclient:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-armv7l-3.4
    copying _mysql_exceptions.py -> build/lib.linux-armv7l-3.4
    creating build/lib.linux-armv7l-3.4/MySQLdb
    copying MySQLdb/__init__.py -> build/lib.linux-armv7l-3.4/MySQLdb
    copying MySQLdb/compat.py -> build/lib.linux-armv7l-3.4/MySQLdb
    copying MySQLdb/connections.py -> build/lib.linux-armv7l-3.4/MySQLdb
    copying MySQLdb/converters.py -> build/lib.linux-armv7l-3.4/MySQLdb
    copying MySQLdb/cursors.py -> build/lib.linux-armv7l-3.4/MySQLdb
    copying MySQLdb/release.py -> build/lib.linux-armv7l-3.4/MySQLdb
    copying MySQLdb/times.py -> build/lib.linux-armv7l-3.4/MySQLdb
    creating build/lib.linux-armv7l-3.4/MySQLdb/constants
    copying MySQLdb/constants/__init__.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
    copying MySQLdb/constants/CLIENT.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
    copying MySQLdb/constants/CR.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
    copying MySQLdb/constants/ER.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
    copying MySQLdb/constants/FIELD_TYPE.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
    copying MySQLdb/constants/FLAG.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
    copying MySQLdb/constants/REFRESH.py -> build/lib.linux-armv7l-3.4/MySQLdb/constants
    running build_ext
    failed to import Cython: /srv/homeassistant/homeassistant_venv/lib/python3.4/site-packages/Cython/Compiler/Lexicon.cpython-34m.so: cannot open shared object file: Value too large for defined data type
    error: Cython does not appear to be installed
    
    ----------------------------------------
Command "/srv/homeassistant/homeassistant_venv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-6p6lm0hs/mysqlclient/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ilaxc5vz-record/install-record.txt --single-version-externally-managed --compile --install-headers /srv/homeassistant/homeassistant_venv/include/site/python3.4/mysqlclient" failed with error code 1 in /tmp/pip-build-6p6lm0hs/mysqlclient/

Has anyone else experienced this or have any ideas how to fix it.

Thanks

Ive been trying to move to mysql for a week now and just keep getting stuck. Ive been following the instrucinos below that I got from this thread.

However when I run:

sudo pip3 install --upgrade mysql-connector

I get:

Cleaning up...
Command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/tmp/pip-build-d3th41x9/mysql-connector/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-qrqghvp2-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-d3th41x9/mysql-connector
Storing debug log for failure in /root/.pip/pip.log

Ive tried upgrading setupttols with no luck…any helo would be appreciated!

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install mysql-server && sudo apt-get install mysql-client
sudo apt-get install libmysqlclient-dev
sudo apt-get install python-dev python3-dev
sudo pip3 install --upgrade mysql-connector
sudo pip3 install mysqlclient


mysql -uroot -p

CREATE DATABASE homeassistant;
CREATE USER 'hass'@'localhost' IDENTIFIED BY '********';
GRANT ALL PRIVILEGES ON homeassistant.* TO 'hass'@'localhost';
FLUSH PRIVILEGES;
exit;
Test if user works:

mysql -uhass homeassistant -p
exit;
Switch to homeassistant user:

sudo su -s /bin/bash homeassistant
source /srv/homeassistant/bin/activate
pip3 install --upgrade mysqlclient
exit

Although this doesn’t help the main point of this thread, I created a quick sensor to log the size of the mysql database so I can monitor it.

  1. Create a script (e.g. check_db_size.sh) with the following command in it.
mysql -N -B -p'<YourPassword>' -u <YourDBUser> -e 'SELECT Sum(Round(( ( data_length + index_length ) / 1024 / 1024 / 1024 ), 2)) AS "Size in GB" FROM information_schema.tables WHERE table_schema = "<YourDBName>"'
  1. Update <YourPassword>, <YourDBUser> & <YourDBName> in the above command.

  2. Make the script executable.

chmod +x check_db_size.sh
  1. Create a command_line sensor
  - platform: command_line
    command: ./home/homeassistant/.homeassistant/check_db_size.sh
    name: HA DB Size
    unit_of_measurement: "GB"
    scan_interval: 300
  1. Enjoy your handiwork.
    DB_Sensor
8 Likes

This is awesome! Thanks for sharing!!

1 Like

I do the command line sensor like what ntalekt posted. Unfortunately the way docker runs on my Synology NAS, I had to run the command as root on my NAS and send the output to a file in the shared config folder, then within HA read that file to grab the value.

As for the database size, mine grows about 20-30MB per day. currently sitting at about 550MB and just restarted HA to update (though it was from docker) and it booted right up. Never had a real issue with the database or restarting HA since switching to MySQL/MariaDB.

1 Like

That’s a good tip. I might do something similar! Thanks

No problem Phil. Thanks for your glorious blog posts, they’ve helped me a lot.

Thanks for sharing, @ntalekt ! I used your code, and it’s up and running. For what it’s worth, I had to do the following extra steps to get it working:

  • install mysql-client (as I’m not using a MySQL server on my Raspberry Pi 3) with the following command: sudo apt-get install mysql-client
  • add the options -h 192.168.xxx.x -P 3307 to the command as my database is located on my Synology NAS
2 Likes

Ha ha, great minds think alike! Just yesterday I made a database size sensor so I could monitor the growth of the file. I’m using Hassbian with the default SQLite database so my sensor is just a filesize sensor.

For those wanting to do that for the basic SQLite db file (or any other file for that matter), here is the setup:

  1. Add a command_line sensor to your config.
    This uses the stat command with -c %s format option to retrieve the total file size in bytes. I divide by 1024 twice to get from bytes to MB. You can tweak the template to your liking. I also bumped the scan_interval up from the default of 60s to 900s (15 minutes) as I’ve been trying to reduce how much data is going into my database.
  - platform: command_line
    name: 'filesize_db'
    command: 'stat -c %s "/home/homeassistant/.homeassistant/home-assistant_v2.db"'
    unit_of_measurement: 'MB'
    value_template: '{{ (value|int / 1024 / 1024)|round(1) }}'
    scan_interval: 900
  1. I like to add an entry to customize too for name and icon (remember that’s in customize under homeassistant)
- sensor.filesize_db:
  friendly_name: SQLite DB size
  icon: mdi:database

image

Hope that helps others using the default database setup.

5 Likes

Over the last week or so I’ve been looking into what is causing my database to grow so big. When I upgrade home assistant this last time it took almost 2 hours to start up because of some database operations is was doing on a 20+GB sqlite3 database. I know I don’t have 20GB of truly useful data in all that.

TL;DR Found the offenders and now only include entities that I want to see in either History or Logbook.

I’m sure someone will find these helpful so here are a couple queries that helped by find the culprits. Note, I’m using the default SQLite database on Hassbian. To speed things up and prevent corrupting my database I stopped home assistant, copied the homeassistant_v2.db over to my desktop, and began my investigation.

One of the first things I did was run sqlite3_analyzer and got a report on which tables were the biggest.

Events table

The events table accounts for 2/3 of the database size. Every sensor event the “event_data” field includes a full copy of the json for both the old and new states. This table doesn’t have an entity_id column so it makes it hard to track down which sensors to blame. The following query with a common table expression (CTE) can do that but beware if you have a multi-GB database it could take a really long time. The result of this query is each entity id, the total size of all event_data, the number of events for that entity_id, and the average size of event_data per event.

WITH step1 AS (
    SELECT *, INSTR(event_data, '"entity_id": "') AS i
    FROM events),
step2 AS (
    SELECT *, CASE WHEN i>0 THEN SUBSTR(event_data, i+14) ELSE '' END AS sub
    FROM step1),
step3 AS (
    SELECT *, SUBSTR(sub, 0, INSTR(sub, '"')) AS entity_id
    from step2)
SELECT entity_id, SUM(LENGTH(event_data)) size, COUNT(*) count, SUM(LENGTH(event_data))/COUNT(*) avg
FROM step3
GROUP BY entity_id
ORDER BY size DESC;

To investigate a single event by entity_id use this (make sure to keep that trailing double quote where it is):

SELECT * FROM events WHERE event_data LIKE '%sensor.home_power_main"%';

States table

The states table accounts for the other 1/3 of the database. Make essentially the same query there is a lot simpler and faster as it includes and is indexed on an entity_id column. Similarly the attributes is the big column here as it contains the json blob for the state.

SELECT entity_id, SUM(LENGTH(attributes)) size, COUNT(*) count, SUM(LENGTH(attributes))/COUNT(*) avg
FROM states
GROUP BY entity_id
ORDER BY size DESC;

To investigate a single entity_id in the states table use this:

SELECT * FROM states WHERE entity_id = 'group.thermostats';

Discoveries

Here are a few of my take aways after investigating and tweaking:

  • The bigger the database the slower the startup and shutdown
  • Each restart dumps a bunch of events into the database
  • Disabling history and logbook or excluding entities from either has no effect on the database. It will keep storing all the data even if you’re not looking at it.
  • Most of my database is json blobs
  • Most of the json blobs are duplicate data
  • z-wave devices store a ton of information! A single Aeotec Smart Home Energy meter has 25 different sensors (25!!!) I only need 3 of those for viewing useful data in history (zwave spec here)
  • The zwave option “Exclude this entity from Home Assistant” does not exclude it from recorder (not sure what it does…)
  • If you don’t care to see it in History or Logbook don’t allow the recorder to persist it

Recorder inclusion and exclusion

Since 95% of my database entries were storing stuff I didn’t care about I began with the approach that I should include only what I want to see in History or Logbook. Home Assistant provides a way to do this in the recorder component configuration. I should also point out that automations don’t need anything from recorder or the database. They will keep working just fine with it disabled.

UPDATE: The include/exclude code has been re-written as of 0.57. I’ve made some comments on how it works in this later post. Basically you don’t have to do the weird trick to include only a few of a domain while specifying other include:domains.

The new include/exclude logic is found in the entityfilter found here. Since it has been pulled out into a helper class it should make includes and excludes work the same in any configuration. The main improvement is that you can more easily do the selective include. To keep a trim I’d suggest taking the following approach:

  1. If you want all or most of a domain, put the domain in include:domains and the few things you don’t want in exclude:entities
  2. If you only want a few of a domain, put the few items in include:entities
  3. Everything else is excluded by default. This prevents new devices and components from filling your database with crap.

With that in mind here is an update to my config for home-assistant 0.57+:

recorder:
  purge_interval: 2
  purge_keep_days: 14
  include:
    domains:
      - automation
      - binary_sensor
      - climate
      - device_tracker
      - input_boolean
      - input_number
      - media_player
      - switch
    entities:
      - group.family
      - group.guests
      - sensor.centralite_3325s_0bbafb7a_1
      - sensor.centralite_3325s_056b240f_1
      - sensor.battery_phone
      - sensor.disk_use_home
      - sensor.energy_cost_yesterday
      - sensor.home_energy_usage
      - sensor.home_power_main
      - sensor.home_power_c1
      - sensor.home_power_c2
      - sensor.energy_usage_today
      - sensor.filesize_db
      - sensor.main_garage_door_status
      - sensor.pws_temp_f
      - sensor.ram_use
      - sensor.speedtest_download
      - sensor.speedtest_upload
      - sensor.speedtest_ping
  exclude:
    entities:
      - switch.vision_zl7432_inwall_switch_dual_relay_switch_2
      - switch.vision_zl7432_inwall_switch_dual_relay_switch_4
      - switch.vision_zl7432_inwall_switch_dual_relay_switch_5
      - switch.vision_zl7432_inwall_switch_dual_relay_switch_6
Details on my old hack...

I made several attempts to set the recorder includes and excludes working exactly they way I wanted and with as few devices needing to be specified as possible. It kept failing and then I read the source code and found out why. They way it works is not intuitive to me but I found a way to make it work. The false assumption I made is that you can use the include domains, include entities, and exclude entities. I found 3 scenarios for setting inclusions and exclusions, the 3rd being my case and a head scratcher:

  1. If you want to exclude everything from a domain, put it in exclude:domains
  2. If you want most of a domain but want to exclude a few things, put those few in exclude:entities
  3. If you only want a few of a domain, put the entity in include:entities and domain in exclude:domain and include:domain. If you don’t add it to the include:domains it will get excluded here. If you don’t add the domain to the exclude:domains then everything else from the domain will also get through. .

You’ll notice that there are domains (like sensor) doubled up in exclude and include because of that 3rd point, but it works perfectly.

recorder:
  purge_interval: 2
  purge_keep_days: 14
  exclude:
    domains:
      - camera
      - cover
      - group
      - sensor
      - sun
      - updater
      - zone
      - zwave
    entities:
      - switch.vision_zl7432_inwall_switch_dual_relay_switch_2
      - switch.vision_zl7432_inwall_switch_dual_relay_switch_4
      - switch.vision_zl7432_inwall_switch_dual_relay_switch_5
      - switch.vision_zl7432_inwall_switch_dual_relay_switch_6
  include:
    domains:
      - automation
      - binary_sensor
      - climate
      - device_tracker
      - group
      - input_boolean
      - input_number
      - media_player
      - sensor
      - switch
    entities:
      - group.family
      - group.guests
      - sensor.centralite_3325s_0bbafb7a_1
      - sensor.centralite_3325s_056b240f_1
      - sensor.battery_phone
      - sensor.disk_use_home
      - sensor.energy_cost_yesterday
      - sensor.home_energy_usage
      - sensor.home_power_main
      - sensor.home_power_c1
      - sensor.home_power_c2
      - sensor.energy_usage_today
      - sensor.filesize_db
      - sensor.main_garage_door_status
      - sensor.pws_temp_f
      - sensor.ram_use
      - sensor.speedtest_download
      - sensor.speedtest_upload
      - sensor.speedtest_ping

Sorry that was so long winded. I hope it helps people chase down their log hogs!

25 Likes

hi, i am trying to install mysql db on a fresh clean HASSBIAN image (2017-10-11-Hassbian), but i fail right at the first step: “sudo apt-get install libmysqlclient-dev” fails with an error about “Package ‘libmysqlclient-dev’ has no installation candidate”.
When I continue anyway, after installing the mysql-server I was not prompted to set a root password. but when i try to log in i am asked for a password and and leaving it blank fails with an authentication error, so i guess there is a mysql password set in the HASSBIAN image already. but i can’t find any info what it might be by default :frowning:
Help appreciated…

Is there any speed advantage to setting up mysql on the same Pi as HASS is running on? Or does mysql need to be on a different computer to speed up the historical data on HASS?

regardless of where MySQL is installed, you will see a vast improvement in speed when querying the history. I don’t think it matters where it is installed as the connection method would still be the same.

So it’s just that Mysql is much faster than the default SQlite?

They did some speed improvements to the recorder and database a few versions ago (0.51) but there is still a slight speed improvement using MySQL over SQLite.

Also it comes down to personal preference. I already had a MySQL (MariaDB actually) database setup for managing my media library through Kodi, so it was pretty easy to setup the database for HA. I also don’t like files growing wild in a folder like the SQLite database and the log file, so I take every effort to either relocate them and/or manage them better like configuring the recorder:, history:, logbook:, and logger: components.

1 Like

Ok, I set up MySQL on a separate debian server I had. I made a database called “hassdb” and a user “hass” and gave full control to that user.

When I point home assistant to it with this configuration.yaml:

recorder:
db_url: mysql://hass:[email protected]:3306/hassdb

I get errors that HASS can’t communicate with MySQL. I confirmed that the server is listening on 3306 with netstat -tln.

Any ideas?

Thanks!

Make sure the firewall is disabled: service firewalld stop or that you have allowed MySQL access through the firewall on port 3306.