Large homeassistant database files

Raspberry Pi 3 Model B, hassbian from “HASSbian 1.1 - The Toy-box”
As described in https://home-assistant.io/getting-started/hassbian-installation/

Home Assistant 0.38.3; Path to configuration.yaml: /home/homeassistant/.homeassistant

These are the steps I used to get MySQL up and integrated into Home Assistant, collected together here for easy reference.

mysql-connector is from Oracle, the maintainer of MySQL, but the recorder component depends on mysqlclient. You might still want mysql-connector for your own MySQL python3 projects.

#####As user pi:

sudo apt-get install libmysqlclient-dev
sudo apt-get install python3-dev
sudo apt-get install mysql-client  # Needed to access MySQL from the command line.

# Optional; the recorder does not need mysql-connector:
# sudo pip3 install --upgrade mysql-connector
sudo pip3 install mysqlclient

Environment variables are used for the database parameters so these scripts can be made publicly available. Either define the variables appropriately, or substitute your own text.
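
For example, define them like this before running the scripts (the values here are made-up placeholders):

#!/bin/bash
# Hypothetical example values; substitute your own.
export DBUser=dbuser
export DBPassword=dbpassword
export DBHost=192.168.1.10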

I decided to name my MySQL schema “homeassistant”; use whatever suits you, but be consistent.

Create the MySQL database on the host machine: create an empty schema “homeassistant”, and create your DBUser in MySQL, giving it all privileges on the “homeassistant” schema. I used MySQL Workbench running on a Mac to do these tasks.
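
If you would rather do this from the command line than from Workbench, the equivalent is roughly this sketch (run as the MySQL root user; '%' allows connections from any host, which you may want to tighten):

#!/bin/bash
# Sketch: create the schema and the user, and grant privileges.
mysql -uroot -p -h${DBHost} <<EOF
CREATE SCHEMA IF NOT EXISTS homeassistant DEFAULT CHARACTER SET utf8;
CREATE USER '${DBUser}'@'%' IDENTIFIED BY '${DBPassword}';
GRANT ALL PRIVILEGES ON homeassistant.* TO '${DBUser}'@'%';
FLUSH PRIVILEGES;
EOF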



Create the tables in the MySQL database. (The schema itself already exists, since you created it in order to give your DBUser privileges on it; the CREATE SCHEMA below is just insurance. You could instead run this script as the MySQL root user and give DBUser privileges on the schema after it is run.)

#!/bin/bash
mysql -u${DBUser} -p${DBPassword} -h${DBHost} homeassistant <<EOF
-- MySQL Workbench Synchronization
-- Generated: 2017-02-21 15:57
-- Model: homeassistant
-- Version: 1.0
-- Project: homeassistant
-- Author: Thomas Demay

/*
The homeassistant recorder component creates tables with time columns of data type "DATETIME."  SQLite's DATETIME type apparently includes microseconds.  The corresponding MySQL data type is DATETIME(6).  For this reason, the tables are created here, before starting homeassistant with a MySQL recorder.

The MySQL TIMESTAMP(6) data type stores times in the database as UTC, converting to and from local time on input and output.  Homeassistant already stores all times in UTC, so MySQL TIMESTAMP(6) gets very confused.  MySQL DATETIME(6) does no UTC conversions, which is what homeassistant expects.
*/

SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='TRADITIONAL,ALLOW_INVALID_DATES';

CREATE SCHEMA IF NOT EXISTS `homeassistant` DEFAULT CHARACTER SET utf8 ;

CREATE TABLE IF NOT EXISTS `homeassistant`.`recorder_runs` (
  `run_id` INT(11) NOT NULL AUTO_INCREMENT ,
  `start` DATETIME(6) NULL DEFAULT NULL,
  `end` DATETIME(6) NULL DEFAULT NULL,
  `closed_incorrect` TINYINT(4) NULL DEFAULT NULL,
  `created` DATETIME(6) NULL DEFAULT NULL,
  PRIMARY KEY (`run_id`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;

CREATE TABLE IF NOT EXISTS `homeassistant`.`events` (
  `event_id` INT(11) NOT NULL AUTO_INCREMENT ,
  `event_type` VARCHAR(32) NULL DEFAULT NULL,
  `event_data` TEXT NULL DEFAULT NULL,
  `origin` VARCHAR(32) NULL DEFAULT NULL,
  `time_fired` DATETIME(6) NULL DEFAULT NULL,
  `created` DATETIME(6) NULL DEFAULT NULL,
  INDEX `ix_events_time_fired` (`time_fired` ASC),
  INDEX `ix_events_event_type` (`event_type` ASC),
  PRIMARY KEY (`event_id`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;

CREATE TABLE IF NOT EXISTS `homeassistant`.`states` (
  `state_id` INT(11) NOT NULL AUTO_INCREMENT ,
  `domain` VARCHAR(64) NULL DEFAULT NULL,
  `entity_id` VARCHAR(255) NULL DEFAULT NULL,
  `state` VARCHAR(255) NULL DEFAULT NULL,
  `attributes` TEXT NULL DEFAULT NULL,
  `event_id` INT(11) NULL DEFAULT NULL,
  `last_changed` DATETIME(6) NULL DEFAULT NULL,
  `last_updated` DATETIME(6) NULL DEFAULT NULL,
  `created` DATETIME(6) NULL DEFAULT NULL,
  PRIMARY KEY (`state_id`),
  INDEX `event_id_idx` (`event_id` ASC),
  INDEX `states__state_changes` (`last_changed` ASC, `last_updated` ASC, `entity_id` ASC),
  INDEX `states__significant_changes` (`domain` ASC, `last_updated` ASC, `entity_id` ASC),
  CONSTRAINT `event_id`
    FOREIGN KEY (`event_id`)
    REFERENCES `homeassistant`.`events` (`event_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;

CREATE TABLE IF NOT EXISTS `homeassistant`.`schema_changes` (
  `change_id` INT(11) NOT NULL AUTO_INCREMENT ,
  `schema_version` INT(11) NULL DEFAULT NULL,
  `changed` DATETIME(6) NULL DEFAULT NULL,
  PRIMARY KEY (`change_id`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;


SET SQL_MODE=@OLD_SQL_MODE;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;
EOF
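
If you want to see the TIMESTAMP vs. DATETIME behavior described in the comment above for yourself, here is a quick sketch (assumes MySQL 5.6+ for fractional seconds, and the DB* placeholders from earlier):

#!/bin/bash
# Demonstrate that TIMESTAMP columns are converted to/from the session
# time zone, while DATETIME columns are stored and returned verbatim.
mysql -u${DBUser} -p${DBPassword} -h${DBHost} homeassistant <<EOF
CREATE TEMPORARY TABLE tz_demo (ts TIMESTAMP(6) NULL, dt DATETIME(6) NULL);
SET time_zone = '+00:00';
INSERT INTO tz_demo VALUES ('2017-02-21 15:57:00', '2017-02-21 15:57:00');
SET time_zone = '+05:00';
SELECT ts, dt FROM tz_demo;  -- ts shifts by 5 hours; dt does not
EOF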

#####Switch to homeassistant virtual environment:

(I created .profile and .bashrc files in /home/homeassistant to set up the environment as desired; their last step is to activate the virtual environment. In that case, the first two lines of the code below collapse into one:
“sudo su -l homeassistant”
If somebody knows that this is a bad idea, please let me know.)
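
For reference, a minimal sketch of what the end of that .profile might contain (the venv path is the HASSbian default):

# /home/homeassistant/.profile (sketch)
# ...any other environment setup you like...
# Activate the Home Assistant virtual environment on login.
source /srv/homeassistant/bin/activate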

#!/bin/bash
sudo su -s /bin/bash homeassistant
source /srv/homeassistant/bin/activate
pip3 install --upgrade mysqlclient

Edit configuration.yaml:
(Don’t forget the ?charset=utf8 specification on the db_url.)

recorder:
  db_url: !secret recorder_db_url # recorder_db_url: mysql://[YOUR MYSQL USERNAME]:[YOUR MYSQL PASSWORD]@[YOUR MYSQL HOST]/homeassistant?charset=utf8
  exclude:
    domains:
      - automation
      - weblink
      - updater
    entities:
      - sun.sun   # Don't record sun data
      - sensor.last_boot
      - sensor.date
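
Since db_url comes from !secret here, the matching entry in secrets.yaml (in the same directory as configuration.yaml) looks like this, with your real credentials substituted for the placeholders:

recorder_db_url: mysql://DBUser:DBPassword@DBHost/homeassistant?charset=utf8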

#####Switch back to user pi;
stop Home Assistant;
[optional] copy records from the SQLite database to the MySQL database;
start Home Assistant.

#!/bin/bash
exit    # exit homeassistant user

echo "Begin    stopping homeassistant @ $(date)"; sudo systemctl stop [email protected]; echo "Finished stopping homeassistant @ $(date)"

#  Copying the current SQLite database to the MySQL database is optional.  If
#  copying, first delete any table entries that you created while testing the
#  MySQL database.
mysql -u${DBUser} -p${DBPassword} -h${DBHost} -e "delete from states; delete from schema_changes; delete from recorder_runs; delete from events;" homeassistant

#  Extract data from the SQLite database, massage it to be MySQL compatible,
#  then feed it into the MySQL database.  Do the tables in this order so that
#  events records exist before inserting states records that reference events.
sqlite3 /home/homeassistant/.homeassistant/home-assistant_v2.db <<EOF \
| sed 's/PRAGMA.*/SET FOREIGN_KEY_CHECKS=0; SET time_zone="+00:00";/g' \
| sed 's/TRANSACTION//' \
| sed 's/CREATE TABLE/CREATE TABLE IF NOT EXISTS/g' \
| sed 's/"\(\<.*\>\)" /`\1` /g' \
| sed 's/INSERT INTO/INSERT IGNORE INTO/g' \
| sed 's/CREATE INDEX.*//g' \
| tee schema_changes.sql \
| mysql -u${DBUser} -p${DBPassword} -h${DBHost} homeassistant &
.dump schema_changes
.exit
EOF
wait    # each copy runs in the background; wait so that the order of operations is preserved.
sqlite3 /home/homeassistant/.homeassistant/home-assistant_v2.db <<EOF \
| sed 's/PRAGMA.*/SET FOREIGN_KEY_CHECKS=0; SET time_zone="+00:00";/g' \
| sed 's/TRANSACTION//' \
| sed 's/CREATE TABLE/CREATE TABLE IF NOT EXISTS/g' \
| sed 's/"\(\<.*\>\)" /`\1` /g' \
| sed 's/INSERT INTO/INSERT IGNORE INTO/g' \
| sed 's/CREATE INDEX.*//g' \
| tee recorder_runs.sql \
| mysql -u${DBUser} -p${DBPassword} -h${DBHost} homeassistant &
.dump recorder_runs
.exit
EOF
wait
sqlite3 /home/homeassistant/.homeassistant/home-assistant_v2.db <<EOF \
| sed 's/PRAGMA.*/SET FOREIGN_KEY_CHECKS=0; SET time_zone="+00:00";/g' \
| sed 's/TRANSACTION//' \
| sed 's/CREATE TABLE/CREATE TABLE IF NOT EXISTS/g' \
| sed 's/"\(\<.*\>\)" /`\1` /g' \
| sed 's/INSERT INTO/INSERT IGNORE INTO/g' \
| sed 's/CREATE INDEX.*//g' \
| tee events.sql \
| mysql -u${DBUser} -p${DBPassword} -h${DBHost} homeassistant &
.dump events
.exit
EOF
wait
sqlite3 /home/homeassistant/.homeassistant/home-assistant_v2.db <<EOF \
| sed 's/PRAGMA.*/SET FOREIGN_KEY_CHECKS=0; SET time_zone="+00:00";/g' \
| sed 's/TRANSACTION//' \
| sed 's/CREATE TABLE/CREATE TABLE IF NOT EXISTS/g' \
| sed 's/"\(\<.*\>\)" /`\1` /g' \
| sed 's/INSERT INTO/INSERT IGNORE INTO/g' \
| sed 's/CREATE INDEX.*//g' \
| tee states.sql \
| mysql -u${DBUser} -p${DBPassword} -h${DBHost} homeassistant &
.dump states
.exit
EOF
wait

echo "Begin    starting homeassistant @ $(date)"; sudo systemctl start [email protected]; echo "Finished starting homeassistant @ $(date)"

You might look at the instructions for using PostgreSQL; it is much simpler and also resolves the performance issues.
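
For reference, the recorder db_url for PostgreSQL has the same shape, e.g.:

  db_url: postgresql://USER:PASSWORD@HOST/DATABASE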

These parts really really helped me connect to my MariaDB on my Synology NAS.
I struggled with the installation of the different dependencies and where to install them.

So, BIG thanks! :slight_smile:


And a big thanks from me too, to both @MojaveTom and @ehsahog. I moved the HA db on my Pi3 to my Synology NAS and it’s just so much quicker now.

Great post! I just moved my DB also and it’s WAY faster and snappier! Thanks!

My db grew to nearly 1.7 GB in just a few weeks since installing Home Assistant. My graphs started getting incredibly slow, taking more than 60 seconds to load.

I’ve followed these instructions from @MojaveTom more or less. I had to make some small changes to fit my install, and since I was not interested in migrating my data I could skip that part. Here is how I did it:

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install mysql-server && sudo apt-get install mysql-client
sudo apt-get install libmysqlclient-dev
sudo apt-get install python-dev python3-dev
sudo pip3 install --upgrade mysql-connector
sudo pip3 install mysqlclient

Installing mysql-server asked me to create the root password at this point.

mysql -uroot -p

CREATE DATABASE homeassistant;
CREATE USER 'hass'@'localhost' IDENTIFIED BY '********';
GRANT ALL PRIVILEGES ON homeassistant.* TO 'hass'@'localhost';
FLUSH PRIVILEGES;
exit;

Test that the user works:

mysql -uhass homeassistant -p
exit;

Switch to homeassistant user:

sudo su -s /bin/bash homeassistant
source /srv/homeassistant/bin/activate
pip3 install --upgrade mysqlclient
exit

Add to configuration.yaml

recorder:
  db_url: mysql://hass:********@localhost/homeassistant?charset=utf8

Restart Home Assistant:

sudo systemctl stop home-assistant@homeassistant.service
sudo systemctl start home-assistant@homeassistant.service

Worked without a hitch.

Now to figure out why it was so big in the first place, and to try to avoid that.

I have mostly zwave devices (7 off-the-shelf, and 2 built with a z-uno), and a philips hue bridge (1st gen) running 3 lights. It doesn’t make sense to me that these few devices should be generating so much data.

I used the advice from @bbrendon:

cd /home/homeassistant/.homeassistant
sqlite3 -line home-assistant_v2.db 'select * from states'|grep entity_id|awk '{print $3}'|sort |uniq -c |sort -n|tail -20 
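
Now that the recorder is on MySQL, roughly the same count can be had with (a sketch, using the hass user created above):

mysql -uhass -p homeassistant -e 'SELECT COUNT(*) AS cnt, entity_id FROM states GROUP BY entity_id ORDER BY cnt DESC LIMIT 20;'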

Turns out my Aeotec multisensor has more than 100’000 rows for each of 8 entities.

 113195 sensor.multisensor_sourcenodeid_4_2
 113223 sensor.multisensor_alarm_level_4_1
 113225 sensor.multisensor_relative_humidity_4_5
 113225 sensor.multisensor_temperature_4_1
 113226 sensor.multisensor_burglar_4_10
 113226 sensor.multisensor_luminance_4_3
 113227 sensor.multisensor_ultraviolet_4_27
 113230 sensor.multisensor_alarm_type_4_0

This sort of makes sense, because I use it as an environmental monitor and have it report data as often as it allows. I’ve just timed how often it updates and it seems to be between 6 and 7 seconds, so it definitely creates more than 10’000 data points per entity in a day (86’400 seconds at one update every ~6.5 seconds is roughly 13’000 rows).

So I updated my config to exclude the ones I don’t need:

recorder:
  db_url: mysql://hass:********@localhost/homeassistant?charset=utf8
  exclude:
    domains:
      - automation
    entities:
      - sun.sun
      - sensor.multisensor_sourcenodeid_4_2
      - sensor.multisensor_alarm_level_4_1
      - sensor.multisensor_burglar_4_10
      - sensor.multisensor_alarm_type_4_0
      - sensor.airquality_general_24_2
      - sensor.time_utc
      - sensor.time__date
      - sensor.time
      - sensor.date__time
      - sensor.internet_time

The next step is to figure out how to offload data to a cloud db before I purge it on the Pi. Any advice for doing that? Am I best off having Home Assistant write directly into a DB I have in the cloud? Or would I be better off having a script take old data from the local db and write it into a cloud db? Or can I do a backup at the same time as my purge interval and then just splice the backup into an archive db? Other options?

I need the data in a usable format in the cloud because I later want to build an AI integration to detect deeper patterns.


Why would you want to keep the data? I know, I am also a bad example, as I never purge my database, but I am starting to ask myself the same question. For historical data I now rely on InfluxDB and Grafana (still a newbie around that topic, and I really like phpMyAdmin, which is not available for Influx). Influx and Grafana have never let me down and are very fast, even while running on the same RPi3 as MySQL/hass…

That is interesting… I’m actually using a Pi2, but soon I’ll get a Pi3 and give it a try. As said before, though, my database is running on a Synology NAS, and there is a huge difference when opening the history panel!

I successfully completed this today using @Josh_Levent’s instructions above. I did have to make one small adjustment: I have a pre-Dec2016 All-in-One installation, so the syntax for switching to the correct user/venv was slightly different. Dropping this here for anyone in that boat.

Where Josh says “Switch to homeassistant user:”

sudo su -s /bin/bash hass
source /srv/hass/hass_venv/bin/activate
pip3 install --upgrade mysqlclient
exit


Thanks for the comment about influxdb and grafana, I’ve managed to get that working on my RPi3 in the meantime as well. Now I can set my purge time real low, and keep the data in influxdb :slight_smile:

Still want to figure out how to get it from there into long-term cloud storage at some point. There is still some junk data in the influxdb, but certainly a lot less than the recorder is putting in the main database.

If anyone is interested in the influxdb/grafana install, I’m happy to write up my walk-through on that here as well.


Hi Josh,
What is it that you mean by IDENTIFIED BY ‘********’?
And I would guess localhost is the IP 127.0.0.1?

I am interested in a walk-through :slight_smile:

This line is where you create the user, which consists of three components: the username (“hass”), the password (“********”), and the hostname (“localhost”).

********: make up your own password and then put it in the config file.

localhost: the default hostname; you can use any hostname or IP address that identifies the machine, e.g. “127.0.0.1”. Just make sure that the hostname specified in this line is the same as the one you put in the config:

  db_url: mysql://hass:********@localhost/homeassistant?charset=utf8

The last item on this config line is the database name, which needs to match as well. In my case: homeassistant
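
In other words, the general pattern of the URL is:

  db_url: mysql://USER:PASSWORD@HOST/DATABASE?charset=utf8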

thanks for these instructions - worked perfectly!

Thanks Josh,
Got everything working great. Where does the database file get saved now?

I would like to install MySQL, but in a Docker container. I tried to install Docker, but at the end of the installation it failed. Do you have any expertise in that?

It should be /var/lib/mysql/yourdatabasename
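
You can also check database sizes from MySQL itself, which avoids poking around the filesystem (a sketch; prompts for the MySQL root password):

#!/bin/bash
# Show the approximate on-disk size of each database, in MB.
mysql -uroot -p -e "SELECT table_schema, ROUND(SUM(data_length + index_length)/1024/1024, 1) AS size_mb FROM information_schema.tables GROUP BY table_schema;"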

I’ve been running the MySQL database on the same RPi3 as HA, and it had started to run very slowly. I checked the DB size and it was 1.5 GB when using 30-day retention… so I blitzed it, now use 7 days, and might even move the DB onto my QNAP NAS.

Just wondering, where is the database file saved now? Also, what is the name of the file? I looked in /var/lib/mysql/homeassistant and there is only one file there, db.opt, which doesn’t look like the right file.

Finally, my home-assistant_v2.db file is still present and growing, even after I deleted it and restarted HA.

Sounds like you didn’t configure the recorder in Home Assistant correctly. When it’s configured correctly, the home-assistant_v2.db will not grow or be recreated.
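
One quick check is to look at the Home Assistant log after a restart for recorder errors (a sketch; the path assumes the HASSbian layout used earlier):

#!/bin/bash
# Look for recorder/MySQL errors logged at startup.
grep -iE 'recorder|mysql' /home/homeassistant/.homeassistant/home-assistant.log | tail -20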