Large Home Assistant database files

Hey guys,

I keep running into the following problem, on both older and current versions of Home Assistant.

My Home Assistant database files keep getting very large (>100 MB). When this happens, I can’t restart Home Assistant from the UI. I run Home Assistant in Docker, so I have to do a `docker restart home-assistant`. Restarting becomes really, really slow.

  • On the 16th of August, my home-assistant_v2.db file got to 244 MB. I renamed it and let a new DB get generated after a reboot.
  • On the 29th of August, my DB got to 122 MB. I renamed the DB and rebooted.
  • It is now the 23rd of September, and my database file is at 234 MB.

Here are some things I’ve tried to keep this file size down:

  1. The history component in my configuration has been disabled.
  2. I’ve set a maximum of 7 days of recording for the recorder with purge_days: 7 (see the config sketch after this list).
  3. I still have the logbook enabled.
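
For reference, here’s a rough sketch of how the relevant part of my configuration.yaml looks (trimmed to just these bits):

# history: is left out entirely
logbook:
recorder:
  purge_days: 7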

So my questions are:

  1. What file size do people generally see for their `home-assistant_v2.db` files?
  2. Is there a program (preferably Windows-based) I can use to load some of these Home Assistant databases and see what data is taking up the most space?

Thanks in advance!

My files:
home-assistant.log = 128 MB
home-assistant_v2.db = 540 MB
I’ve had that running since about April, so roughly 100 MB per month for the database.

My home-assistant_v2.db was 7500 MB. I ended up bringing up MySQL and porting my data over there. It runs much better than SQLite… as it should.

As for an app to use for browsing the data, I use http://sqlitebrowser.org/. You’ll want to have Home Assistant off while browsing the file.

Thanks guys!

Okay, so it looks like my file size is normal. I’ve been tempted to port over to another DB solution anyway, so I guess I should go ahead with that.

Thanks for the link. I’ll check that out!

Since moving over to MySQL, has your Home Assistant been faster to reboot?

It’s definitely faster rebooting and pulling up graphs.

Brilliant!

Thanks mate. Another thing to add to the list :slight_smile:

My DB is now 10 days old and 500 KB.
The one before was 1000 KB and took a month.

But I run into a little trouble looking at graphs and history if it gets over 200 KB.
The reason I let it get that big is that I don’t look at them very much, but I still want to collect the data.

Through AppDaemon I collect my data in another way, and there my log (from the sensors I want to see) is only 2 KB over the last month.

I think it would be very wise if at some point we could decide which data we want to collect and which we don’t.

@adamderuwe can you point to a tutorial or walk-through on this? My DB is getting huge and is really unwieldy when trying to look at history or bring up details of a device.

Please tell us how, I’m very interested in this!

If you don’t want to bring your historical data over, it’s very easy:

  • Bring up MySQL somehow (personally, I brought up a MySQL Docker). But you could just as easily do a `sudo apt-get install mysql-server`

  • Install dependencies: `sudo apt-get install libmysqlclient-dev` and `pip3 install mysqlclient`

  • In your configuration.yaml, add a recorder section like the one documented here (see the sketch after this list)

That’s at a pretty high level, but if you need more specific instructions, ask away.
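
For example, the section ends up looking something like this (just a sketch; the homeassistant database name and the hass user/password are placeholders for whatever you create in MySQL):

recorder:
  db_url: mysql://hass:YOUR_PASSWORD@127.0.0.1/homeassistant?charset=utf8  # user, password, DB name are placeholders
  purge_days: 7  # optional, same purge behaviour as with SQLite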

It creates the table structure itself then? You don’t have to set the db structure up manually or via a script?

Thanks for the instructions, seems really easy!

However, I don’t understand why the home-assistant_v2.db got so big anyway. It was 323 MB on my system, but I have set these parameters in my configuration.yaml:

recorder:
  purge_days: 8

Is it ignoring this, or do I really create enough events in 8 days to fill up 323 MB?

I see my DB file growing as well, and I can see that the GUI is quite slow when it is trying to read from SQLite.
I was thinking of going to InfluxDB or MySQL. Does anyone know the advantages/disadvantages of going that way?
InfluxDB or MySQL?

Yep, that is correct.

I had the same issue. I was looking at the data in the SQLite database to figure out where all the space was going, then decided I didn’t care because I was going to MySQL anyway. Sorry I’m not more helpful.

MySQL vs InfluxDB doesn’t really matter at this scale. We’re talking about small amounts of data here. InfluxDB would probably be best for this sort of time-series data if we were talking massive scale, but for this, I’d probably go with MySQL just because you will find more documentation, should you need it.

My database grows by over 2 MB/hour, so right now I’m at 27 MB since the database was created about 12 hours ago. The growth has to do with the way HA logs events in the database.

In my case, the reason is my current implementation of my lighting control. Our house has a lighting control panel, where each lighting circuit is controlled by a latching relay. I’m working on a gateway between the lighting control panel and HA. I can control the panel from HA using MQTT, but someone can also turn a light ON or OFF with a hard-wired switch. In order for HA to reflect changes made by a hard-wired switch, and to do so in a timely manner, I publish an MQTT status message every 10 seconds or so.

This leads to about 120 events being logged per minute; when I scale that to the whole house, it would grow to over 600 events per minute. This brings up the question:

Should HA log duplicate, intermediate events, or only changes? For my lighting system, should HA log every event from 8:30pm to 9:13am, or just the OFF event at 8:30pm and the ON event at 9:13am?

8:30pm   GELV/binaryOutput/308/presentValue      OFF
8:31pm   GELV/binaryOutput/308/presentValue      OFF
. . .
9:10am   GELV/binaryOutput/308/presentValue      OFF
9:10am   GELV/binaryOutput/308/presentValue      ON

For now I’m going to modify my gateway program so it pushes out an MQTT message only when a change in state is made, but this might be something the developers want to look at changing.
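
Conceptually the change in the gateway is just this (a rough Python sketch, not my actual code; the broker address, the polling helper, and the paho-mqtt usage are stand-ins for whatever your gateway already does):

import time
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("127.0.0.1")  # broker address is a placeholder
client.loop_start()

last_state = {}  # last published state per relay

def read_relay_state(relay_id):
    # placeholder: replace with however the lighting panel is actually polled
    return "OFF"

def publish_if_changed(relay_id, state):
    # publish only when the state actually changes, not on every poll
    if last_state.get(relay_id) != state:
        topic = "GELV/binaryOutput/%d/presentValue" % relay_id
        client.publish(topic, state, retain=True)
        last_state[relay_id] = state

while True:
    publish_if_changed(308, read_relay_state(308))
    time.sleep(10)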

Thanks much; I think I’m going to give this a shot today now that it appears the segfault issue has been crushed. :joy:

I have a little experience with MySQL from my Joomla!/Web dev days so I think I’ll go that route.

I see a huge number of similar things with Z-Wave and template sensors on my setup: duplicate lines, in sets of almost 6-8 in a row, of the same state messages. And my DB is at 640 MB despite setting the purge to hold only 14 days of data.

I just set up MySQL last night and it works perfectly. I changed the MySQL datadir variable so that the databases are stored on an external USB drive, since I am running on a Raspberry Pi and don’t want to deal with extra wear and tear on the SD card.
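
That part is basically just one line in the MySQL server config (a sketch; the config file path and the mount point depend on your install, and the existing data directory has to be copied to the drive with the right ownership before restarting MySQL):

# /etc/mysql/mysql.conf.d/mysqld.cnf (or my.cnf, depending on the install)
[mysqld]
datadir = /mnt/usbdrive/mysql  # mount point is a placeholder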

It has been running now for the past 12+ hours without a hitch :smiley:

Remember, even though Home Assistant auto creates the tables, you still have to create the empty database on MySQL before starting Home Assistant.
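
Something along these lines in the mysql client does it (a sketch; the homeassistant database name, hass user, and password are placeholders and should match whatever you put in db_url):

-- run as the MySQL root user before starting Home Assistant
CREATE DATABASE homeassistant CHARACTER SET utf8;
CREATE USER 'hass'@'localhost' IDENTIFIED BY 'YOUR_PASSWORD';
GRANT ALL PRIVILEGES ON homeassistant.* TO 'hass'@'localhost';
FLUSH PRIVILEGES;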

I will also have a look at MySQL. A query to purge data automatically is easily run every month or so through a cron job. Or isn’t that needed, and does HASS respect the purge_days parameter when using MySQL?