Why is the backup so big?

Hello, does anyone know why my backup is growing every day?
I bought Google Drive storage, but it is filling up so fast. Is this normal?

Assuming your backups are not password protected (don’t do this, it makes them very difficult to use), download one, open it with 7-Zip or WinRAR, and look inside to see which part is the largest.

Here’s one of mine. You can see that the InfluxDB add-on is responsible for the largest part of the backup, which is understandable as I have 2 years’ worth of data in there:

Also, if your homeassistant.tar.gz file is the largest, open that up too (it’s a compressed file within a compressed file) to see what inside it takes up the most space.
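
If you would rather not click through an archive tool, the same inspection can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the backup is an ordinary, unencrypted tar file, and the function names are made up for illustration. The sizes reported for inner .tar.gz members are their compressed sizes.

```python
import tarfile


def largest_members(archive_path, top=10):
    """Return the `top` largest regular files in a tar archive as (name, bytes) pairs."""
    # Mode "r:*" lets tarfile detect plain, gzip, bzip2 or xz compression by itself.
    with tarfile.open(archive_path, "r:*") as archive:
        sizes = [(m.name, m.size) for m in archive.getmembers() if m.isfile()]
    # Sort by size, biggest first, and keep only the top entries.
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


def print_report(archive_path, top=10):
    """Print one line per member with its size in MiB."""
    for name, size in largest_members(archive_path, top):
        print(f"{size / 2**20:10.1f} MiB  {name}")
```

Calling `print_report("my_backup.tar")` (the file name is a placeholder) lists the biggest parts. To look inside homeassistant.tar.gz, extract it first and run the same function on the extracted file.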

It is probably so big because you have never deleted the Recorder database (home-assistant_v2.db).

You don’t have to. It purges data older than 10 days by default.

Yes, but even in 10 days it can grow huge. Every state change, every event, and all the energy data get stored in the database unless you manage it. You need to exclude every entity you don’t need to save, and consider reducing the purge days from 10 to maybe 4 or 5.

It’s very likely the database which is bloating your backups.

And remember the backups are always created locally, even if you use an add-on to copy them elsewhere once they’re created. So if you’re using an SD card, you’re shortening its life by writing those huge backups all the time.

It is a likely culprit, but let’s wait and see what opening the backup reveals.

OK, this is the zip file. It looks like Frigate, Double Take and DeepStack are causing this problem…
Can I do something about it?

You can do a number of things.

Stop retaining so much data in the addons.

Or if that is not possible, exclude those addons from your backups (partial backup).

I do a partial backup every night using SAMBA backup that copies my backup to my NAS. I then do a full backup once a week. If the worst happens I lose a week’s data from InfluxDB, but no Home Assistant configuration information.

You should be able to do the same thing with the automated Google Drive backup.

Thank you.
I will do partial backups…
Thanks!

Still…
My SSD has 58GB in use.
My full backup, after unpacking all the compressed folders, comes to… 32GB.

What eats the difference?
Has anybody checked how these numbers compare in their own case?

Tom, I’m struggling now. My backups are growing by about 0.5GB per week and stand at 17GB now!
Looking at the backup, most of it seems to be InfluxDB. I’m guessing this is Grafana. How can I reduce the data stored there? I can’t find any options for purging.
Also, in the config folder I have a home-assistant.log.1 file of 3GB and a home-assistant.log file of, currently, 450MB, and it’s growing at about 10kB/sec. I’m not sure if that forms part of the backup, but such growth doesn’t seem right.
Grateful for any help you can give.

Wow. You need to look in your log and see why it is so big.

Settings → System → Logs (show full log).

The InfluxDB retention settings are on the Admin page.

I save two years of data and my database is only 4.5GB. This is because I use includes in the InfluxDB config to only save the sensors I want to keep data for. See: https://www.home-assistant.io/integrations/influxdb/#examples

Thanks.
I’ve changed the InfluxDB retention to 800d; it was ∞. I’ve been running HA for 4 years now, so that should roughly halve the database (I guess not quite halve, as I’ve been adding more entities as the system has grown). My next backup is scheduled for tonight, so I will see what happens. I’ll look deeper into the include/exclude thing now.

Re the log, the culprit seems to be millions of copies of an entry I have no idea about. Here’s a small block…

2024-02-25 11:46:59.439 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.441 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.442 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.443 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.445 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.446 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.448 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.450 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.451 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.452 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.454 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.457 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.458 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.459 WARNING (MainThread) [asyncio] SSL connection is closed
2024-02-25 11:46:59.461 WARNING (MainThread) [asyncio] SSL connection is closed

Something to do with runner.py I think.

Logger: asyncio
Source: runner.py:188
First occurred: 11:46:51 (15428 occurrences)
Last logged: 11:47:29
SSL connection is closed
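
With a log file of several GB, the log viewer may not be the easiest place to confirm which entry dominates. A rough sketch of a tally script follows. It assumes every entry starts with the timestamp, level, thread and logger layout seen in the block above, and the function names are invented for this example.

```python
import re
from collections import Counter

# Matches lines such as:
# 2024-02-25 11:46:59.439 WARNING (MainThread) [asyncio] SSL connection is closed
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\] (?P<message>.*)$"
)


def tally(lines):
    """Count occurrences of each (level, logger, message) triple."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines without the expected prefix (e.g. tracebacks) are skipped
            counts[(match["level"], match["logger"], match["message"])] += 1
    return counts


def tally_file(path, top=5):
    """Stream the file line by line so a multi-GB log never has to fit in memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return tally(handle).most_common(top)
```

Running `tally_file("home-assistant.log.1")` on a copy of the log returns the most frequent entries with their counts, which makes a flood like the one above easy to spot.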

Any idea what I need to do?

None at all, sorry. You could try raising the logging level to debug to see if that reveals anything further.

configuration.yaml:

logger:
  default: debug

Restart required to change it.

Don’t forget to put it back to warning (or info).

I changed it to ‘debug’ (and back), but after the reboot the “SSL connection is closed” entry did not reappear. It seems the reboot fixed it. Still OK a day later.

Also, after setting the retention to 800d, I checked with Grafana and, sure enough, I can no longer see anything older than that, but the backup that ran last night was still 17GB. Is there a thing where the database file size doesn’t shrink unless I do a purge or something?

Yes, you have to do a purge and select the repack option. I think this happens every Sunday morning automatically.

However, that applies only to the HA database. It has nothing to do with InfluxDB or Grafana.

I’ve looked into the include/exclude thing, but with thousands of entities, I don’t know where to start.
Similar problem with all the entities exposed to Alexa and Google!
If I was starting from scratch, I would exclude everything and only let selected stuff through as things get added.

You start at the beginning :)

It is a pain to do and slightly less of a pain to maintain, but if you are methodical you’ll get there eventually.

Some very good information here: (Note that the database changes from time to time, newer posts have updated the SQL to work with the current schema.)
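
As a starting point for deciding what to exclude, it can help to see which entities write the most rows. The sketch below is hedged: the table and column names (`states`, `states_meta`, `metadata_id`, `entity_id`) are an assumption about the current Recorder schema for SQLite and may not match your version, and `noisiest_entities` is just a name chosen for this example. It is probably safest to run it against a copy of home-assistant_v2.db rather than the live file.

```python
import sqlite3

# ASSUMPTION: the table and column names below (states, states_meta,
# metadata_id, entity_id) describe the Recorder schema as understood at the
# time of writing. Adjust them if your database differs.
QUERY = """
SELECT m.entity_id, COUNT(*) AS row_count
FROM states AS s
JOIN states_meta AS m ON s.metadata_id = m.metadata_id
GROUP BY m.entity_id
ORDER BY row_count DESC
LIMIT ?
"""


def noisiest_entities(db_path, top=20):
    """Return (entity_id, row_count) for the entities with the most stored states."""
    # Open read-only so the script cannot modify the database.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(QUERY, (top,)).fetchall()
    finally:
        conn.close()
```

The entities at the top of the list are usually the first candidates to consider for exclusion from Recorder.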

Exactly! Personally I think whenever a new entity is created, the dialog should have a check box to include or exclude it from Recorder. The default should be exclude. This is already done for selecting a new device’s “area,” which to me is far less important.

At a minimum, this should be emphasized in all the “getting started” documentation. I have seen it popping up in some places, so I think the message is starting to get through.
