Backup size of Whisper AddOn / language model

I recently noticed that my backup size increased drastically after I installed Whisper (and Piper) as an AddOn (HA-OS).

After investigating, I found out that the language model is saved in the backup as well. In my case this increases the size from around 300MB to well over 1GB.

Is there a setting, or some other way, to leave the language model out of the backup?

I don’t think a language model needs to be included in the backup. As far as I can tell, I could always re-download it if necessary. Sending 700MB in a backup doesn’t make sense to me when I can easily download the model again if needed. :slight_smile:

Does anybody have any ideas on that? I don’t want to disable the backup of the AddOn completely.

I have this issue as well. The download is only about 50MB, so how come the backups got so big?

In my case the size fits, I’m using (read: experimenting with) the medium-int8 model, and the size is around 800MB as advertised…

What I’m after is to leave the language model(s) entirely out of the backup.

Have you checked that Whisper really is the culprit in your installation? If you haven’t, make a full backup with no password set (important!), download it, and extract it with 7-Zip or whatever archive program you have. There you’ll see which folder in the backup takes the most space.

But I’m out of ideas on how to leave specific things out of an HA backup…
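If you would rather not extract the whole archive by hand, here is a minimal Python sketch of the same check. It assumes the downloaded backup is an unencrypted tar archive (hence no password) and only uses the standard `tarfile` module. The function name `largest_members` is made up for this example and is not part of Home Assistant.

```python
import tarfile


def largest_members(backup_path, top=10):
    """Return (name, size_in_bytes) for the biggest files in a backup archive.

    Assumption: backup_path points to an unencrypted tar file, such as a
    full backup that was created without a password.
    """
    with tarfile.open(backup_path, "r:*") as tar:
        # Only regular files carry a meaningful size; skip directories.
        files = [(m.name, m.size) for m in tar.getmembers() if m.isfile()]
    # Biggest entries first.
    return sorted(files, key=lambda item: item[1], reverse=True)[:top]
```

Calling `largest_members("my_backup.tar")` (the file name is a placeholder) returns the largest entries first, so a bloated add-on archive should show up at the top of the list.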

I’m using the Samba Backup addon, and have also noticed that the backup size drastically increased after changing the Whisper model.

In Samba Backup, there is an option to exclude addons, but that creates a partial backup instead of a full backup.

Anyone know if there’s a difference between those two? Or is it clear as day, just Whisper excluded?

Edit:

A little late, but you know… :smiley:

The language models are no longer included in the backup of Whisper, so the backup size should be back to normal. Everything that is saved now is configuration and other things you need to restore. The language model will be downloaded upon restore. :slight_smile:

I read your message yesterday, enabled full backups (I used to just exclude whisper), and last night’s backup increased by 750MB, so something seems off.

  • Core 2024.4.3
  • Supervisor 2024.04.0
  • Operating System 12.2
  • Frontend 20240404.2

You’re right, there must be something off. I just checked my backups, and Whisper uses around 60MB with no language model.

This is from the Add-on documentation page:

Backups

Whisper model files can be quite large, so they are automatically excluded from backups. The models will be re-downloaded when the backup is restored.

Not my experience.

My system is currently:

HA OS in a Proxmox partition on an x86-64 PC
Core               2024.5.3
Supervisor         2024.05.1
Operating System   12.3
Frontend           20240501.1

I’ll do another backup now … oops, Config > Storage says I only have 3.3GB available on my HA 32GB disk partition.

Curiously, I did manage to create a Full Backup (System > Backups > Create backup > Full Backup) taking 3840.1MB. I downloaded it to my PC and opened it.

Note the last entry: core_whisper.tar.gz accounts for 2.8GB of the 3.8GB file, and it contains the “medium.en” Whisper model I am currently using, plus the files for previously used models.

This thread was started 11 months ago, and the problem was reported in this GitHub issue.
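For anyone who wants to repeat this check, the following is a rough Python sketch that looks inside the add-on archive without unpacking it to disk. It assumes an unencrypted backup and that the Whisper archive is named core_whisper.tar.gz as in my backup. The function name `addon_usage` is made up for this example, and the folder layout inside the archive is whatever the add-on happens to store, so treat the output as a hint only.

```python
import posixpath
import tarfile
from collections import Counter


def addon_usage(backup_path, member_name="core_whisper.tar.gz"):
    """Sum uncompressed file sizes per top-level folder of one add-on archive.

    Assumption: the backup is an unencrypted tar file that contains the
    add-on archive as a nested, compressed tar.
    """
    usage = Counter()
    with tarfile.open(backup_path, "r:*") as outer:
        # Match on the base name so a leading "./" in the member name is fine.
        member = next(
            (m for m in outer if posixpath.basename(m.name) == member_name),
            None,
        )
        if member is None:
            raise KeyError(f"{member_name} not found in {backup_path}")
        inner_file = outer.extractfile(member)
        with tarfile.open(fileobj=inner_file, mode="r:*") as inner:
            for entry in inner:
                if not entry.isfile():
                    continue
                # Drop empty and "." path components, keep the first real one.
                parts = [p for p in entry.name.split("/") if p not in ("", ".")]
                usage[parts[0]] += entry.size
    # Largest folders first, as (folder, bytes) pairs.
    return usage.most_common()
```

The sizes reported here are uncompressed, so they will not add up exactly to the size of the .tar.gz file itself, but several model files left over from earlier experiments should still stand out clearly.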

As I said above, then something’s wrong on your side. :slight_smile:

Others don’t have that problem, and it is clearly stated in the docs that the behaviour of the add-on changed some time ago and the models aren’t included anymore.

=> This means there is something wrong on some installations, and it needs to be taken care of via an issue on GitHub.
What I’m saying is that it doesn’t look like a misconfiguration or anything else that could be changed on the user side. It looks more like some kind of combination or condition that leads to this behaviour. That should be addressed on GitHub, so the developer can take a look.

@synesthesiam Do you have that on your list, or is this new to you? And if so, how should we proceed? Thanks!

Just to let people know, in another thread on the same day, Patrick commented on my previous post:

Actually, this is a nice example of how things can go sideways because users aren’t well informed! That’s not your fault, how would one know that, but it’s typical of such issues. It seems to have been open for nearly a year, and as you stated is not solved.

Both assumptions are wrong, unfortunately. The problem was solved months ago by excluding the language model from the backup. That would have been the time to close that issue.
What you’re having is something totally different: in your case the language model is not excluded, but that is a bug. It has nothing to do with the original subject of that GitHub issue.

One was about the general function to exclude the models, whereas yours is an error in the newly added function.

So for now, the developer has no idea either that something’s wrong or that an implemented function doesn’t work as expected…

I can see why people get confused here, but that is always a problem when different levels of knowledge come together. :slight_smile:


Curious that you are the originator of this forum thread last year.
You state that it was fixed then, and I have no reason to disbelieve you.
Unfortunately this forum thread was not marked as solved (which is not necessary, since the GitHub issue would have been). I unreservedly apologise for my mistaken impression that the issue of “models being included in backup” had not been resolved at the time.

This assertion seems to fly in the face of the evidence. I accept that in your backup, Whisper may only use 60MB, but you didn’t mention which model you are using. Is it perhaps one of the ones excluded in last year’s fix?

As you can see above, TurfFiber experienced the same issue this March, and on April 4th SaintTDI raised “During a total backup, it saves Whisper's models files, raising the backup from 440MB to 1730MB” (Issue #3545, home-assistant/addons on GitHub). I am the third user to add to that GitHub issue.

Others do seem to have the same issue, and quoting the documentation proves only that (a) that is the intended operation, or (b) the documentation is wrong or out-of-date.


Patrick, I am confused about what point you are trying to make.
On one hand you are clearly saying it cannot be a bug, but is an isolated user problem: that I and a few others have broken our systems, and it is up to us to fix whatever we have done wrong
… and then you acknowledge that there is no user configuration by which we could have done that, or could fix it.

You state that a GitHub issue must be raised so the developer can take a look … as though the previously mentioned GitHub issue, which was raised only last month, does not do that.

Why is it not possible that the models current at the time were excluded, but the newer, larger models didn’t fit those exclusion criteria, or that something else has changed?

I did not come here to start an argument, only to (a) confirm that another user is experiencing this issue, and (b) to provide additional information which may help debug.