xTTS v2 with Home Assistant

I’m wondering if anyone has been able to leverage the XTTS v2 model from coqui-tts… I was finally able to get tts-server working with a fine-tuned model, and fixed some issues that popped up with the MaryTTS endpoint.

It seems like the better route would be to stream the audio in chunks, though (it takes a couple of seconds to generate the full audio), and that is outside of my abilities.

Has anyone been able to run a fine tuned model with audio streaming?

I’ve just run it; it required minor changes to the server.py file. But it exposes a GET /api/tts?text=Blabla API, which I don’t know how to connect to HA yet :slight_smile:
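For anyone who wants to poke at that endpoint before wiring it into HA, here's a minimal sketch (the host, port, and output filename are my assumptions, not something from the thread — tts-server's default port is 5002):

```python
from urllib.parse import quote
from urllib.request import urlopen

def build_tts_url(base: str, text: str) -> str:
    """Build the GET /api/tts URL that tts-server exposes."""
    return f"{base}/api/tts?text={quote(text)}"

# Assumes tts-server is running locally on its default port:
url = build_tts_url("http://localhost:5002", "Hello from Home Assistant")
# with urlopen(url) as resp:
#     open("out.wav", "wb").write(resp.read())  # save the returned WAV
```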

You can use MaryTTS to connect it.
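For reference, the HA side would look something like this in configuration.yaml (the host and port are placeholders — point them at wherever tts-server is running, not at a real MaryTTS install):

```yaml
# configuration.yaml — point HA's MaryTTS integration at
# tts-server's MaryTTS-compatible endpoint
tts:
  - platform: marytts
    host: 192.168.1.50   # machine running tts-server (placeholder)
    port: 5002           # tts-server's default port
```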

Are you using a fine tuned model?

Oh! Thanks, you saved me research time. Mine isn’t fine-tuned yet, but the setup is no different from a tuned one.

It seems I had to modify two files:

  1. TTS/utils/synthesizer.py

In line 183 I brutally hard-coded the config file for xtts_v2:

        #self.tts_config = load_config(tts_config_path)
        self.tts_config = load_config("/home/tts/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json")
  2. TTS/server/server.py

In line 203 I brutally put the parameters I wanted:

        wavs = synthesizer.tts(text, speaker_name="Gitta Nikolina", language_name="en", style_wav=style_wav)

Here you can simply modify it to include your own improvements.

Voila!

Here’s some more info too

I must say that quality of that TTS is astonishing!

Agreed!
Let me know if you try a fine-tuned model. For some reason on my side, the spoken words seem a little slow…

Have you verified CUDA is turned on for the server?

Yeah, it’s running on the GPU. The actual inference is fine; it’s just the pace at which words are spoken that seems slow.

The Coqui TTS docs mention a speed parameter, but I have no idea how to use it.

EDIT: it would be way better to do chunked audio streaming, but I have no idea how to do that. I believe xtts-streaming-api allows for that, but I don’t think it is supported in HA.
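For what it’s worth, the client side of chunked streaming is the easy part. A sketch of consuming an HTTP response chunk by chunk as it arrives (the URL and chunk size are illustrative, and this assumes a server that actually streams its output):

```python
from urllib.request import urlopen

def stream_audio(url: str, chunk_size: int = 4096):
    """Yield audio chunks as they arrive instead of waiting for the full file."""
    with urlopen(url) as resp:
        while chunk := resp.read(chunk_size):
            yield chunk

# Each chunk could then be fed to a player as it arrives, e.g.:
# for chunk in stream_audio("http://localhost:8000/tts_stream?text=hello"):
#     player.feed(chunk)
```

The hard part is the server side: generating and flushing audio incrementally during inference, which is exactly what xtts-streaming-server does and plain tts-server doesn’t.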

Warning: XTTS-streaming-server doesn't support concurrent streaming requests, it's a demo server, not meant for production.

So it’s a big risk for HA where parallel TTS can happen frequently. We need to wait until there is a better server.
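One stopgap (my own sketch, not anything from the thread) is to serialize TTS calls on the client side so the demo server never sees two requests at once:

```python
import threading

_tts_lock = threading.Lock()  # only one TTS request in flight at a time

def synth_serialized(synth_fn, text: str):
    """Run a TTS call under a global lock so parallel requests queue up."""
    with _tts_lock:
        return synth_fn(text)
```

Queued requests add latency, but they won’t step on each other’s streams.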

but we have this!

Yup, I saw that one too; it just doesn’t appear to have a MaryTTS-compatible endpoint, so some integration would be required on the HA side.

OK, I figured out inference speed and made changes to xtts.py.

EDIT:
I was wrong; I’ve been having a really hard time trying to get this to pull “speed” from the model’s config.json… I’m not a programmer and am having difficulty setting a global variable in the TTS.config module.
For whatever reason, the value I’m trying to set, which should be the path of config.json, isn’t accessible in xtts.py.

Finally got it figured out.
It required changes to the config module, server.py, and xtts.py.

But now, at least, it looks for the speed variable in the config.json of the currently loaded model.
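The gist of that lookup is something like this (a simplified sketch — the real changes span the config module, server.py, and xtts.py, and the “speed” key is the one added here, not a stock XTTS config field):

```python
import json

def load_speed(config_path: str, default: float = 1.0) -> float:
    """Read the optional "speed" value from the loaded model's config.json."""
    with open(config_path) as f:
        cfg = json.load(f)
    return float(cfg.get("speed", default))

# speed = load_speed(path_to_model_config)  # then pass it through to inference
```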

EDIT:
modified links. I accidentally linked the original files and not the modified ones.

It works perfectly for me; I am still shocked at how good this TTS is.

Try running a fine-tuned model!
I trained via Google Colab and the result is great!


I saw a YouTube video with MaryTTS and it was amazing. It isn’t possible to run all of that on a Raspberry Pi 4 with Home Assistant, is it?

I started researching it that day, but saw on the Coqui homepage that it is discontinued, so I am surprised you guys are still running it.

I’m running all of the coqui stuff (xtts v2 model) locally :slight_smile:

The main reason I’m using the coqui tts-server is that it’s the only one I found with a MaryTTS-compatible endpoint, so it works with Home Assistant.

What hardware are you running on? Is it too demanding for a Raspberry Pi with Hassio on it?