xTTS v2 with Home Assistant

I don’t think it would be able to generate the audio quickly on a Pi…
I’m offloading all the machine learning work from my Home Assistant VM to an ML server I put together with two 3090 Turbo GPUs.

Hi, could you give a short guide on how to run Coqui xTTS on a local machine (for example Linux with a GPU/VRAM, torch, and CUDA) to work with HA?
Maybe as a standalone service?

You need the Coqui tts-server installed on your local network, then use the MaryTTS integration in HA.
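For what it’s worth, a quick way to sanity-check that wiring before touching HA is to hit the server’s MaryTTS-compatible /process endpoint yourself, since Home Assistant’s MaryTTS integration calls the same endpoint. A minimal sketch (the host/port and the exact form fields are assumptions; tts-server defaults to port 5002 and its MaryTTS shim mainly cares about the input text):

```python
# Sanity check for a Coqui tts-server exposing a MaryTTS-compatible API.
# Host, port, and form fields are assumptions -- adjust to your setup.
import requests

TTS_SERVER = "http://192.168.1.50:5002"  # hypothetical address/port of the tts-server box

payload = {
    "INPUT_TEXT": "This is a test of the MaryTTS-compatible endpoint.",
    "INPUT_TYPE": "TEXT",
    "OUTPUT_TYPE": "AUDIO",
    "AUDIO": "WAVE_FILE",
    "LOCALE": "en_US",
}

# Home Assistant's MaryTTS integration POSTs to this same /process endpoint.
resp = requests.post(f"{TTS_SERVER}/process", data=payload, timeout=60)
resp.raise_for_status()

with open("test.wav", "wb") as f:
    f.write(resp.content)
```

If that comes back with a playable WAV, pointing the MaryTTS integration at the same host and port should just work.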

Hello there! Sorry to bother you, but I was wondering: could you explain your workflow again? I’m trying to do something very similar to what you are doing. I am able to run tts-server with models that aren’t xtts-v2, but I really want to be able to run tts-server while designating which “clone” wav to use, so that I can then use the MaryTTS API endpoint to provide TTS to HA with basically any voice for which I have a high-quality sample. It looks like you got it working, but I just can’t seem to follow your workflow. It would be much appreciated if you could lay it out for me a little bit… Thank you in advance if this is possible!

How are you running tts-server? I’m running it like this:

tts-server --model_path /path/to/model.pth --config_path /path/to/config.json --use_cuda true --speakers_file_path /models/sj_v15/reference.wav

That being said, the changes I made to server.py will actually look for a reference wav within the model folder and use it automatically.
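Roughly, the idea looks like this (a sketch of the approach, not the exact diff; the function and variable names are illustrative):

```python
# Sketch of the server.py tweak: if the request doesn't supply a style wav,
# fall back to the first .wav found alongside the model files.
from pathlib import Path

def find_default_style_wav(model_path: str) -> str | None:
    """Return a reference wav sitting in the model folder, if there is one."""
    wavs = sorted(Path(model_path).parent.glob("*.wav"))
    return str(wavs[0]) if wavs else None

# ...inside the synthesis handler (illustrative call site):
# style_wav = requested_style_wav or find_default_style_wav(args.model_path)
# print(f"Using default style wav: {style_wav}")
```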
The logs should look something like this when you use the MaryTTS API:

 > Model input: This is a test, what do the logs look like?
Using default style wav: /path/to/reference.wav
 > Text splitted to sentences.
['This is a test, what do the logs look like?']
/usr/local/lib/python3.10/dist-packages/torchaudio/functional/functional.py:1464: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
  resampled = torch.nn.functional.conv1d(waveform[:, None], kernel, stride=orig_freq)
 > Processing time: 6.761493921279907
 > Real-time factor: 1.9408839429835185
INFO:werkzeug:::ffff:192.168.5.10 - - [31/Mar/2024 13:08:01] "POST /process HTTP/1.1" 200 

You can see there that the default style wav is listed in the log.

Thank you for the help. I actually ended up pulling the code from early December and making some edits I found on a different forum. I hard-coded the reference wav instead of doing it your way (though your way seems more elegant). At least for now, it is working!

My hope is that the “alltalk-tts” project adds OpenAI TTS compatibility soon. If so, I’ll probably switch to that. Thanks again!

Also, have you noticed that the speaking rate is very slow? I don’t know if it was somehow related to the way I was running the xtts-v2 model, but everything seemed slowed down. I ended up speeding it up manually before returning the wav, and that seemed to fix it.
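For anyone curious, the workaround was roughly this kind of thing (a sketch, not the exact code; it assumes librosa and soundfile are installed):

```python
# Time-stretch the generated wav before returning it (>1.0 = faster, pitch unchanged).
import librosa
import soundfile as sf

def speed_up_wav(in_path: str, out_path: str, rate: float = 1.15) -> None:
    audio, sr = librosa.load(in_path, sr=None)
    faster = librosa.effects.time_stretch(audio, rate=rate)
    sf.write(out_path, faster, sr)

speed_up_wav("output.wav", "output_fast.wav", rate=1.15)
```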

I did notice that, and I added some changes that allow you to pass a speed value in the model’s config JSON.

I’m AFK right now but will share the edited file when I’m back home!

OK, so it looks like I made the necessary changes in xtts.py.

If speed doesn’t exist in the config JSON, it defaults to the normal value of 1.0. Right now I’m using a value of 1.15 and it sounds great.
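In case it’s useful to anyone else, the shape of the change is roughly this (a sketch, not the exact diff; the inference call site is an assumption):

```python
# Read an optional "speed" key from the model's config.json, defaulting to 1.0.
import json

def get_speed(config_path: str) -> float:
    with open(config_path) as f:
        config = json.load(f)
    return float(config.get("speed", 1.0))

# The value then gets passed through to synthesis, e.g. (illustrative):
# wav = model.inference(text, language, gpt_cond_latent, speaker_embedding,
#                       speed=get_speed(config_path))
```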

Thank you again! I’m willing to bet your strategy is a lot better than mine lol. I had to essentially modify the final wav file before I sent it back from the server lol. Also, if you’re interested, check out this project… Totally amazing quality of voice.

No API (BRAND new project, and inference speed isn’t quite 1:1), but the accuracy is fantastic. I’m thinking about coding one myself, but I might wait a few weeks to see if the community comes up with ways to speed up the inference.

How does it compare to xttsv2?

I’d have to wait till there’s MaryTTS compatibility since my primary use case is home assistant

I checked out the demo; that’s great! Hopefully some integration with HA is possible in the future.

I’ve been following this project for some time:

There is currently a beta with support for an OpenAI-compliant API. It supports XTTS, Piper, VITS, and Parler, and also has RVC support to take the generated speech and further process it with an RVC model.

I’m trying to find a way to get the MaryTTS integration to talk to the API that’s exposed by AllTalk, but I’m not having any luck. Has anyone else had any luck with this, or is there a community project that exposes an AllTalk or OpenAI API-compliant TTS service?


I’m going to use this for that purpose

Basically, I’m going to pair it with this, as the maintainer plans on supporting fine-tuned XTTS v2 models.

Oh, that’s awesome! I’m going to try pairing AllTalk with this OpenAI TTS component and see if it works.

Let me know how it goes, I’ve been wanting to switch away from tts-server!

With a few code modifications, I got it to work - add this repo to your HACS custom repositories list, and set it up as a new integration.

https://github.com/qJake/openai_tts

EDIT: See below for an update.

Instead of an OpenAI key, it will ask you for your OpenAI-compatible TTS endpoint, for example:

http://192.168.0.25:7851/v1/audio/speech

Enter the full URL as the configuration entry. If that endpoint is an OpenAI TTS-compatible endpoint, it should work.
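If you want to confirm the endpoint behaves before wiring it into HA, a quick request like this should come back with audio (a sketch; the model and voice values are assumptions, so use whatever your AllTalk/OpenAI-compatible server actually exposes):

```python
# Quick test of an OpenAI-compatible /v1/audio/speech endpoint (e.g. AllTalk).
import requests

ENDPOINT = "http://192.168.0.25:7851/v1/audio/speech"

resp = requests.post(
    ENDPOINT,
    json={
        "model": "tts-1",          # assumption: many servers ignore or map this
        "input": "Testing the OpenAI-compatible speech endpoint.",
        "voice": "alloy",          # assumption: substitute a voice your server knows
        "response_format": "wav",
    },
    timeout=120,
)
resp.raise_for_status()

with open("speech.wav", "wb") as f:
    f.write(resp.content)
```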

Note: I am having difficulties with the returned format (wav) working on ESP32-S3 devices like the BOX-3 - the audio is returned, but stutters / crashes the playback engine on the ESP32 device. So if you intend to use this with an ESP32 device, be warned that it may not work as of right now.

EDIT:

The maintainer of the main OpenAI TTS custom component has added support for custom OpenAI-compliant URLs in the main project. I’ve archived my repository, and recommend you use this one instead - it works with AllTalk v2:


I think one more step and we’ll have streaming.
The OpenAI TTS component supports it; I keep checking to see which API server can serve it.

Can you help me understand how to set this up?

My questions are:

  1. Where do I start with the TTS server?
  2. How do I add it to integrations so that I can point my Assist character to use it (like I have with Piper)?

I was looking at using Coqui.ai but can’t find an integration for it in Home Assistant.

Why do I need this instead of Piper? Because Piper doesn’t have my language for TTS. Whisper has STT in my language, but then I don’t get responses in my language… Coqui.ai has a model in my language…

Thank you

I’ve stopped using the XTTS server from the Coqui.ai project… instead I use this as a server:

You’ll need to install this project through HACS:

That will then let you configure openedai as a TTS engine.