Wyoming-xtts - XTTS v2 Text-to-Speech for Home Assistant

Hey there

I spent the last two days working on this project so I can use XTTS with Home Assistant voice.

I’m not sure if this belongs in this category or in another. Feel free to move it if I’m wrong here. :slight_smile:

I built this because I wanted better voice output from my voice assistant. I wanted to use German, and I wasn’t very happy with the German Piper models. The existing XTTS options were either abandoned, in unstable beta, dependent on OpenAI-API-compatible bridges, lacking proper streaming, or not working on my GTX 1080.

This is just simple XTTS over the Wyoming protocol. Home Assistant should find it automatically through Zeroconf and you can directly use it. Drop your voice samples in a folder, point Home Assistant at it, done.
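To illustrate the "drop your voice samples in a folder" idea, here is a minimal sketch of how a server could turn a folder into a list of voices. It assumes one audio file per voice, with the file name (without extension) used as the voice name. The function name `discover_voices` and the accepted extensions are made up for this example and are not taken from the project.

```python
from pathlib import Path

# Assumption for this sketch: these are the accepted sample formats.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def discover_voices(folder):
    """Map voice name (file stem) to sample path for each audio file in folder."""
    voices = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and anything that is not an audio sample.
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            voices[path.stem] = path
    return voices
```

With files like `anna.wav` and `bob.flac` in the folder, this would offer the voices `anna` and `bob`.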

Features worth mentioning:

  • Wyoming compatible, should support everything that Home Assistant requests
    • This means you can configure the voice/language* directly in Home Assistant
  • Bidirectional streaming (text from LLM streams in, audio streams out simultaneously to reduce delay)
  • DeepSpeed for faster inference if you have the VRAM
  • Zeroconf so HA discovers it automatically
  • Works on older NVIDIA cards (Pascal/GTX 10xx and up)
  • XTTS does voice cloning out of the box which is quite cool

* I noticed that Home Assistant currently does not send the selected language to wyoming-tts servers, so … you can configure it, but it’s not being sent. I’m sure this is not expected behavior and will eventually be fixed. For now the language is auto-detected (this is configurable).
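The bidirectional streaming point from the feature list can be pictured roughly like this: text arrives from the LLM in small chunks, and each sentence is handed to the synthesizer as soon as it is complete, instead of waiting for the full reply. This is only a sketch of the general technique, not the project's actual code. The name `stream_sentences` and the simple sentence-end rule are assumptions for illustration.

```python
import re

# A sentence ends at ., ! or ? followed by whitespace (a deliberately simple rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks):
    """Yield complete sentences as soon as the incoming text chunks contain them."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be incomplete, so keep it for the next chunk.
        buffer = parts[-1]
    # Flush whatever is left once the input stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could be passed to the TTS model right away while the LLM is still producing text. A rule this simple will also split after abbreviations such as "z. B.", so a real implementation would likely need something smarter.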

I focused on getting the core right. Configuration is just environment variables, and the defaults should work for most setups. If something doesn’t work, there’s less to debug.
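As a rough picture of what "just environment variables with defaults" means in practice, here is a minimal sketch. The variable names (`EXAMPLE_HOST`, `EXAMPLE_PORT`, `EXAMPLE_VOICES_DIR`, `EXAMPLE_DEEPSPEED`) and the default values are placeholders invented for this example; the real names are in the README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    voices_dir: str
    use_deepspeed: bool


def _env_bool(env, name, default):
    """Read a boolean flag such as "1", "true", "yes" or "on"."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Build settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return Settings(
        # All names and defaults below are hypothetical placeholders.
        host=env.get("EXAMPLE_HOST", "0.0.0.0"),
        port=int(env.get("EXAMPLE_PORT", "10200")),
        voices_dir=env.get("EXAMPLE_VOICES_DIR", "/voices"),
        use_deepspeed=_env_bool(env, "EXAMPLE_DEEPSPEED", False),
    )
```

Calling `load_settings()` with nothing set returns the defaults, so only the values that differ from them need to be set in the container.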

Installation and usage are documented in the README. I hope it’s easy to understand.

One note though:
A GPU (NVIDIA GTX 10xx or newer) is pretty much a hard requirement; XTTS is quite slow on CPUs.
You also need 14+ GB of space for the Docker image and 3+ GB for the XTTS model. (Sorry. :grin:)

Oh, and it’s v0.1.0, so expect that some things may not work as they should. :sweat_smile:

Let me know what you think!

GitHub: lmoe/wyoming-xtts: XTTS v2 Text-to-Speech for Home Assistant. Wyoming compatible.

Awesome. Congrats!

Didn’t work for my 5090 unfortunately.

Can you tell me what exactly didn’t work?

Based on your project, I installed it natively in an LXC on my Proxmox host with an RTX 3050. The result is absolutely phenomenal, and it was detected immediately in Home Assistant. Thank you very much.