OmniVoice is a state-of-the-art massively multilingual zero-shot text-to-speech (TTS) model supporting over 600 languages. Built on a novel diffusion language model-style architecture, it generates high-quality speech with superior inference speed, supporting voice cloning and voice design.
Key Features
- 600+ Languages Supported: The broadest language coverage among zero-shot TTS models (full list).
- Voice Cloning: State-of-the-art voice cloning quality.
- Voice Design: Control voices via assigned speaker attributes (gender, age, pitch, dialect/accent, whisper, etc.).
- Fine-grained Control: Non-verbal symbols (e.g., [laughter]) and pronunciation correction via pinyin or phonemes.
- Fast Inference: RTF as low as 0.025 (40x faster than real-time).
- Diffusion Language Model-style Architecture: A clean, streamlined, and scalable design that delivers both quality and speed.
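For anyone wondering how an RTF of 0.025 becomes "40x faster than real-time": RTF (real-time factor) is the time spent generating divided by the duration of the generated audio, and the speedup is its reciprocal. Here is a minimal sketch of that arithmetic. The function names and the timings are made up for illustration and are not part of OmniVoice.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """Speedup is the reciprocal of RTF: lower RTF means faster synthesis."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Hypothetical timings: 0.25 s of compute for 10 s of speech.
rtf = real_time_factor(synthesis_seconds=0.25, audio_seconds=10.0)
speedup = speedup_over_real_time(rtf)
print(f"RTF = {rtf:.3f}, about {speedup:.0f}x faster than real-time")
```

Put differently, at that RTF one minute of speech would take about 1.5 seconds to generate. What you actually get will depend on your hardware.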
You can try it here: OmniVoice, a Hugging Face Space by k2-fsa.
See it in action here: https://youtu.be/LZEZ4nmuahc?si=bQDStvbQZhiUa2M7
The best part is that someone has already built this: mitrokun/wyoming_omnivoice on GitHub (TTS for Home Assistant [OmniVoice]).
And now my Home Assistant TTS speaks with a voice I’ve cloned. So cool.
Please don’t hurt the messenger (Sparta style), I just wanted to share the news. And maybe someone will make a cool integration with it.