OmniVoice: local TTS model with voice cloning

OmniVoice is a state-of-the-art massively multilingual zero-shot text-to-speech (TTS) model supporting over 600 languages. Built on a novel diffusion language model-style architecture, it generates high-quality speech with superior inference speed, supporting voice cloning and voice design.

Key Features

  • 600+ Languages Supported: The broadest language coverage among zero-shot TTS models (full list).
  • Voice Cloning: State-of-the-art voice cloning quality.
  • Voice Design: Control voices via assigned speaker attributes (gender, age, pitch, dialect/accent, whisper, etc.).
  • Fine-grained Control: Non-verbal symbols (e.g., [laughter]) and pronunciation correction via pinyin or phonemes.
  • Fast Inference: RTF as low as 0.025 (40x faster than real-time).
  • Diffusion Language Model-style Architecture: A clean, streamlined, and scalable design that delivers both quality and speed.
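The RTF figure is simple arithmetic. Real-time factor is the wall-clock time spent generating divided by the duration of the audio produced, so the speed-up over real time is its reciprocal: 1 / 0.025 = 40. Here is a minimal Python sketch of that calculation. The 0.25 s / 10 s timings are made-up illustration values, not measured OmniVoice results.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return generation_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """RTF < 1 means faster than real time; the speed-up factor is 1 / RTF."""
    return 1.0 / rtf


# Illustrative numbers only (not a benchmark): 0.25 s of compute for 10 s of speech.
rtf = real_time_factor(0.25, 10.0)
print(f"RTF = {rtf:.3f}, about {speedup_over_real_time(rtf):.0f}x faster than real time")
```

An RTF above 1 would mean the model cannot keep up with playback. Values well below 1, like the quoted 0.025, leave headroom for running on modest local hardware.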

You can try it here: OmniVoice - a Hugging Face Space by k2-fsa
See it in action here: https://youtu.be/LZEZ4nmuahc?si=bQDStvbQZhiUa2M7

The best part is that someone has already built a Home Assistant integration for it: GitHub - mitrokun/wyoming_omnivoice: tts for home assistant [OmniVoice]
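For the curious: the repository name suggests the integration speaks the Wyoming protocol, the JSON-lines-plus-raw-audio protocol Home Assistant uses for local voice services. Assuming it does, a TTS request is a `synthesize` event carrying the text, and the server replies with `audio-start`, a series of `audio-chunk` events (raw PCM in the payload), and `audio-stop`. Below is a minimal, hypothetical client sketch of that exchange using only the Python standard library. The host and port (10200 is the port commonly used for Piper) are placeholders. I have not checked which port or voice options wyoming_omnivoice actually exposes, so treat this as an illustration of the protocol rather than its documented usage.

```python
import json
import socket
from typing import BinaryIO, Dict, Optional, Tuple


def encode_event(event_type: str, data: Optional[Dict] = None, payload: bytes = b"") -> bytes:
    """Serialize one Wyoming-style event: a JSON header line, then optional raw payload bytes."""
    header: Dict = {"type": event_type}
    if data:
        header["data"] = data  # small data dicts can travel inline in the header
    if payload:
        header["payload_length"] = len(payload)
    return (json.dumps(header) + "\n").encode("utf-8") + payload


def read_event(reader: BinaryIO) -> Tuple[str, Dict, bytes]:
    """Read one event: header line, optional separate data block, optional payload."""
    line = reader.readline()
    if not line:
        raise ConnectionError("connection closed before a complete event arrived")
    header = json.loads(line)
    data = dict(header.get("data") or {})
    data_length = header.get("data_length") or 0
    if data_length:
        # Some senders put the data dict after the header instead of inline.
        data.update(json.loads(reader.read(data_length)))
    payload_length = header.get("payload_length") or 0
    payload = reader.read(payload_length) if payload_length else b""
    return header["type"], data, payload


def synthesize(text: str, host: str = "127.0.0.1", port: int = 10200) -> Tuple[Dict, bytes]:
    """Hypothetical client: request speech and return (audio format, raw PCM bytes).

    host/port are placeholders; check the wyoming_omnivoice README for the real values.
    """
    audio_format: Dict = {}
    audio = bytearray()
    with socket.create_connection((host, port)) as sock:
        sock.sendall(encode_event("synthesize", {"text": text}))
        with sock.makefile("rb") as reader:
            while True:
                event_type, data, payload = read_event(reader)
                if event_type == "audio-start":
                    audio_format = data  # expected keys: rate, width, channels
                elif event_type == "audio-chunk":
                    audio.extend(payload)
                elif event_type == "audio-stop":
                    break
    return audio_format, bytes(audio)
```

In practice you never need this: Home Assistant's Wyoming integration does the talking once you add the server's host and port. To listen to the output of the sketch, write the PCM bytes with Python's `wave` module using the `rate`, `width` and `channels` values from `audio-start`.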

And now my Home Assistant TTS speaks with a voice I've cloned. So cool.
Please don't hurt the messenger (Sparta style :grinning:); I just wanted to share the news. Maybe someone will build more cool integrations with it.
