I spent a long time looking for a local TTS engine other than Piper, because I also need an external service with an OpenAI-compatible API for other neural-network tasks. Most local TTS engines only work in English and are incredibly slow. For example, yesterday I installed Orpheus TTS 3B with vLLM, and it required 20 GB of VRAM, while the previously installed XTTSv2 required 3-5 GB, but generation took three times as long as the resulting speech.
I came across the updated Silero models, which are a great fit for Cyrillic-script languages, though other languages are available too. They have one problem (for Cyrillic, at least): they don't pronounce numbers and symbols. But they generate speech in a fraction of a second and use about 500 MB of VRAM.
I found an API server for Silero online and tweaked it a bit: I added morphology, i.e., conversion of numbers to text for Russian and Ukrainian, as well as Wyoming support.
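To give an idea of what the number-to-text step looks like, here is a minimal sketch, not the actual code from my tweak. It only covers Russian, only integers from 0 to 999, and only the masculine nominative form, so it ignores the gender and case agreement that real morphology needs. The names `number_to_words_ru` and `normalize_numbers` are made up for this example.

```python
import re

# Russian number words, masculine nominative only (an assumption of this sketch).
ONES = ["ноль", "один", "два", "три", "четыре",
        "пять", "шесть", "семь", "восемь", "девять"]
TEENS = ["десять", "одиннадцать", "двенадцать", "тринадцать", "четырнадцать",
         "пятнадцать", "шестнадцать", "семнадцать", "восемнадцать", "девятнадцать"]
TENS = ["", "", "двадцать", "тридцать", "сорок",
        "пятьдесят", "шестьдесят", "семьдесят", "восемьдесят", "девяносто"]
HUNDREDS = ["", "сто", "двести", "триста", "четыреста",
            "пятьсот", "шестьсот", "семьсот", "восемьсот", "девятьсот"]


def number_to_words_ru(n: int) -> str:
    """Spell out an integer in the range 0..999 in Russian."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n == 0:
        return ONES[0]
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(HUNDREDS[hundreds])
    if 10 <= rest <= 19:
        # 10..19 have their own single-word forms.
        parts.append(TEENS[rest - 10])
    else:
        tens, ones = divmod(rest, 10)
        if tens:
            parts.append(TENS[tens])
        if ones:
            parts.append(ONES[ones])
    return " ".join(parts)


def normalize_numbers(text: str) -> str:
    """Replace each run of digits with words before the text goes to the TTS model."""
    def replace(match: re.Match) -> str:
        value = int(match.group())
        # Leave numbers outside the supported range untouched.
        return number_to_words_ru(value) if value <= 999 else match.group()
    return re.sub(r"\d+", replace, text)
```

The idea is simply to run `normalize_numbers` on the input text before synthesis, so the model never sees raw digits.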
This setup doesn't need HACS, only a Wyoming integration entry.
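For the OpenAI-compatible side, a request could look roughly like the sketch below. It assumes the server exposes the standard OpenAI speech route (`/v1/audio/speech`) and accepts the usual `model`, `input`, `voice`, and `response_format` fields. The base URL, model name, and voice name are placeholders, so swap in whatever your server actually uses.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, voice: str, model: str) -> urllib.request.Request:
    """Build a POST request in the shape of OpenAI's /v1/audio/speech endpoint."""
    payload = {
        "model": model,            # placeholder: depends on the server
        "input": text,
        "voice": voice,            # placeholder: depends on the loaded model
        "response_format": "wav",  # assumes the server can return WAV
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(base_url: str, text: str, voice: str, model: str, path: str) -> None:
    """Send the request and save the returned audio bytes to a file."""
    request = build_speech_request(base_url, text, voice, model)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Something like `synthesize_to_file("http://localhost:8000", "Привет, мир", "xenia", "silero", "speech.wav")` would then write the audio to disk. The host, port, voice, and model in that call are guesses for illustration only.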