Wyoming Pocket TTS - Fast Local TTS with Voice Cloning

Hey everyone! I built a Wyoming protocol server for Pocket TTS (by Kyutai) that I wanted to share.

Features:

  • ~10x realtime on CPU, no GPU needed (actual speed depends on your machine)

  • Voice cloning from 15-30 second audio samples

  • 8 preset voices included

  • Runs fully local
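
For anyone unsure what "10x realtime" means in practice: it is the ratio of audio duration produced to the wall-clock time spent synthesizing it. Here is a minimal sketch of that arithmetic. The function name and the numbers are illustrative assumptions, not measurements or code from this project.

```python
def realtime_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock synthesis time."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


# Illustrative numbers only (assumed, not benchmark results):
# 20 seconds of speech generated in 2 seconds of compute.
example = realtime_factor(audio_seconds=20.0, synthesis_seconds=2.0)
print(f"{example:.1f}x realtime")
```

To measure your own machine, time one synthesis request and divide the length of the returned audio by the elapsed time.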

Requirements:

  • Voice cloning requires a free HuggingFace account and accepting the model terms

  • Preset voices work without any HF setup

GitHub

Question for the community:

I built this on the Wyoming protocol since I already had it set up for my voice pipeline. However, I noticed that several Wyoming repos are now archived. For those running fully local voice assistants, what’s the recommended integration method going forward? Is Wyoming still the way to go, or is there a newer or better approach for local TTS/STT integration with Home Assistant? (I am using the Home Assistant Voice Preview Edition device.)