Make your Home Assistant sound like YOU – Local Voice Cloning & Streaming for Mac 🍎

Hi everyone,

I’ve been working on a Wyoming bridge for VoxCPM that runs specifically on Apple Silicon (M1/M2/M3/M4). I wanted a way to use the Apple Neural Engine (ANE) for higher-quality TTS without putting a heavy load on the CPU or needing a dedicated GPU.

Project Details:

  • Native Streaming: It supports Wyoming’s SynthesizeChunk events, so audio starts playing while it is still being generated instead of after the full sentence has been synthesized.
  • Zero-Shot Cloning: You can use a short reference .wav file to clone a voice. The bridge detects these files and exposes them as selectable voices in Home Assistant.
  • Hardware: Tested on M-series Macs. A Python bridge communicates with an ANE-optimized server.
  • Protocol: Uses the standard Wyoming protocol, so it integrates directly with Assist.
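
To illustrate the voice-detection idea from the cloning bullet, here is a minimal sketch of how reference clips in a folder could be turned into a list of selectable voices. The function name `discover_voices`, the folder layout, and the "file name becomes voice name" rule are assumptions for illustration, not the bridge's actual code.

```python
from pathlib import Path


def discover_voices(voice_dir):
    """Return {voice_name: clip_path} for every .wav file in voice_dir.

    Hypothetical helper: the voice name is simply the file name
    without its extension (e.g. "alice.wav" -> "alice").
    """
    directory = Path(voice_dir)
    if not directory.is_dir():
        # A missing folder just means no cloned voices are available.
        return {}
    return {
        path.stem: path
        for path in sorted(directory.iterdir())
        if path.is_file() and path.suffix.lower() == ".wav"
    }
```

A bridge built along these lines could call `discover_voices("voices")` at startup and advertise the resulting names to Home Assistant as voices.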

If you have an M-series Mac acting as a home server, I’d be interested to hear what generation speeds you’re seeing, ideally as a real-time factor (RTF).
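
For anyone who wants to report comparable numbers, here is a small sketch of how RTF is commonly computed: generation time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The helper names are hypothetical, and the sample rate, sample width, and channel count have to match whatever your setup actually outputs.

```python
def pcm_duration_seconds(num_bytes, sample_rate, sample_width, channels):
    """Duration of raw PCM audio, given its size and format."""
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes / bytes_per_second


def real_time_factor(generation_seconds, audio_seconds):
    """RTF = time spent generating / length of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds
```

For example, if 2 seconds of audio took 0.5 seconds to generate, `real_time_factor(0.5, 2.0)` gives 0.25.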

GitHub: https://github.com/vpsh-code/ANE_VOXCPM_Homeassistant


Suggested README Section for Voice Cloning

🎙️ Voice Cloning

The server supports zero-shot cloning using local reference clips.