Wyoming TTS App/Add-On - Streaming (stream2sentence), Local voice cloning (Pocket TTS), In-stream voice/emotion switching + More

Hey everyone,

I’ve been working on a custom text-to-speech add-on for Home Assistant. My main goal wasn’t just to wrap another local AI model, but to build out the specific features that actually make TTS feel natural and highly customizable in our daily automations.

Dynamic Voice/Emotion Tagging: You can put inline tags (similar in spirit to SSML) directly in your automation text to change the voice or emotion on the fly. If you send “The front door is open. [angry] Close it immediately!”, it switches to the angry profile for the second sentence.
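To give a rough idea of what this kind of tag handling involves, here is a minimal sketch. It is not the add-on's actual code: the function name `split_tagged_text`, the `default` profile name, and the exact tag syntax accepted by the regex are all assumptions made for illustration.

```python
import re

# Assumed tag syntax: a short name in square brackets, e.g. [angry].
TAG_RE = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def split_tagged_text(text, default_profile="default"):
    """Split text into (profile, text) segments at each [tag]."""
    segments = []
    profile = default_profile
    pos = 0
    for match in TAG_RE.finditer(text):
        # Text before this tag still belongs to the previous profile.
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((profile, chunk))
        profile = match.group(1).lower()
        pos = match.end()
    # Whatever follows the last tag uses the last profile seen.
    tail = text[pos:].strip()
    if tail:
        segments.append((profile, tail))
    return segments
```

Each segment could then be handed to the synthesizer with its own voice or emotion profile.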

Custom Phonetic Dictionary: Tired of your smart home mispronouncing specific names or acronyms? You can define exact phonetic overrides in a simple JSON file (e.g., automatically expanding “HAOS” to “Home Assistant O S”).
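As an illustration only, a dictionary like this could be applied with a whole-word replacement pass before synthesis. The JSON layout (a flat object mapping a word to its spoken form) and the function name `apply_overrides` are assumptions here, not the add-on's documented format.

```python
import json
import re


def apply_overrides(text, overrides):
    """Replace whole-word matches of each key with its spoken form."""
    if not overrides:
        return text
    # Longest keys first so a longer entry wins over a shorter prefix.
    keys = sorted(overrides, key=len, reverse=True)
    # \b keeps "HAOS" from matching inside "CHAOS". This assumes keys
    # start and end with a letter or digit; matching is case-sensitive.
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: overrides[m.group(1)], text)


# Hypothetical dictionary content, inlined instead of read from a file.
overrides = json.loads('{"HAOS": "Home Assistant O S"}')
spoken = apply_overrides("HAOS is updating.", overrides)
```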

Drop-in Voice Cloning: Just drop a .wav file of any voice into the HA share folder. The add-on uses a background watchdog to automatically normalize the audio volume, extract the voice state, and make the new voice instantly available to your automations without a restart.
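For anyone curious what the folder-watching and normalization steps might look like, here is a simplified sketch using only the Python standard library. It is an assumption-heavy stand-in: it polls the folder instead of using a real file watcher, uses plain peak normalization, only handles 16-bit PCM, and skips the model-specific voice state extraction entirely. All function names are made up for the example.

```python
import array
import sys
import wave
from pathlib import Path


def find_new_wavs(folder, seen):
    """Return .wav files in folder not seen before, and remember them."""
    new = sorted(p for p in Path(folder).glob("*.wav") if p.name not in seen)
    seen.update(p.name for p in new)
    return new


def peak_normalize(samples, target_peak=0.9):
    """Scale 16-bit samples so the loudest one sits at target_peak."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array.array("h", samples)  # silence: nothing to scale
    gain = target_peak * 32767 / peak
    return array.array(
        "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
    )


def normalize_wav(src, dst, target_peak=0.9):
    """Read a 16-bit PCM WAV, peak-normalize it, and write it to dst."""
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(reader.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    out = peak_normalize(samples, target_peak)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)
        writer.writeframes(out.tobytes())
```

A background loop could call `find_new_wavs` every few seconds and pass each new file through `normalize_wav` before registering the voice.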

True Streaming: It splits the text into sentences and streams the audio back to HA as each one is generated, so playback can start as soon as the first sentence is ready instead of waiting for the whole text to be synthesized.
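The general idea can be sketched like this. Note that this uses a naive regex splitter as a stand-in and does not use or reproduce the stream2sentence API; `iter_sentences`, `stream_audio`, and the `synthesize` callback are hypothetical names for the example.

```python
import re
from typing import Callable, Iterable, Iterator

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in the chunk stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]  # possibly incomplete, keep for the next chunk
    if buffer.strip():
        yield buffer.strip()


def stream_audio(
    chunks: Iterable[str], synthesize: Callable[[str], bytes]
) -> Iterator[bytes]:
    """Synthesize and yield audio one sentence at a time."""
    for sentence in iter_sentences(chunks):
        yield synthesize(sentence)
```

Because both functions are generators, the first audio chunk is produced after the first sentence is complete, not after the full text has arrived.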

It’s still a work in progress, but it’s at a point where I’d love to get some feedback. If anyone wants to test it out or look at the code, you can find the repository here:

Let me know what you think or if you run into any bugs!

How do I install this on a Docker-based Home Assistant installation?