πŸ”Š Two New TTS Custom Integrations: Hume AI & Inworld AI

Hey everyone! :wave:

I’ve been working on two custom TTS integrations that bring next-generation AI voices to Home Assistant, and I’m excited to share them with the community!

Hume AI TTS

GitHub

Hume AI is the only TTS provider that synthesizes emotionally expressive speech β€” voices that actually convey tone, warmth, excitement, and nuance. Perfect for voice assistants that feel genuinely human.

Highlights:

  • :performing_arts: Emotionally aware, expressive voice synthesis
  • :studio_microphone: Large library of preset voices + support for your own custom voices
  • :earth_africa: 11 supported languages (EN, ES, JA, KO, FR, PT, IT, DE, RU, HI, AR)
  • :electric_plug: Full voice assistant pipeline integration β€” pick your voice directly in the pipeline UI

Inworld AI TTS

GitHub

Inworld AI brings state-of-the-art TTS with their Llama-based TTS 1.5 model family β€” ultra-high quality voices with blazing speed.

Highlights:

  • :rocket: Choose between two models: TTS 1.5 Max (best quality) or TTS 1.5 Mini (ultra-fast & cost-efficient)
  • :studio_microphone: Wide voice library selectable directly from the voice assistant pipeline UI
  • :earth_africa: 15 supported languages (EN, ZH, JA, KO, RU, IT, ES, PT, FR, DE, PL, NL, HI, HE, AR)
  • :electric_plug: Full voice assistant pipeline integration

:pray: Feedback welcome!

These are actively maintained. If you run into issues or have feature requests, feel free to open an issue on GitHub. And if you find them useful, a :star: on the repos goes a long way!

Happy automating! :house::sparkles:

I recently released a new version of the Hume AI integration that fixes a bug with long texts and lets you configure the emotions. Check the repository README for more details.

:tada: New Release: Hume AI TTS now supports Real-Time Streaming!

The latest version of the Hume AI TTS integration brings a game-changing improvement for voice assistant satellites: real-time audio streaming.

Previously, the integration had to wait for the entire text to be synthesized before playback could begin. Now, audio starts playing immediately as the first sentences are ready β€” while the rest is still being generated in the background.

What this means in practice:

  • :zap: Drastically reduced response time on your voice satellites
  • :speaking_head: Conversations feel much more natural and fluid
  • :arrows_counterclockwise: Sentences are streamed as the LLM generates them, with no waiting for the full response
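To illustrate the idea behind the streaming change, here is a minimal Python sketch (not the integration's actual code; `synthesize_sentence` is a hypothetical stand-in for a per-sentence TTS request): audio chunks are yielded as each sentence finishes, so playback can begin on the first chunk instead of after the whole text.

```python
import asyncio


async def synthesize_sentence(sentence: str) -> bytes:
    # Hypothetical per-sentence synthesis call; a real implementation
    # would hit the TTS API here. The sleep simulates generation time.
    await asyncio.sleep(0.01)
    return sentence.encode()


async def stream_tts(sentences):
    # Yield each audio chunk as soon as its sentence is synthesized,
    # rather than buffering the entire response first.
    for sentence in sentences:
        yield await synthesize_sentence(sentence)


async def main():
    chunks = []
    async for chunk in stream_tts(["Hello.", "How can I help?"]):
        # In a real pipeline, playback would start on this first chunk
        # while later sentences are still being generated.
        chunks.append(chunk)
    return chunks


print(asyncio.run(main()))
```

The key point is the async generator: the consumer receives the first chunk after one sentence's worth of latency, not after the whole text is done.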

If you’re using Hume AI TTS with a voice assistant pipeline (e.g. with Wyoming or any HA-compatible satellite), just update the integration and enjoy the difference immediately β€” no configuration changes needed.

Give it a try and let me know what you think! :rocket: