One week of Home Assistant voice: a pretty good start

TheLastProject · January 10, 2025, 6:01pm

I’ve been using Home Assistant Voice Preview Edition for about a week now and I feel like documenting my journey for others who are wondering whether it’s worth picking up. tl;dr: I think it’s kinda cool, but definitely nerd-only right now.

Let’s start at the start, which is also the most painful part.

The onboarding

The onboarding was… honestly somewhat painful, but I chalk this up to it being the “Preview Edition”. After manually flashing a firmware update, it connected to my Home Assistant instance immediately.

The second step was kinda worse. Home Assistant gave me two options: subscribe to Home Assistant Cloud (no! I want a local voice assistant) or run it locally. So, obviously, I picked locally, only to be greeted by an alert that installing the add-ons wasn’t supported on my system (Home Assistant Container).

While that was somewhat sad, I do totally understand it. Not being able to install add-ons is a trade-off that is documented fairly well on the installation page, so I was wholly unsurprised this would happen. What I was surprised about, however, was how little documentation there was about setting stuff up manually.

After scouring the web, I figured out that the voice recognition and text-to-speech were most likely provided by rhasspy/wyoming-whisper and rhasspy/wyoming-piper. This should definitely be documented more clearly!

I first set up wyoming-whisper using the Wyoming Integration and decided to opt for PicoTTS for text-to-speech, as I am not really bothered by “robotic” voices. Sadly, due to HA bug 93456, PicoTTS is completely unusable with Home Assistant Voice, as it doesn’t actually speak anything but gibberish. So I ended up setting up wyoming-piper too.

The actual usage

I mean, what can I say, it kinda works!

I used Google Assistant a bit on my phone 5+ years ago and had it not understand me most of the time. I used Mycroft for quite a while, and its recognition is also not too great. The voice recognition with wyoming-whisper seems to be a bit better than those two for me, but still not all too great. Most of the time it responds to “Okay Nabu”, but far from always (I’d say only around 80% of the time for me). Must be the weird Dutch accent.

However, Home Assistant has one big benefit over all the other voice assistants: the ability to write your own automations. This allows you to make the triggering a bit more “fuzzy”. Instead of “Play music”, I also added “Clay music”, “Blade music”, “Playing music”, making the recognition work pretty well for me. However, this kind of “fuzzy matching” is quite limited and there is one huge downside: I can’t limit the words it recognizes in any way. No, Home Assistant, I did not say “Play Ephinescence” (is that even a word? I can’t find it in any dictionary…)

However, for this I also could write a decent-ish workaround: if I spell the artist name, it is registered as “E-V-A-N-E-S-C-E-N-C-E”. So, just use a bit of YAML in your music integration (I use Music Assistant, which isn’t perfect but works well enough for my use cases) and you’re good to go. Here’s what I use to remove the dashes from the input, allowing me to search for some fairly obscure things: "{{ trigger.slots.search_query | regex_replace('-', '') }}".
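For anyone who wants to check what that template does outside of Home Assistant, here is a rough Python equivalent of the dash removal. This is only a sketch: the function name strip_spelling_dashes is made up for illustration, and it assumes the spelled-out query arrives as letters joined by plain "-" characters, as in the example above.

```python
import re


def strip_spelling_dashes(search_query: str) -> str:
    """Remove every '-' so a spelled-out name collapses into one word.

    Hypothetical helper mirroring the template's regex_replace('-', '').
    """
    return re.sub("-", "", search_query)


print(strip_spelling_dashes("E-V-A-N-E-S-C-E-N-C-E"))  # prints EVANESCENCE
```

Note that this removes every dash, so a query for a name that really contains one would lose it as well.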

However, I do really wish I could “dynamically pre-load” my list of artists into it, so it would try to match search queries to the artists in my collection, hopefully making it understand I meant “Evanescence” instead of “Ephinescence”.
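To make that wish a bit more concrete, here is a minimal sketch of the idea in plain Python, using the standard library's difflib. It is not something Home Assistant or Music Assistant does for you as far as this post knows; the function name closest_artist, the example library list and the 0.6 cutoff are all assumptions chosen for illustration.

```python
from difflib import get_close_matches
from typing import Optional


def closest_artist(heard: str, artists: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the artist whose name is most similar to what was heard.

    Hypothetical helper: compares case-insensitively and returns None
    when nothing in the list is at least `cutoff` similar.
    """
    # Map lowercase names back to their original spelling.
    by_lowercase = {name.lower(): name for name in artists}
    matches = get_close_matches(heard.lower(), list(by_lowercase), n=1, cutoff=cutoff)
    return by_lowercase[matches[0]] if matches else None


# Assumed example collection, not taken from a real library.
library = ["Evanescence", "Within Temptation", "Nightwish"]
print(closest_artist("Ephinescence", library))
```

The cutoff is the knob to play with: set it too low and every mumble matches some artist, set it too high and misheard names fall through again.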

Summary

The Home Assistant Voice Preview Edition is far from perfect. I definitely would not recommend this to non-technical users at this point. However, the openness and flexibility of Home Assistant allow you to somewhat work around the voice recognition issues. Still, I wish it was a bit more fuzzy in general so I would have to write fewer automations. I don’t have a “lifting room”, so when it thinks I said “Turn on lifting room light”, it should be able to understand I actually said “Turn on living room light”.

If you’re a nerd willing to mess around with automations to get the best experience out of it and have decent enough hardware to run wyoming-whisper and wyoming-piper, go for it! If not, well, just keep waiting. Home Assistant improves every month, and I’m quite certain Home Assistant Voice will keep improving at an amazing rate too.

jackjourneyman · January 10, 2025, 6:28pm

Thanks for the write up.

I ran into the PicoTTS bug too. For anyone who is interested, sentences shorter than 16 characters or longer than 32 characters work OK; anything in between sounds as if it’s reading out code.

A work-around is to pad out the sentence with punctuation so that it is more than 32 characters long. I do all my TTS by calling a script which adds

"............................"

to the end.
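The padding idea can be sketched in Python like this. It is only an illustration of the workaround, not the actual script: the function name pad_for_picotts is made up, and treating exactly 16 and exactly 32 characters as part of the broken range is an assumption based on the observation above.

```python
def pad_for_picotts(sentence: str, broken_min: int = 16, broken_max: int = 32,
                    pad_char: str = ".") -> str:
    """Pad a sentence with dots so it ends up longer than `broken_max`.

    Hypothetical helper. The 16..32 range comes from the observation in
    this thread; whether the boundaries themselves are affected is assumed.
    Sentences outside that range are returned unchanged.
    """
    length = len(sentence)
    if broken_min <= length <= broken_max:
        # Add just enough padding to reach broken_max + 1 characters.
        return sentence + pad_char * (broken_max + 1 - length)
    return sentence


print(pad_for_picotts("Turning on the light"))
```

Unlike always appending a fixed string, this only pads sentences that fall inside the problem range, so very short replies stay short.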

The problem has been around for a long time and it’s a shame that it hasn’t been fixed - with all the attention being given to voice at the moment, PicoTTS is a perfect, easily installed way to experiment.