Going all-in with Voice Assistant (help needed)

tl;dr: Scroll down to the “Needs” section, where I list the phrases I need and how most of them don’t work.

Background

By the end of this week, when all my speakers have finally come in the mail, I’ll be fully transitioning over to only using Home Assistant Voice Preview Edition after 10 years of Google Home and Alexa.

I currently use a combination of:

  1. Amazon Echo Dot for basic smart home tasks.
  2. Google Home Mini for music and broadcasts.
  3. Google Chromecast Audio for music to speakers over a 3.5mm cable.

Reasons I use both

  1. Alexa announcements broke this year because of added noise cancellation. Announcements end up getting chopped up to the point where you can’t understand them, or they’re mostly silence. Google Home Mini is also louder than my 2nd Gen Echo Dots.
  2. My 1st gen Google Home Mini devices are super slow for smart home tasks. I mean, for the first year of ownership, they couldn’t even give me the time. On the other hand, their ability to play music is wonderful. I wouldn’t use them if not for that feature.
  3. I like that Alexa lets me find my phone and works 99.9% of the time no matter what I ask it. It even lets you whisper to it and whispers back, and you can talk to another Alexa device (intercom) by using the “drop-in” keyword. You can even call people’s phones while you’re still in bed!

Needs when using only Home Assistant Voice PE

Voice Assistant phrases I use just about every day:

  1. “What time is it?”
  2. “What day is it?”
  3. “What’s the temperature [outside]?”
  4. “Turn on Exhaust Fan for 10 minutes”.
  5. “Broadcast (or Announce) Dinner’s ready!”.
  6. “Drop in on Kitchen”: intercom feature.
  7. “Find my phone” feature (rings the phone by calling it or playing an alarm).
  8. “What did you do?” for recalling the last action.

I rarely ask things like “what sound does a badger make” or “what’s 5 times 8042?”. While fun, those are effectively irrelevant because I can always use my phone. For the others, I want immediate feedback.

Another issue is that Voice Assistant keeps doing weird stuff like this:

What it does today

:white_check_mark: 1. “What time is it?”

:no_entry: 2. “What day is it?”

This question always does some wild stuff when it goes straight to the AI instead of being handled locally.

:white_check_mark: 2.1. “What’s the date?”

This works, but any deviation gives weird responses again.
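
To illustrate why one phrasing works and a close variant doesn’t: a phrase-based matcher only recognizes the sentences it was given. Here’s a minimal Python sketch, assuming a hand-written list of phrasings. It is not how Home Assistant is implemented, and `DATE_PATTERNS`, `normalize`, and `answer_date` are made-up names for illustration only.

```python
import re
from datetime import date

# Hypothetical phrasings for illustration; all of them should get the same answer.
DATE_PATTERNS = [
    re.compile(r"^what(?:'s| is) (?:the|today's) date(?: today)?$"),
    re.compile(r"^what day is it(?: today)?$"),
    re.compile(r"^what(?:'s| is) today$"),
]


def normalize(sentence: str) -> str:
    """Lowercase, straighten curly apostrophes, and drop trailing punctuation."""
    text = sentence.lower().replace("\u2019", "'").strip()
    return text.rstrip("?.! ")


def answer_date(sentence: str, today: date):
    """Return a spoken date if the sentence matches a known phrasing, else None."""
    text = normalize(sentence)
    if any(pattern.match(text) for pattern in DATE_PATTERNS):
        return today.strftime("%A, %B %d, %Y")
    return None


today = date.today()
for phrase in ["What day is it?", "What\u2019s the date?", "What time is it?"]:
    print(phrase, "->", answer_date(phrase, today))
```

The point is that “What day is it?” only gets a sane answer once somebody adds that exact phrasing (or a pattern covering it) to the list. Anything outside the list falls through to whatever handles unmatched sentences.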

:no_entry: 3. “What’s the temperature [outside]?”



:no_entry: 3.1 Exposing weather entities

It appears I didn’t have any weather entities exposed, so I did that:

And it still didn’t work :cry::

:white_check_mark: 3.2. Ask for the weather

After asking for the weather directly, now I’m getting the outside temperature.

:no_entry: 4. “Turn on Exhaust Fan for 10 minutes”.

:white_check_mark: 4.1 “Turn off Exhaust Fan in 10 minutes.”

Since turning it off on a timer using the “Turn On” command didn’t work, I tried something simpler:

I’ll find out in 10 minutes. Timers like these worked correctly in my testing earlier this month.

UPDATE: It worked.

It sucks to have to turn it on first and then ask it to run a command in 10 minutes rather than a 2-for-1 situation.
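
For reference, the 2-for-1 behavior boils down to splitting one sentence into an immediate “on” and a delayed “off”. Below is a minimal Python sketch of just that parsing step, assuming a fixed “turn on <name> for <N> <unit>” wording. `TIMED_ON`, `Step`, and `plan_timed_on` are hypothetical names, and nothing here calls Home Assistant itself.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for illustration: "turn on <name> for <N> second(s)/minute(s)/hour(s)".
TIMED_ON = re.compile(
    r"^turn on (?P<name>.+?) for (?P<count>\d+) (?P<unit>second|minute|hour)s?\.?$",
    re.IGNORECASE,
)

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


@dataclass
class Step:
    action: str         # "turn_on" or "turn_off"
    target: str         # spoken device name, e.g. "Exhaust Fan"
    delay_seconds: int  # how long to wait before running the step


def plan_timed_on(sentence: str) -> list[Step]:
    """Split one timed request into an immediate on and a delayed off."""
    match = TIMED_ON.match(sentence.strip())
    if match is None:
        return []  # not a timed "turn on" request
    seconds = int(match["count"]) * UNIT_SECONDS[match["unit"].lower()]
    name = match["name"]
    return [Step("turn_on", name, 0), Step("turn_off", name, seconds)]


for step in plan_timed_on("Turn on Exhaust Fan for 10 minutes"):
    print(step)
```

In a real setup the two steps would presumably map to a custom sentence plus a script or automation with a delay, but that wiring is outside what this sketch shows.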

:no_entry: 5. “Broadcast (or Announce) Dinner’s ready!”.

It’s late, so I’ll have to do this another day. As far as I understand, this does not work.

My earlier test tonight, trying to announce to a single room, also didn’t work.

:no_entry: 6. “Drop in on Kitchen”: intercom feature.

It simply doesn’t work. Sadly, there’s no voice transfer or recording functionality.

Maybe Sendspin will make this possible in the future, but I’d like something working now.

:no_entry: 7. “Find my phone” feature.

It can’t tell what I’m asking.

It’s also late, so I can’t verify if I can announce to my phone. The most I know I can do is probably send a notification.

:no_entry: 8. “What did you do?” for recalling the last action.

It has no clue what I’m asking when I say this.

It’s useful to have this to check what it did if it says “yes, I did that thing you wanted” and the thing didn’t happen. Now you have to wonder what actually happened.

Conclusion

Pretty much all Voice Assistant can do for me is give me the time and “date”, not “day”.

Music Playback :+1::+1:

The other thing I needed from Voice PE is streaming music, and it does that fantastically well with Music Assistant!

While the onboard speaker is complete and utter trash, not useful unless it’s right by your ear, I’ve attached every single Voice PE to a nice speaker over 3.5mm. This also replaces all my Google Chromecast Audio devices!

I suddenly remembered some other stuff I say (I keep remembering more):

  1. “Turn off <ROOM_NAME> lights.”
  2. “Play <SONG_NAME> on <SPEAKER_GROUP_NAME>.”
  3. “Set volume in <ROOM_NAME> to 30%.”

And as I said before, whispering commands to avoid it shouting “OK, TURNING ON LIGHTS IN <ROOM_NAME>” back at me. This probably won’t ever work unless a context like “user is shouting” or “user is whispering” is integrated into Home Assistant.

Wide variation depending on STT provider

Speech-to-Phrase STT is good but very specific. I find that it does not take kindly to low input volume (speak loudly or be close to the device). It has a set dictionary, and I think my TV in the background (loud) interferes and prevents it from matching.

Faster Whisper STT is good, but if it’s mistakenly woken by the TV it will ramble on with its “It didn’t understand … 30 seconds later…command”.

Nabu Cloud was very good for some reason. No complaints, but I haven’t used it in a while.

I currently use both Speech-to-Phrase and Faster Whisper. The “Alexa” and “Hey Jarvis” wake words each enable one or the other. I’m doing this so I can test each.

The drop-in feature currently does not exist.
You can send a TTS message directly to a device. I use this for now, but drop-in is much needed. Maybe by the end of 2026.

I can appreciate you testing and giving a rundown of what didn’t work, but… you also did not tell us what method (Speech-to-Phrase or LLM) you’re using for voice. It’s NOT plug and play; quite far from it.

The folks here can help you get better accuracy, but we’ll need more info, and honestly, lower your expectations a tad. It is still not a finished product at all (Preview Edition). I say this because I note you bounce between products when one doesn’t do something you want. The VPE will have a lot you don’t like about it for a while, I’m sure.

If you’re not using an LLM, voice needs extensive tools. Read: you need to ensure tools exist for your ask, and if they don’t, write them yourself or pull blueprints. There’s a great one for music control by Music Assistant, for instance.

If you’re not using an LLM, Speech-to-Phrase would trip over time of day for sure. No tool.

If you are using an LLM, tools and extensive grounding are required (see the grandma’s cardboard box problem in Friday’s Party; search and it’ll be like hit #1). Simply exposing entities is NOT enough; you also have to prompt effectively to tell the LLM what to do. Without that, well, you see the result: it’ll feel like a drunk college student. But you already found that out.
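
As a rough idea of what “grounding” means in practice, here’s a minimal Python sketch that assembles a prompt from explicit rules plus a description of each exposed entity. Everything in it is an assumption for illustration: `build_prompt` is a made-up helper, and the entity IDs and rules are examples, not anything from your setup.

```python
def build_prompt(rules: list[str], entities: dict[str, str]) -> str:
    """Combine behavior rules and entity descriptions into one prompt string."""
    lines = ["You are a voice assistant for a smart home.", "", "Rules:"]
    lines += [f"- {rule}" for rule in rules]
    lines += ["", "Exposed entities:"]
    lines += [f"- {entity_id}: {purpose}" for entity_id, purpose in entities.items()]
    return "\n".join(lines)


# Hypothetical rules and entity IDs, for illustration only.
prompt = build_prompt(
    rules=[
        "For the outside temperature, read it from the weather entity.",
        "If no tool fits the request, say so instead of guessing.",
    ],
    entities={
        "weather.home": "current outdoor conditions and temperature",
        "fan.exhaust_fan": "bathroom exhaust fan, supports on and off",
    },
)
print(prompt)
```

The exact wording matters less than the fact that the model is told what each entity is for and what to do when nothing fits.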

So, are you using an LLM or Speech-to-Phrase?