Well… The whole voice assistant business is not so much a rabbit hole as a vast, growing warren of intersecting burrows.
My understanding is largely non-technical, but for what it’s worth (I’m sure someone will correct me)…
There are a number of stages in any voice assistant setup:
Waking up the voice assistant device with the microphone
Purpose-built voice assistants are mostly designed to be always listening, waiting for a “wake word”, but this is usually handled locally, in the device itself. If you use a phone or tablet with the HA app you can activate the microphone manually by tapping a button. Either way, this is not something that would normally be handled centrally (on a Synology server, for example).
Normally you would have a voice assistant device in or within earshot of each room, but if you use the HA app on your phone you can use it anywhere. You can do both, of course.
Someone, somewhere has probably implemented a clapper (someone, somewhere has done nearly everything), but I’ve never come across anything like it.
Issuing a command through the device
Commands are handled centrally on your Home Assistant server (which could be a Synology).
Broadly speaking, there are two approaches. You can create custom sentences and responses (intents), which involves a bit of code writing, or you can use commands that are built into HA.
Any device integrated with HA can be “exposed” to Assist (with a simple tickbox) and turned on and off with the built-in commands, and there is a growing list of more complex commands like “Set the heating to XX”, “Put XX on the shopping list”, which don’t need any setting up.
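To give an idea of the custom-sentence route mentioned above: the sentences live in a small YAML file under your config folder, and intent_script tells HA what to do and say when one of them matches. A minimal sketch, with made-up names (TurnOnDeskLamp, light.desk_lamp and the file name are only examples):

```yaml
# config/custom_sentences/en/desk_lamp.yaml
language: "en"
intents:
  TurnOnDeskLamp:
    data:
      - sentences:
          - "switch on the desk lamp"
          - "put the desk lamp on"
```

```yaml
# configuration.yaml
intent_script:
  TurnOnDeskLamp:
    action:
      - service: light.turn_on
        target:
          entity_id: light.desk_lamp  # whatever your entity is actually called
    speech:
      text: "Desk lamp is on."
```

The built-in commands need none of this - exposing the entity is enough.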
Speech to text
Whichever route you take (you can mix both), your spoken sentence has to be converted into text and then interpreted as commands that HA can obey. This can be done locally, but it is a very deep rabbit hole indeed. Most voice applications will send your command to the cloud one way or another. Voice Assistant PE is designed to do this.
Text to speech
Any responses to your commands need to be converted back to speech (even if it’s only “Sorry, I didn’t understand that”). There are several (rather robotic) local options which run on your HA server and many very realistic voices if you don’t mind connecting to a cloud service.
Responses to custom sentences can be sent to any integrated speaker; responses to built-in commands always come back through the device you spoke to. If you’re happy with the built-in commands there shouldn’t be any coding involved. If you want to go down yet another rabbit hole, you can connect to an LLM service to get more varied and “realistic” responses.
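To illustrate the “any integrated speaker” point: the action part of a custom intent (or any automation) can push its spoken response to whichever speaker you like with the tts.speak service. A minimal sketch, assuming a local Piper TTS engine and a kitchen speaker (both entity ids are placeholders for whatever you have set up):

```yaml
# Inside an intent_script action (or an automation)
- service: tts.speak
  target:
    entity_id: tts.piper                          # the local text-to-speech engine
  data:
    media_player_entity_id: media_player.kitchen_speaker
    message: "The washing machine has finished."
```

Piper is one of the local (rather robotic) options; a cloud voice would plug in the same way, just through a different tts entity.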
Hope this is some help. Unfortunately the docs about using Assist and voice commands are even more obscure than usual. If you haven’t found it already, this is a good starting point:
Edit: A historical note, which may make things a bit clearer: Assist predates Voice Assistant PE (by a couple of years) and was designed to be used through the HA app, with text or voice commands. The whole tottering edifice is built from there…