What do I need to get a device working with HA's built-in voice Assist and an LLM?

I’ve set up HA and its assist feature with an LLM to process my commands and reply accordingly. For now it works only with the mobile app on my phone but my goal is to have a device like an Amazon Echo at home so I can just talk to the HA assist through it, taking advantage of the LLM personality, the voices, etc.

I did some googling but I’m not sure what I really need to achieve this. Some say it’s not possible with an Echo, some say it’s possible with some Echos. I know there’s the HA Voice Preview Edition, but it has mixed reviews tending towards negative. Are there other options? Basically, all I need is a WiFi speaker and mic connected to HA, no?

Time to read the docs I guess, especially the bit about voice assistants.

But yes, you can do anything you want: use Google, Alexa, the VPE, or even make your own using an ESP32.

You will need to ask more questions once you have read the docs.

No. You cannot use the Assist pipeline with an Amazon Echo.
You can send announcements to an Echo.
You can set up an Echo to control HA.
It will not, however, use the built-in voice assistant for this.

Look for an ESP-powered device with a mic and speaker.

HAVPE is the best-supported device.

Atom Echo is another choice with decent support but poor audio.

ESP S3 AI is my favorite after HAVPE because of its good price and OK audio, but its support is not good.

I have 3 HAVPEs and 6 of the ESP S3 AI.
The HAVPEs are much better.
I have Atom Echos but stopped using them.

By support I mean that when there is an update or a new feature, Nabu (the HA creators) updates the firmware to support it. Updates are controlled through the HA UI, or you can manage them yourself.

All other devices are self-managed: you depend on the community to add new features, and they are not as easily supported.

I am considering converting my Atom Echo to use an external powered speaker unit, so that the Echo's mic keeps working while audio goes out to an amp/speaker for better sound.

Just to be sure I get it right:

The ESP S3 AI basically is a speaker and a mic. It forwards my commands to HA running on my NUC and outputs the audio coming from HA. The whole processing is done on the NUC/the Cloud?

Edit: I see there are different ESP32-S2 devices with or without screens. What are the keywords I need to look for? WiFi, BLE, audio output, audio capture? Anything else?

Would something like this work?

Yes. It is similar to the HA Voice PE: an ESP32 device with a built-in speaker, amp, and mic.

If you set up a local HA voice assistant pipeline (STT / TTS / wake word), it will use that and only that if you flash it with ESPHome firmware. With this it will be completely local.
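To make the division of labour concrete, here is a rough Python sketch of the round trip, not actual HA or ESPHome code. The device only captures and plays audio; the HA side runs STT, the conversation agent (your LLM), and TTS. Every name in it (`Pipeline`, `fake_stt`, `fake_agent`, `fake_tts`, `satellite_round_trip`) is made up for illustration, and the stages are stand-ins that just pass text around as bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical stand-in for the server-side voice pipeline."""
    stt: Callable[[bytes], str]    # speech-to-text stage
    agent: Callable[[str], str]    # conversation agent (e.g. an LLM)
    tts: Callable[[str], bytes]    # text-to-speech stage

    def run(self, audio: bytes) -> bytes:
        text = self.stt(audio)      # 1. transcribe the captured audio
        reply = self.agent(text)    # 2. decide what to do / what to say
        return self.tts(reply)      # 3. synthesize the spoken reply


# Stand-in stages: real ones would call an STT engine, an LLM, and a TTS engine.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"OK, handling: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def satellite_round_trip(pipeline: Pipeline, mic_audio: bytes) -> bytes:
    """What the speaker/mic device does: send audio up, play back what returns."""
    return pipeline.run(mic_audio)


if __name__ == "__main__":
    pipeline = Pipeline(stt=fake_stt, agent=fake_agent, tts=fake_tts)
    print(satellite_round_trip(pipeline, b"turn on the light"))
```

Whether each stage runs on your own machine or in the cloud depends on what you pick for it in the pipeline; the device itself does not change.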

I have ESPHome YAML for this device that supports the latest ESPHome version.

Yes but

You can look in the forum. There are already devices with screens that people have gotten working with ESPHome. Getting a device working with ESPHome can be straightforward, but when it comes to I2S and displays it gets a little complicated. Unless you are a programmer or want to learn, I suggest getting a device that is confirmed, tested, and already has YAML written for it.

Also, the device you listed looks tiny, and its speaker and mic will likely be poor due to their size. For voice, the speaker matters. The mic matters as well, but mics tend to be less of an issue. I would also consider form factor: the device you listed will not sit well on a table or similar spot.

The EchoEar looks OK.

The esp-s3-box is another. I have one, but the audio is less than poor.

My current belief is that voice/mic and display should be separate. This is based on current device availability. I expect this will change by June 2026, when more and better devices will be available. For now I suggest you do either/or, but not both.

There are some ESP displays with a speaker (no mic), but I expect the audio is poor. Again, this is why, if you are looking for an Alexa-like device, you should get exactly that (ESP, speaker, mic, no display) for now.

Thanks for the resources. The EchoEar looks cute.

I plan to have some kind of dedicated touch display device in the future but you’re right, 3.5" is pretty small. I guess I’ll stick with the S3 AI for now and stick to my initial plan of getting a cheapish mini tablet later. Thanks!

I’ve got another question you might be able to help me with. When I tried voice assist on my phone I noticed it’s very sensitive to ambient noise. With the TV running at normal volume and distance, it always picked up speech from the TV. I could only get it to work when I muted the TV completely. Does this work better with a different device? Are there some settings I could tweak?

Nope. It's the same with all devices.

Background + your speech = input.

The speech-to-text provider you choose may distinguish background from speech better than another.

Whisper works like AI: it takes voice >> converts it to text >> then something AI-like determines what you may want.

Speech-to-Phrase takes voice >> converts it to text >> then matches it against a dictionary of predefined commands.

They both have pluses and minuses. Ultimately neither is perfect, but both are improving.
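As a toy illustration of the dictionary-matching idea, and of why stray TV speech in the transcript hurts, here is a small Python sketch. It is not how Speech-to-Phrase is actually implemented; the command table, the entity IDs, and the helper names `normalize` and `match_command` are all invented for the example.

```python
import re
from typing import Optional, Tuple

# Hypothetical table of predefined commands -> (service, entity) pairs.
COMMANDS = {
    "turn on the kitchen light": ("light.turn_on", "light.kitchen"),
    "turn off the kitchen light": ("light.turn_off", "light.kitchen"),
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


def match_command(transcript: str) -> Optional[Tuple[str, str]]:
    """Return the action for an exact phrase match, or None if nothing matches."""
    return COMMANDS.get(normalize(transcript))


if __name__ == "__main__":
    # Clean speech: the transcript matches a known phrase.
    print(match_command("Turn on the kitchen light!"))
    # TV speech mixed into the same input: no phrase matches any more.
    print(match_command("and now the weather turn on the kitchen light"))
```

With exact matching like this, extra words from the background simply make the command fail, while an open-ended transcriber passes the whole mixed sentence on and leaves it to the next stage to guess the intent.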

It took Amazon over 10 years to get where it is, and it now uses a lot of AI and knowledge from millions of voices saying the same commands and training on that. HA is in year 2 with less funding and data. Gotta give it patience.
