More useful Voice Assist

For years, I’ve been experimenting to find the most practical way to automate my home. It started with a Linux machine and some custom scripts, then moved onto Domoticz, and for the past few years, I’ve been using Home Assistant.

In my view, there are only two truly practical ways to automate a home:

  • “Zero touch” automation
  • Voice commands

I hardly ever use the web-interface or app – it’s my measure of last resort. If I find that I need to open an app to control anything in my house, I’ve failed! It takes longer to open the app, navigate to the item I need to control, and finally use it than it does to just switch it on/off manually.

I was absolutely chuffed when (voice) Assist was introduced in Home Assistant. In my humble opinion, Voice Assist is the future of Home Assistant and home automation in general. All the more reason for Home Assistant to hit the ground running with this one!

The integration of (Open)AI is also pretty impressive. It gets many things so right…but admittedly has its off days now and then. :wink: I can see where this is headed though and it’s incredibly exciting!

What’s really lacking at the moment is proper hardware for giving voice commands to Home Assistant. The ESP32 and Raspberry Pi Zero 2 W solutions I’ve tested so far just don’t quite make the grade yet. There’s too much latency, and the sound quality is poor at best, especially when you compare it to “privacy-invading” voice assistants like Google Nest Audio, Apple HomePod, Sonos, etc.

Now onto my feature request:

Why not use the (streaming) media players (speakers) that we already have for text-to-speech, and use an ESP32 with an I2S microphone or a Raspberry Pi Zero only for speech-to-text? I’d like to strategically place a solid microphone array in my room and use the existing streaming speaker(s) in that room to play back the messages.

So, my feature request can be summed up as follows:

I’d love the option to separate microphone devices from speaker devices in Voice Assist, so I can issue a command to one (microphone) device and specify which media player device the notification/message should be played back on.
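A rough sketch of the idea, not how Assist works today: it assumes a lookup table from microphone device to media player, and sends the spoken reply through Home Assistant's existing tts.speak service over the REST API. All entity IDs, the MIC_TO_SPEAKER table and the helper names are made up for illustration; the TTS entity in particular depends on which TTS integration is installed.

```python
import json
import urllib.request

# Hypothetical mapping: which media player should answer for each microphone device.
MIC_TO_SPEAKER = {
    "assist_satellite.living_room_mic": "media_player.living_room_sonos",
    "assist_satellite.kitchen_mic": "media_player.kitchen_speaker",
}


def build_tts_call(mic_entity_id, message, tts_entity_id="tts.home_assistant_cloud"):
    """Return (path, payload) for a tts.speak call aimed at the mapped speaker."""
    speaker = MIC_TO_SPEAKER.get(mic_entity_id)
    if speaker is None:
        raise KeyError(f"no speaker configured for {mic_entity_id}")
    payload = {
        "entity_id": tts_entity_id,          # assumed TTS entity, adjust to your setup
        "media_player_entity_id": speaker,   # where the reply is played back
        "message": message,
    }
    return "/api/services/tts/speak", payload


def send(base_url, token, path, payload):
    """POST the service call to Home Assistant using a long-lived access token."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

With this, a command heard on the kitchen microphone would be answered on the kitchen speaker by calling build_tts_call("assist_satellite.kitchen_mic", "The lights are off") and passing the result to send() together with your own base URL and token.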

Does this make sense?



I believe the speaker: platform in the ESPHome YAML can be a media_player; it doesn’t have to be something on the device.

OK, but would that not introduce yet another delay?

HA → ESP32 → Sonos media player

Dunno, why don’t you try?

I can’t find anything in the docs that confirms this would be possible. It also seems an illogical way to do this.

ESPHome Voice Assistant speech output to Home Assistant Media Player - #12 by philpo.

and the 3rd item on the configuration list here: Voice Assistant — ESPHome.

Great find. Sadly, there is a significant delay compared to output on the local speaker.

I don’t believe the voice reply slows down the execution of the command, does it?

I’m not sure; I haven’t tested that. I only tested asking for the local weather, but I would think not. It’s just that the TTS takes about 1.5 seconds longer than on the actual speaker on the ESP.

It does not seem ideal to me, and it does not solve the same issue I have with the Raspberry Pi Zero 2 W satellites that do local wake word detection.