Poll: What's your biggest struggle with voice control right now?

Hi All,

First of all, it is really great to have voice control in HA! But there are some things I do not like:
The first is the absence of audible feedback on wake word detection. If my Atom Echo is out of sight, or it is a sunny day and the LED is hard to see, it is hard to tell whether the wake word has been detected. This would be a really important feature! Sending a response triggered by wake word detection to a continuously running media player is not an option: not everyone has one, and the confirmation would be picked up by the satellite (the Atom Echo in my case) as a command.
The second is the occasional instability of the voice system in general. Although I try to speak clearly (in Hungarian), the wake word is not always detected, and I get the “I did not understand that” response too frequently. I have two satellites. The first one (installed first) usually works much better than the second. The second one frequently does not work at all: after wake word detection (after several tries), it does not understand anything I say. Then it starts to understand, but flashes slowly for 10-15 seconds before the command is actually executed. So the second satellite, identical to the first, is very unstable despite using HA Cloud.
The third thing is about custom trigger sentences for automations. These usually work, but after detecting the command and starting the requested automation, the Atom Echo flashes fast for 10-15 seconds or more. Only after that does the confirmation (“Done”) come and the satellite return to the listening state.
In general, I like this voice assistant very much, but there is still a lot to improve. (In Hungarian it still does not understand anything except “turn on” and “turn off”.)

It seems that I managed to solve the second problem I mentioned:
The second Atom Echo was placed where it picked up my voice too quietly. Because of that, there was no sharp, clean boundary between speech and silence, so it could not properly detect the end of voice activity and waited until the STT cycle timed out (~15 s). My flat is a quiet place and I wanted my voice to be detected more precisely, so I added some tweaked parameters to the configuration of the Atom Echo devices using the ESPHome add-on. These parameters override the ones downloaded from the GitHub project (line #5 of the configuration below, the packages line).
I changed the noise suppression level to 1 (quiet place) and the volume multiplier to 5 (the recorded voice is stronger). With these parameters the timeout is avoided and the assistant acts and replies almost promptly.
My configuration:

substitutions:
  name: m5stack-atom-echo-0f8d14
  friendly_name: Háló asszisztens
packages:
  m5stack.atom-echo-voice-assistant: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: <my-key-replaced>


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

voice_assistant:
  noise_suppression_level: 1  # low suppression, suitable for a quiet room
  auto_gain: 31dBFS           # maximum automatic gain on the microphone input
  volume_multiplier: 5.0      # amplify the recorded audio so quiet speech is picked up
  vad_threshold: 3            # voice activity (end-of-speech) detection sensitivity



Thank you very much. Vosk works great in Spanish. The small model is really fast on an RPi4 and quite accurate, but the large model also works fine on the RPi4 (about 1.5 s per STT request) and is really accurate. It basically just consumes RAM (around 4 GB, which is fine on my 8 GB RPi) but not much CPU.
At least I can now properly test the Year of the Voice.


An issue I have is trying to get the example from the release video to work.

I had posted a comment about it here but it isn’t getting any traction, so I thought I’d spam it here too.
The compiler errors out, asking for esp-adf instead of esp-idf, even though esp-adf is not a valid framework option (and compilation also fails if I try it).

My code is straight from the example given on GitHub; no idea what else to do.

EDIT: found out I had a couple of lines of code missing. All good now.


I think the lack of follow-up ability. When you start using an LLM (GPT-3/4) and it asks for more info or a follow-up, you’re out of luck (I did hear someone say to just say the wake word and answer, but I’m not sure that works).

We really need local processing for what HA can do, with a fallback option so that if HA doesn’t know, another assistant pipeline takes over. Meaning, I want to control everything in HA locally, but also use it for conversational things.

Saying the wake word after a follow-up works.
What also works: “Is the hallway light on?” (the answer is yes), then (wake word) “Turn it off please”.

If you don’t mind custom components, there are solutions for this.

I made one for example:

As for the point of this thread: my second most used Alexa feature, setting timers and reminders by voice, isn’t available out of the box, and that’s a problem for me.

I have implemented timers manually, but it was a hassle to set up, and having it built in would be nice.
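In case it helps anyone, this is roughly the shape of what I ended up with. It is a simplified sketch rather than my exact setup: it assumes a timer helper called timer.voice_timer, the Nabu Casa TTS entity tts.home_assistant_cloud, and a media player called media_player.kitchen, so rename those to match your own system, and it only matches numeric phrases like “set a timer for 5 minutes”.

timer:
  voice_timer:
    name: Voice timer

automation:
  # Start the timer from a custom sentence; {minutes} is a wildcard slot
  - alias: "Voice: set a timer"
    trigger:
      - platform: conversation
        command:
          - "set a timer for {minutes} minutes"
    action:
      - service: timer.start
        target:
          entity_id: timer.voice_timer
        data:
          # assumes fewer than 60 minutes; adjust the template for longer timers
          duration: "00:{{ '%02d' | format(trigger.slots.minutes | int) }}:00"

  # Announce on a media player when the timer finishes
  - alias: "Voice: timer finished"
    trigger:
      - platform: event
        event_type: timer.finished
        event_data:
          entity_id: timer.voice_timer
    action:
      - service: tts.speak
        target:
          entity_id: tts.home_assistant_cloud
        data:
          media_player_entity_id: media_player.kitchen
          message: "Your timer is done."

It works, but a built-in timer with its own chime on the satellite would still be much nicer.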

My voice assistant doesn’t recognize my daughter’s and wife’s voices, only mine! It seems that female voices are a challenge for it.


Using the voice assistant on Wear OS is very hit or miss. It rarely gets anything wrong, but it sometimes picks up background noise like a TV. Also, it does very poorly at recognizing when the phrase has ended and continues to listen far longer than needed.

It’s still a very impressive feat and I’m looking forward to it being more polished in the future!

Check out the voice reminders blueprint I have just posted :wink:

Does anyone know how to add an audio acknowledgement on successful wake word detection?

I’m not always able to look at the LED to see if it’s listening.

For example, a “yes sir?” after triggering the wake word.
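One thing I plan to try (untested, so treat it as a rough sketch): ESPHome’s voice_assistant component exposes an on_wake_word_detected trigger, so in theory the device could flash its LED and call back into Home Assistant to play an acknowledgement. This assumes the stock Atom Echo config where the LED has the id led, and a hypothetical script.assist_ack on the HA side that plays whatever sound you like on a player of your choice:

voice_assistant:
  # ...keep the existing settings from the stock config...
  on_wake_word_detected:
    # visual feedback on the device itself
    - light.turn_on:
        id: led
        blue: 100%
        brightness: 80%
    # call back into Home Assistant; script.assist_ack is an assumed script
    # that plays an acknowledgement sound on a media player you choose
    - homeassistant.service:
        service: script.assist_ack

It would not play the sound on the Echo itself, so it has the “needs another player” drawback mentioned earlier in the thread, but at least the acknowledgement would be audible.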

Has anyone found a solution for sending the audio feedback to another player?
On the ESP32-S3-Box-3 the sound is very muffled and cuts off the first words.

My voice commands are often not understood; it works much better with Alexa. I hope this will get better in the future, as I want to get Alexa out of my house.

Is anyone else seeing that they can control almost all of their devices except those using the Zigbee2MQTT add-on?