Poll: What's your biggest struggle with voice control right now?

An issue I have is trying to get the example from the release video to work.

I had posted a comment about it here but it isn't getting any traction, so I thought I'd spam it here too… The compiler errors want esp-adf instead of esp-idf, despite esp-adf not being a valid option (and compilation also fails if I try it).

My code is straight from the example given on git… no idea what else to do.

EDIT: found out I had a couple of lines of code missing. All good now.


I think the lack of follow-up ability. When you start using an LLM/GPT-3/4 and it asks for more info or a follow-up, you're out of luck… (I did hear someone say to just say the wake word and answer, but I'm not sure that works).

We really need local processing for what HA can do, with a fallback option if HA doesn't know, letting another assistant pipeline take over… Meaning, I want to control everything in HA locally, but also use it for conversational things.

Saying the wake word after a follow-up works.
What also works: "Is the hallway light on?" (answer is yes), (wake word) "Turn it off please".

If you donā€™t mind custom components, there are solutions for this.

I made one for example:

As for the point of this thread: my second most used Alexa feature, setting timers and reminders by voice, isn't available out of the box, and that's a problem for me.

I have implemented timers manually, but it was a hassle to set up, and having it built in would be nice.
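For anyone wanting to do the same before it's built in: one way is a custom sentence plus an `intent_script` in `configuration.yaml`, driving a timer helper. This is a rough sketch, not a drop-in config; the intent name, sentence, and `timer.voice_timer` (a timer helper you'd create first) are all made up for the example:

```yaml
# Custom sentence for Assist; {minutes} captures the spoken number.
conversation:
  intents:
    StartVoiceTimer:
      - "set a timer for {minutes} minutes"

# Handle the intent: confirm by voice and start the timer helper.
intent_script:
  StartVoiceTimer:
    speech:
      text: "Timer set for {{ minutes }} minutes"
    action:
      - service: timer.start
        target:
          entity_id: timer.voice_timer
        data:
          duration: "00:{{ minutes }}:00"
```

You'd still need an automation on `timer.finished` to actually announce or chime when it goes off.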


My voice assistant doesn't recognize my daughter's and wife's voices, only mine! It seems that female voices are a challenge for it.


Using the voice assistant on Wear OS is very hit or miss. It rarely gets anything wrong, but sometimes picks up background noise like a TV. Also, it does very poorly at recognizing when the phrase has ended and continues to listen far longer than needed.

It's still a very impressive feat and I'm looking forward to it being more polished in the future!

Check out the voice reminders blueprint I have just posted :wink:

Anyone know how to add an audio acknowledgement on successful wake word detection?

I'm not always able to look at the LED to see if it's listening.

for example (ā€œyes sir?ā€) after triggering the wakeword.

Has anyone found a solution to send audio feedback to another player?
On esp32-s3-box3 the sound is very muffled and cuts off the first words.

My voice commands are often not understood; it works much better with the Alexa. I hope this will be better in the future, I want to get Alexa out of my house.

Is anyone else seeing that they can control almost all of their devices except those using the Zigbee2MQTT add-on?

Your audio lasts 15 seconds, perhaps because HASS doesn't detect the end of speech and cuts off at 15 seconds of listening.

Check that your audio isn't too loud; that may be why the VAD doesn't detect the end of speech.


I would like to be on the positive side of the s3-box3. I am using "bubbas" firmware, with Marissa's "fallback conversation" (HA → openai gpt3 turbo), and it really has helped. The fallback helps specifically due to openai struggling with basic tasks. The firmware also opens up the speaker compared with the stock sound level, which was very quiet.

Have never been able to get it to work - ever. I suspect I am missing some critical concept or add-on.

Where can I get hold of this… Google is not playing ball.

I decided to completely ditch the concept of voice control because it's simply unviable. I have tried different hardware (computer microphone, headset, ESP32 I2S), different ASR backends (Whisper, Vosk, Rhasspy) and different languages (English, Chinese), and below is the constant result. I have even tried an official Amazon Echo and it has trouble distinguishing "On" and "Off". The conclusion for me is that voice control is worth nothing, and it's just MUCH MUCH quicker to take out the phone, open the companion app and tap a button.

(screenshots: 2024-08-30 191026, 2024-08-30 190652)

@marisa - I'm using BigBobba's modified ESPHome code:

It massively improves on the basic s3 box3 code and supports a timer and exposes a media player.

Like @dza though, I find that the box3 often locks up if it doesn't quite understand what you are saying.

On-device wake word detection is pretty good, just not quite there to be usable by the rest of the family. You have to learn how to speak to it for it to be more reliable.

I found that pushing the beam size to 5 made a huge difference, but I'm just guessing at the settings.
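For anyone hunting for the same knob: if you're on the official Whisper add-on, beam size is (as far as I know) part of the add-on configuration rather than the pipeline settings. Roughly, the options YAML looks like this; the exact model name is just an example:

```yaml
# Hypothetical Whisper add-on options (Settings → Add-ons → Whisper → Configuration).
# Option names assume the official faster-whisper based add-on.
model: small-int8
language: en
beam_size: 5
```

Higher beam sizes trade CPU time for accuracy, so it's worth testing on your own hardware.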

My latest issue is that my ESP32-Box has stopped responding to the wake word. I re-flashed it with the stock code from the ESPHome projects page to see if it would help, but nothing.

In the ESPHome settings page the wake word location selector is greyed out as 'unavailable', which seems like the only hint to what is going on.


I've been trying out a basic ESP32-S3 with INFP mic setup, and the biggest frustration I have right now is the STT errors. I'm curious if there's anything out there like a simplified model that only recognizes a subset of words associated with an automated home? For example, I ask Jarvis to turn on the living room lamp, and it thinks I said "Turn on the learning room light". The word "learning" will NEVER be used when I am voice-controlling things around the house, so I'd love to not even have that as an option for the Whisper model to choose from.

Of course, with a general LLM, you need the full breadth if youā€™re going to ask it random questions, but that could be its own pipeline.

*edit: Vosk STT seems like a pretty good option for limiting what can be recognized.
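To expand on that: Vosk's `KaldiRecognizer` can take an optional JSON list of allowed phrases as a grammar, which directly restricts what it can output. A cruder, engine-agnostic trick is to post-process whatever transcript you get and snap it to the nearest known command. A minimal sketch, assuming a hand-written phrase list (in practice you'd generate it from your entity names):

```python
import difflib

# Hypothetical command list; generate this from your HA entities in practice.
KNOWN_PHRASES = [
    "turn on the living room lamp",
    "turn off the living room lamp",
    "turn on the hallway light",
    "turn off the hallway light",
]

def snap_to_known_phrase(stt_text, cutoff=0.6):
    """Return the closest known command, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        stt_text.lower(), KNOWN_PHRASES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

With this, a garbled transcript like "turn on the learning room light" snaps back to one of the real commands, while unrelated speech falls through as `None` so it can be routed to a general-purpose pipeline instead.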