ESP32-S3-Box-3 Microphone Sensitivity

Is there any way to boost the sensitivity of the built-in microphones? I have to be within a foot of the device and almost shout at it to wake it up. It’s nothing at all like what I’ve seen in the videos put out by the HA team and other YouTubers.

Thanks!

For me, the wake word currently seems to work slightly worse than with the Athom Echo … I’m not sure why, but I don’t think it’s the microphone sensitivity; I hope it’s something that can be improved in software …

You guys are experiencing exactly what I went through. After a month of massive frustration (I’m an audio electronics engineer), I finally threw in the towel and installed Willow + HA (heywillow.io). I’m not looking back. This is a VERY hard problem to solve, and Willow solves it elegantly. Continuously streaming audio to the HA server from remote devices is not going to cut it for reliable, local voice assistants. The Box3 devices from Espressif have specialized voice-processing hardware that isn’t being used with the HA/ESPHome solution. It’s unclear if/when it will be.

You’re lucky because you actually have Box3s. They are on backorder pretty much everywhere right now.

It’s definitely the sensitivity with mine. I have no issues with the default “OK Nabu” wake word; it’s just a matter of whether it hears me or not.

Within 1 foot, it works every time. Several feet away, I have to raise my voice or yell, and it usually picks up the command.

I looked around their website and didn’t really see anything regarding installing it or actually using it. If it works as well as you say, maybe someone will integrate it into HA.

NOTE: I’m not affiliated with Willow in any way; I just found that it is hands down the best STT engine that comes preconfigured for use with HA.

The local WIS server will likely not be integrated into HA. If you want local, high-performance, highly accurate STT, you need a GPU, or inference times will always be terrible. The vast majority of HA installs are running on Pis and NUCs, and those will never support decent local STT, at least not in any way that approaches the quality of Alexa or Google. Willow does have a hosted STT engine which is very good, so you don’t actually need a dedicated PC to try it out. Configuration is pretty straightforward: install WAS, enter your HA IP address, flash your Box3, and it just works. I had it working in 15 minutes. Literally.

If you’re running HA Supervised, it’s just another Docker container.

They have a great presence on Discord, lots of really interesting stuff, especially the Willow AutoCorrect module that just came out. It learns in real time and corrects what it hears to what you most likely wanted HA to do. With a little bit of training, HA Assist gets a huge boost in reliability, executing your command even when you make mistakes in how you say things.

There is now a Willow Add-on. I just wrote up this guide.


What languages does it support?

Yes, I am also interested.
For me, the priority is to use the Czech language.
If there is no support for this language, it is not usable for me.

Join their Discord channel and ask!

“we didn’t make the model or anything, we just run the Whisper models from huggingface, with some optimizations and such”

Reading the Whisper docs, lots of languages are supported, including Czech.
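
If you want to sanity-check Czech support yourself before flashing anything, here is a rough Python sketch running one of the Hugging Face Whisper checkpoints directly. The model name and audio file are just placeholders, and this isn’t how Willow’s WIS server invokes Whisper internally:

# Rough sketch: run a multilingual Whisper checkpoint from Hugging Face locally.
# "openai/whisper-small" and "czech_sample.wav" are placeholders; needs transformers, torch, ffmpeg.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Transcribe Czech speech as Czech text
czech = asr("czech_sample.wav", generate_kwargs={"language": "czech", "task": "transcribe"})
print(czech["text"])

# Or have Whisper translate the speech to English, which matches what the logs further down show
english = asr("czech_sample.wav", generate_kwargs={"task": "translate"})
print(english["text"])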

OK, this is just me screwing around using Google Translate in Czech:

(09:02:31.392) WILLOW/HASS: sending command to Home Assistant via WebSocket: {
        "end_stage":    "intent",
        "id":   1704211351,
        "input":        {
                "text": "This is a test of the translational capabilities of this great new language engine."
        },
        "start_stage":  "intent",
        "type": "assist_pipeline/run"

or this:

“this is a test of the willow speech engine in a foreign language I could go on forever”

I (08:53:36.212) WILLOW/HASS: sending command to Home Assistant via WebSocket: {
        "end_stage":    "intent",
        "id":   1704210816,
        "input":        {
                "text": "This is a test of a speech motor in a foreign language, 
in which I could continue to infinity."
        },
        "start_stage":  "intent",
        "type": "assist_pipeline/run"

Also, in many languages, as in British English, Engine = Motor, so technically it’s quite accurate :smile:

So it appears that it translated the Czech speech to English and then submitted it to the Assist pipeline in HA. I have an English pipeline set as default, so commands are translated, submitted, and executed.
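
For reference, this is roughly what it takes to reproduce that exchange yourself against HA’s WebSocket API, e.g. with Python’s websockets library. The host, token, and command text are placeholders; the auth handshake and the assist_pipeline/run message shape match what Willow logs above:

# Minimal sketch of sending the same assist_pipeline/run command to HA yourself.
# Host and token are placeholders; requires `pip install websockets`.
import asyncio
import json
import websockets

HA_WS_URL = "ws://homeassistant.local:8123/api/websocket"  # adjust to your install
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

async def run_intent(text: str) -> None:
    async with websockets.connect(HA_WS_URL) as ws:
        # HA sends auth_required first, then expects an auth message
        await ws.recv()
        await ws.send(json.dumps({"type": "auth", "access_token": TOKEN}))
        await ws.recv()  # auth_ok (or auth_invalid)

        # Same message shape as in the Willow log: text in, intent stage only
        await ws.send(json.dumps({
            "id": 1,
            "type": "assist_pipeline/run",
            "start_stage": "intent",
            "end_stage": "intent",
            "input": {"text": text},
        }))

        # The pipeline streams run-start / intent-end / run-end events back
        for _ in range(4):
            print(await ws.recv())

asyncio.run(run_intent("turn on the kitchen light"))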

I got the spoken responses to come back to the Box3 in Czech by setting this in the WAS configuration screen:

which results in responses spoken in Czech!

I (09:29:25.387) WILLOW/AUDIO: Using WIS TTS URL 'https://infer.tovera.io/api/tts?force_language=cz?format=WAV&speaker=
CLB&text=Omlouvám se, ale nerozumím'

which = "I am sorry, but i do not understand"
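
If you want to poke at that TTS endpoint yourself, something like this should get you a WAV back. I’m just reusing the query parameters visible in the logged URL and assuming the hosted endpoint is reachable without authentication:

# Quick sketch for testing the WIS TTS endpoint seen in the log above.
# Assumes no auth is needed; parameters (force_language, format, speaker, text) come from the logged URL.
import requests

resp = requests.get(
    "https://infer.tovera.io/api/tts",
    params={
        "force_language": "cz",
        "format": "WAV",
        "speaker": "CLB",
        "text": "Omlouvám se, ale nerozumím",
    },
    timeout=30,
)
resp.raise_for_status()

with open("tts_output.wav", "wb") as f:
    f.write(resp.content)
print("wrote", len(resp.content), "bytes")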

Oh wow, thanks for this tidbit!

How do you integrate with HA?

edit: Whoops, you have more posts in this thread.

Check out the VERY detailed guide I wrote up last night.