ESP32-S3 Box voice activation / wakeword detection not working

I set up my ESP32-S3 Box with this tutorial:
ESP32-S3-BOX-3 voice assistant - Home Assistant.
I also set up the Whisper, Piper and openWakeWord add-ons as described in the tutorial.

Everything looks fine and the S3 Box shows the HA house with the dark face.
However, nothing happens when I say the wake word(s).

The log of the S3 Box says:

...
[11:57:35][C][mdns:115]: mDNS:
[11:57:35][C][mdns:116]:   Hostname: esp32-s3-box-e22704
[11:57:35][C][ota:097]: Over-The-Air Updates:
[11:57:35][C][ota:098]:   Address: esp32-s3-box-e22704.local:3232
[11:57:35][C][api:139]: API Server:
[11:57:35][C][api:140]:   Address: esp32-s3-box-e22704.local:6053
[11:57:35][C][api:142]:   Using noise encryption: YES
[11:57:35][C][improv_serial:032]: Improv Serial:

and nothing happens after that.

When I debug the Assist pipeline, the debug view is simply empty and no runs are visible.
I have already tried “Ok Nabu” and “Alexa” as wake words, with both English and German as the language.

In the tutorial video, the face on the S3 display turns white when the wake word is detected. I have never seen this happen on my device.

One suspicious thing I see is that the “Use wakeword” entity of the ESP32 device is not available. Every other entity looks fine.

Does anyone have an idea what I can try to find the issue?

I’m having the identical issue, and I’ll be sure to come back to provide a solution if I ever find one.

OK, so I have a partial answer which I’m sharing in case it’s helpful. I got a detailed response on Reddit that explained that wake word was renamed in 2023.12.2, even though it still shows in the YAML (and on previous devices). So you should no longer expect to see wake word as an available option in the configuration.

I hope that helps a little.

I see the changes, but I don’t think this is the reason for the device not hearing the wake word.
I can mute it or switch the backlight LED, and I see that the diagnostics → mute entity has the state “off”.
The device also has a hardware button for muting the microphone, which is turned off (not muted) as well.

After adopting the box again, the log messages changed a bit, but there is still no wake word detection:

[07:58:18][D][esp-idf:000]: I (28599) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8205311 Bytes, Inter:54352 Bytes, Dram:54352 Bytes
[07:58:18][D][esp-idf:000]: I (28604) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[07:58:18][D][esp-idf:000]: W (28608) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_RESUME
[07:58:18][D][esp-idf:000]: E (28611) AUDIO_PIPELINE: audio_pipeline_resume failed
[07:58:20][D][esp-idf:000]: W (30618) AUDIO_ELEMENT: [i2s-0x3d836bd0] Element task destroy timeout[2000]
[07:58:20][D][esp-idf:000]: W (30622) AUDIO_ELEMENT: [filter]  Element has not create when AUDIO_ELEMENT_TERMINATE
[07:58:20][D][esp_adf.microphone:294]: Microphone started

Which version of the box do you have? S3 Box or S3 Box 3? I have the S3 Box 3 and the ESPHome firmware works for me.

Here are my ESPHome logs:

[10:22:17][D][voice_assistant:422]: State changed from IDLE to START_MICROPHONE
[10:22:17][D][voice_assistant:428]: Desired state set to WAIT_FOR_VAD
[10:22:17][D][voice_assistant:159]: Starting Microphone
[10:22:17][D][voice_assistant:422]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[10:22:17][D][esp-idf:000]: I (65471) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[10:22:17][D][esp-idf:000]: I (65475) I2S: I2S0, MCLK output by GPIO2
[10:22:17][D][esp-idf:000]: I (65479) AUDIO_PIPELINE: link el->rb, el:0x3d036c54, tag:i2s, rb:0x3d037068
[10:22:17][D][esp-idf:000]: I (65483) AUDIO_PIPELINE: link el->rb, el:0x3d036dc8, tag:filter, rb:0x3d0390a8
[10:22:17][D][esp-idf:000]: I (65488) AUDIO_ELEMENT: [i2s-0x3d036c54] Element task created
[10:22:17][D][esp-idf:000]: I (65490) AUDIO_THREAD: The filter task allocate stack on external memory
[10:22:17][D][esp-idf:000]: I (65495) AUDIO_ELEMENT: [filter-0x3d036dc8] Element task created
[10:22:17][D][esp-idf:000]: I (65498) AUDIO_ELEMENT: [raw-0x3d036ef8] Element task created
[10:22:17][D][esp-idf:000]: I (65503) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16592283 Bytes, Inter:56948 Bytes, Dram:56948 Bytes
[10:22:17][D][esp-idf:000]: I (65508) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[10:22:17][D][esp-idf:000]: I (65510) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[10:22:17][D][esp-idf:000]: I (65515) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[10:22:17][D][esp-idf:000]: I (65517) AUDIO_PIPELINE: Pipeline started
[10:22:17][D][esp_adf.microphone:294]: Microphone started
[10:22:17][D][voice_assistant:422]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[10:22:17][D][voice_assistant:176]: Waiting for speech...
[10:22:17][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
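To compare a log like this with the failing one further up, it can help to pull out only the ESP-IDF warnings and errors and to check whether the "Pipeline started" line ever appears. The following is a minimal sketch of that idea in Python. The function name `summarize_audio_log` and the regular expression are my own, based only on the line format visible in the logs pasted in this thread, so other ESPHome versions may need adjustments.

```python
import re

# Matches ESP-IDF lines as forwarded by ESPHome in the logs above, e.g.
# [07:58:18][D][esp-idf:000]: E (28611) AUDIO_PIPELINE: audio_pipeline_resume failed
IDF_LINE = re.compile(
    r"\[esp-idf:\d+\]: (?P<level>[IWE]) \(\d+\) (?P<tag>\w+): (?P<msg>.*)"
)


def summarize_audio_log(text):
    """Return (problems, pipeline_started) for a pasted ESPHome log.

    problems is a list of (level, tag, message) for every W or E line;
    pipeline_started is True if AUDIO_PIPELINE reported "Pipeline started".
    """
    problems = []
    pipeline_started = False
    for line in text.splitlines():
        m = IDF_LINE.search(line)
        if not m:
            continue  # not an ESP-IDF line, e.g. voice_assistant or esp_adf
        if m["level"] in ("W", "E"):
            problems.append((m["level"], m["tag"], m["msg"].strip()))
        if m["tag"] == "AUDIO_PIPELINE" and "Pipeline started" in m["msg"]:
            pipeline_started = True
    return problems, pipeline_started


# The failing log from the regular S3 Box, copied from earlier in the thread.
FAILING_LOG = """
[07:58:18][D][esp-idf:000]: I (28599) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8205311 Bytes, Inter:54352 Bytes, Dram:54352 Bytes
[07:58:18][D][esp-idf:000]: I (28604) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[07:58:18][D][esp-idf:000]: W (28608) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_RESUME
[07:58:18][D][esp-idf:000]: E (28611) AUDIO_PIPELINE: audio_pipeline_resume failed
[07:58:20][D][esp-idf:000]: W (30618) AUDIO_ELEMENT: [i2s-0x3d836bd0] Element task destroy timeout[2000]
[07:58:20][D][esp-idf:000]: W (30622) AUDIO_ELEMENT: [filter]  Element has not create when AUDIO_ELEMENT_TERMINATE
[07:58:20][D][esp_adf.microphone:294]: Microphone started
"""

if __name__ == "__main__":
    problems, started = summarize_audio_log(FAILING_LOG)
    for level, tag, msg in problems:
        print(level, tag, msg)
    print("Pipeline started:", started)
```

On the failing log this reports the `filter` element warnings and the `audio_pipeline_resume failed` error, and no "Pipeline started" line, even though "Microphone started" is printed afterwards. The working S3 Box 3 log has no warnings and does reach "Pipeline started".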

Regular S3 Box.

I will have a look at your log. Thanks.

Hm… it seems that I have the same or similar issues on both my S3-Box and my S3-Box3…

[D][voice_assistant:189]: VAD detected speech
[D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:206]: Requesting start...
[D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[D][voice_assistant:443]: Client started, streaming microphone
[D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:529]: Event Type: 1
[D][voice_assistant:532]: Assist Pipeline running
[D][voice_assistant:529]: Event Type: 9
[D][voice_assistant:529]: Event Type: 0
[D][voice_assistant:529]: Event Type: 2
[D][voice_assistant:619]: Assist Pipeline ended
[D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[D][voice_assistant:428]: Desired state set to WAITING_FOR_VAD
[D][voice_assistant:176]: Waiting for speech...
[D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[D][voice_assistant:189]: VAD detected speech
[D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:206]: Requesting start...
[D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[D][voice_assistant:443]: Client started, streaming microphone
[D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:529]: Event Type: 1
[D][voice_assistant:532]: Assist Pipeline running
[D][voice_assistant:529]: Event Type: 9
[D][voice_assistant:529]: Event Type: 0
[D][voice_assistant:529]: Event Type: 2
[D][voice_assistant:619]: Assist Pipeline ended
[D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[D][voice_assistant:428]: Desired state set to WAITING_FOR_VAD
[D][voice_assistant:176]: Waiting for speech...
[D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[D][voice_assistant:189]: VAD detected speech
[D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:206]: Requesting start...
[D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[D][voice_assistant:443]: Client started, streaming microphone
[D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:529]: Event Type: 1
[D][voice_assistant:532]: Assist Pipeline running
[D][voice_assistant:529]: Event Type: 9
[D][voice_assistant:529]: Event Type: 0
[D][voice_assistant:529]: Event Type: 2
[D][voice_assistant:619]: Assist Pipeline ended
[D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[D][voice_assistant:428]: Desired state set to WAITING_FOR_VAD
[D][voice_assistant:176]: Waiting for speech...
[D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD

But nothing happens… the screen does not change, and the system does not recognize anything.
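The numeric "Event Type" values in the log above are hard to read by eye. Here is a small Python sketch that groups them into pipeline runs and gives them names. The name table is an assumption: it follows the `VoiceAssistantEvent` numbering as I remember it from ESPHome's `api.proto`, so please check it against the ESPHome version you are running. `decode_runs` is just a helper name I made up.

```python
import re

# ASSUMPTION: numbering taken from the VoiceAssistantEvent enum in ESPHome's
# api.proto; verify against your ESPHome version before relying on it.
EVENT_NAMES = {
    0: "ERROR",
    1: "RUN_START",
    2: "RUN_END",
    3: "STT_START",
    4: "STT_END",
    5: "INTENT_START",
    6: "INTENT_END",
    7: "TTS_START",
    8: "TTS_END",
    9: "WAKE_WORD_START",
    10: "WAKE_WORD_END",
}

EVENT_RE = re.compile(r"\[voice_assistant:\d+\]: Event Type: (\d+)")


def decode_runs(text):
    """Split a log into pipeline runs and name the events seen in each run."""
    runs, current = [], []
    for line in text.splitlines():
        m = EVENT_RE.search(line)
        if m:
            code = int(m.group(1))
            current.append(EVENT_NAMES.get(code, f"UNKNOWN_{code}"))
        elif "Assist Pipeline ended" in line and current:
            runs.append(current)  # one run is complete
            current = []
    if current:
        runs.append(current)  # log was cut off in the middle of a run
    return runs


# One cycle copied from the log above.
SAMPLE = """
[D][voice_assistant:529]: Event Type: 1
[D][voice_assistant:532]: Assist Pipeline running
[D][voice_assistant:529]: Event Type: 9
[D][voice_assistant:529]: Event Type: 0
[D][voice_assistant:529]: Event Type: 2
[D][voice_assistant:619]: Assist Pipeline ended
"""

if __name__ == "__main__":
    for i, run in enumerate(decode_runs(SAMPLE), start=1):
        print(i, " -> ".join(run))
```

If the assumed numbering is right, every cycle in the log reads as run start, wake word start, error, run end. That would mean the pipeline fails during the wake word stage rather than the microphone never starting, so the Home Assistant side (the wake word engine configured for the pipeline) might be worth a look. This is only a reading of the log, not a confirmed diagnosis.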

I just got an ESP32-S3-Box3 and an Atom Echo. I originally set things up using the openwakeword container that is packaged with HA. I got no response to the wake word on either device, so I set up a remote copy of openwakeword on a separate machine and tied it into the HA voice processing pipeline. It also failed to respond. Using the Atom Echo results in a constant stream of packets being sent over to openwakeword, but “ok nabu” would rarely be detected. I thought I’d try alexa as the wake word, as it seems to be simpler. Much to my delight, I got pretty good wake word recognition while using the Atom Echo.

So then I figured I’d try the S3 box. The first thing I noticed was that it doesn’t send the constant stream of packets. The S3 box has code that tries to determine when speech is going on and only then starts the flow of packets. Not all words worked well to wake up the mic. “Hello” seemed to be a good word to get things flowing. After saying hello I had to wait a second or two before saying alexa. Doing this gave me good results.

The downsides: there is no indication on the S3 box that the mic is active, and sometimes saying hello didn’t wake the mic. The S3 box would also get into a state where nothing would wake the mic other than a reboot of the S3.
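One possible explanation for why "hello, pause, alexa" works better than "alexa" alone is that the audio which opens the speech gate is itself not streamed. The toy Python model below illustrates that idea only. It is not the firmware's actual voice activity detection, and the `threshold` and `warmup` parameters and the energy values are made up for the example.

```python
def continuous_stream(frames):
    """Atom Echo style as described above: every frame is sent on."""
    return list(frames)


def vad_gated_stream(frames, threshold=0.5, warmup=2):
    """Toy speech gate: start sending only after `warmup` loud frames in a row.

    frames is an iterable of (label, energy) pairs. The frames that open the
    gate are dropped, which is the hypothetical behaviour being illustrated.
    """
    sent = []
    loud_run = 0
    gate_open = False
    for label, energy in frames:
        if not gate_open:
            loud_run = loud_run + 1 if energy >= threshold else 0
            if loud_run >= warmup:
                gate_open = True
            continue  # nothing is sent until the gate has opened
        sent.append((label, energy))
    return sent


# Made-up energy values: two frames per spoken word.
WAKE_ONLY = [("silence", 0.0), ("alexa", 0.8), ("alexa", 0.9), ("silence", 0.0)]
HELLO_FIRST = [
    ("silence", 0.0), ("hello", 0.8), ("hello", 0.9),
    ("pause", 0.1), ("alexa", 0.8), ("alexa", 0.9),
]

if __name__ == "__main__":
    print([label for label, _ in vad_gated_stream(WAKE_ONLY)])
    print([label for label, _ in vad_gated_stream(HELLO_FIRST)])
```

In this model the wake word spoken on its own is consumed by opening the gate and never reaches the wake word engine, while a preceding "hello" opens the gate so that the full "alexa" is streamed. With continuous streaming every frame is forwarded, which matches the constant packet flow seen with the Atom Echo.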

I followed this howto:

Once the device was flashed and the steps from this part were completed:

I went back to HA -> ESPHome and adopted the new device. Once this was done, it was hosed. I did it all again but did not adopt the device in ESPHome, and it works fine. Just go through all the steps on the ESPHome project site and it should work.

Did you ever find a solution to this? I’m having identical problems with both M5Stack Atom Echos and an S3-Box-3. The devices are recognized in ESPHome, but none of the wake words trigger a response.

No, sadly I did not yet find any permanent solution for this.

Ever found a fix?