Atom Echo - Voice Activity Detection (VAD) continuously detects speech?

Like many others, I’m new to Assist and wanted to give the M5Stack Atom Echo a go to prove the concept. I’m running a container-based install with openWakeWord, whisper, and piper all running, and they appear to be working rather well.

The quirk I’ve noticed is that my Atom Echo’s Voice Activity Detection (VAD) feature appears to be continuously detecting speech when I tail the logs using esphome. Even in a quiet room, the logs show “VAD detected speech” immediately after “Waiting for speech…” (see log file below).

In practice, this means that to say my wake word I have to watch the logs and say the word shortly after “VAD detected speech” appears. Otherwise I often get only part of the word out before the device receives a timeout from openWakeWord, and my Assist debug UI fills up with Assist requests with no wake word detected.

Is anyone else running into this?

Log:

./esphome logs m5stack-atom-echo.yaml
INFO ESPHome 2023.11.6
INFO Reading configuration m5stack-atom-echo.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
INFO Starting log output from m5stack-atom-echo.local using esphome API
INFO Successfully connected to m5stack-atom-echo in 0.451s
INFO Successful handshake with m5stack-atom-echo in 0.024s
[23:24:19][I][app:102]: ESPHome version 2023.11.6 compiled on Dec 17 2023, 17:20:00
[23:24:19][I][app:104]: Project m5stack.atom-echo-voice-assistant version 1.0
[23:24:19][C][wifi:559]: WiFi:
[23:24:19][C][wifi:391]:   Local MAC: E8:6B:EA:11:2A:C4
[23:24:19][C][wifi:396]:   SSID: 'rumbledethumps'
[23:24:19][C][wifi:397]:   IP Address: 172.16.17.135
[23:24:19][C][wifi:399]:   BSSID: FC:EC:DA:B7:5A:C3
[23:24:19][C][wifi:400]:   Hostname: 'm5stack-atom-echo'
[23:24:19][C][wifi:402]:   Signal strength: -31 dB ▂▄▆█
[23:24:19][C][wifi:406]:   Channel: 11
[23:24:19][C][wifi:407]:   Subnet: 255.255.255.0
[23:24:19][C][wifi:408]:   Gateway: 172.16.17.1
[23:24:19][C][wifi:409]:   DNS1: 8.8.8.8
[23:24:19][C][wifi:410]:   DNS2: 8.8.4.4
[23:24:19][C][logger:416]: Logger:
[23:24:19][C][logger:417]:   Level: DEBUG
[23:24:19][C][logger:418]:   Log Baud Rate: 115200
[23:24:19][C][logger:420]:   Hardware UART: UART0
[23:24:19][C][esp32_rmt_led_strip:171]: ESP32 RMT LED Strip:
[23:24:19][C][esp32_rmt_led_strip:172]:   Pin: 27
[23:24:19][C][esp32_rmt_led_strip:173]:   Channel: 0
[23:24:19][C][esp32_rmt_led_strip:198]:   RGB Order: GRB
[23:24:19][C][esp32_rmt_led_strip:199]:   Max refresh rate: 0
[23:24:20][C][esp32_rmt_led_strip:200]:   Number of LEDs: 1
[23:24:20][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Button'
[23:24:20][C][gpio.binary_sensor:016]:   Pin: GPIO39
[23:24:20][C][light:103]: Light 'M5Stack Atom Echo'
[23:24:20][C][light:105]:   Default Transition Length: 0.0s
[23:24:20][C][light:106]:   Gamma Correct: 2.80
[23:24:20][C][template.switch:068]: Template Switch 'Use wake word'
[23:24:20][C][template.switch:091]:   Restore Mode: restore defaults to ON
[23:24:20][C][template.switch:057]:   Optimistic: YES
[23:24:20][C][template.switch:068]: Template Switch 'Use Listen Light'
[23:24:20][C][template.switch:091]:   Restore Mode: restore defaults to ON
[23:24:20][C][template.switch:057]:   Optimistic: YES
[23:24:20][C][factory_reset.button:011]: Factory Reset Button 'Factory reset'
[23:24:20][C][factory_reset.button:011]:   Icon: 'mdi:restart-alert'
[23:24:20][C][esp32_ble:379]: ESP32 BLE: bluetooth stack is not enabled
[23:24:20][C][esp32_ble_server:200]: ESP32 BLE Server:
[23:24:20][C][esp32_improv.component:257]: ESP32 Improv:
[23:24:20][C][mdns:115]: mDNS:
[23:24:20][C][mdns:116]:   Hostname: m5stack-atom-echo
[23:24:20][C][ota:097]: Over-The-Air Updates:
[23:24:20][C][ota:098]:   Address: m5stack-atom-echo.local:3232
[23:24:20][C][api:139]: API Server:
[23:24:20][C][api:140]:   Address: m5stack-atom-echo.local:6053
[23:24:20][C][api:144]:   Using noise encryption: NO
[23:24:20][C][improv_serial:032]: Improv Serial:
[23:24:23][D][voice_assistant:529]: Event Type: 0
[23:24:23][D][voice_assistant:529]: Event Type: 2
[23:24:23][D][voice_assistant:619]: Assist Pipeline ended
[23:24:23][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[23:24:23][D][voice_assistant:428]: Desired state set to WAITING_FOR_VAD
[23:24:23][D][voice_assistant:176]: Waiting for speech...
[23:24:23][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[23:24:23][D][voice_assistant:189]: VAD detected speech
[23:24:23][D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[23:24:23][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:23][D][voice_assistant:206]: Requesting start...
[23:24:23][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:24:23][D][voice_assistant:443]: Client started, streaming microphone
[23:24:23][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:24:23][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:23][D][voice_assistant:529]: Event Type: 1
[23:24:23][D][voice_assistant:532]: Assist Pipeline running
[23:24:23][D][voice_assistant:529]: Event Type: 9
[23:24:23][D][light:036]: 'M5Stack Atom Echo' Setting:
[23:24:23][D][light:051]:   Brightness: 60%
[23:24:23][D][light:059]:   Red: 100%, Green: 89%, Blue: 71%
[23:24:28][D][voice_assistant:529]: Event Type: 0
[23:24:28][D][voice_assistant:529]: Event Type: 2
[23:24:28][D][voice_assistant:619]: Assist Pipeline ended
[23:24:28][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[23:24:28][D][voice_assistant:428]: Desired state set to WAITING_FOR_VAD
[23:24:28][D][voice_assistant:176]: Waiting for speech...
[23:24:28][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[23:24:28][D][voice_assistant:189]: VAD detected speech
[23:24:28][D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[23:24:28][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:28][D][voice_assistant:206]: Requesting start...
[23:24:28][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:24:28][D][voice_assistant:443]: Client started, streaming microphone
[23:24:28][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:24:28][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:28][D][voice_assistant:529]: Event Type: 1
[23:24:28][D][voice_assistant:532]: Assist Pipeline running
[23:24:28][D][voice_assistant:529]: Event Type: 9
[23:24:28][D][light:036]: 'M5Stack Atom Echo' Setting:
[23:24:28][D][light:051]:   Brightness: 60%
[23:24:28][D][light:059]:   Red: 100%, Green: 89%, Blue: 71%
[23:24:33][D][voice_assistant:529]: Event Type: 0
[23:24:33][D][voice_assistant:529]: Event Type: 2
[23:24:33][D][voice_assistant:619]: Assist Pipeline ended
[23:24:33][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[23:24:33][D][voice_assistant:428]: Desired state set to WAITING_FOR_VAD
[23:24:33][D][voice_assistant:176]: Waiting for speech...
[23:24:33][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[23:24:33][D][voice_assistant:189]: VAD detected speech
[23:24:33][D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[23:24:33][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:33][D][voice_assistant:206]: Requesting start...
[23:24:33][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:24:33][D][voice_assistant:443]: Client started, streaming microphone
[23:24:33][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:24:33][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:33][D][voice_assistant:529]: Event Type: 1
[23:24:33][D][voice_assistant:532]: Assist Pipeline running
[23:24:33][D][voice_assistant:529]: Event Type: 9
[23:24:33][D][light:036]: 'M5Stack Atom Echo' Setting:
[23:24:33][D][light:051]:   Brightness: 60%
[23:24:33][D][light:059]:   Red: 100%, Green: 89%, Blue: 71%
[23:24:35][I][ota:117]: Boot seems successful, resetting boot loop counter.
[23:24:35][D][esp32.preferences:114]: Saving 1 preferences to flash...
[23:24:35][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[23:24:38][D][voice_assistant:529]: Event Type: 0
[23:24:38][D][voice_assistant:529]: Event Type: 2
[23:24:38][D][voice_assistant:619]: Assist Pipeline ended
[23:24:38][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[23:24:38][D][voice_assistant:428]: Desired state set to WAITING_FOR_VAD
[23:24:38][D][voice_assistant:176]: Waiting for speech...
[23:24:38][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[23:24:38][D][voice_assistant:189]: VAD detected speech
[23:24:38][D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[23:24:38][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:38][D][voice_assistant:206]: Requesting start...
[23:24:38][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:24:38][D][voice_assistant:443]: Client started, streaming microphone
[23:24:38][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:24:38][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:38][D][voice_assistant:529]: Event Type: 1
[23:24:38][D][voice_assistant:532]: Assist Pipeline running
[23:24:38][D][voice_assistant:529]: Event Type: 9
[23:24:38][D][light:036]: 'M5Stack Atom Echo' Setting:
[23:24:38][D][light:051]:   Brightness: 60%
[23:24:38][D][light:059]:   Red: 100%, Green: 89%, Blue: 71%
[23:24:43][D][voice_assistant:529]: Event Type: 0
[23:24:43][D][voice_assistant:529]: Event Type: 2
[23:24:43][D][voice_assistant:619]: Assist Pipeline ended
[23:24:43][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[23:24:43][D][voice_assistant:428]: Desired state set to WAITING_FOR_VAD
[23:24:43][D][voice_assistant:176]: Waiting for speech...
[23:24:43][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[23:24:43][D][voice_assistant:189]: VAD detected speech
[23:24:43][D][voice_assistant:422]: State changed from WAITING_FOR_VAD to START_PIPELINE
[23:24:43][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:43][D][voice_assistant:206]: Requesting start...
[23:24:43][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:24:43][D][voice_assistant:443]: Client started, streaming microphone
[23:24:43][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:24:43][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[23:24:43][D][voice_assistant:529]: Event Type: 1
[23:24:43][D][voice_assistant:532]: Assist Pipeline running
[23:24:43][D][voice_assistant:529]: Event Type: 9
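
To put a number on "immediately", here is a minimal Python sketch that reads ESPHome log lines like the ones above and reports the gap between each “Waiting for speech...” and the following “VAD detected speech”. The function name `vad_delays` and the regular expression are my own choices for illustration, not part of ESPHome. The log timestamps only have one-second resolution, so a result of 0.0 just means "within the same second", and the sketch does not handle a log that runs past midnight.

```python
import re
from datetime import datetime

# Matches ESPHome log lines such as:
# [23:24:23][D][voice_assistant:176]: Waiting for speech...
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[\w\]\[([\w.]+):\d+\]: (.*)$")


def vad_delays(lines):
    """Seconds between each 'Waiting for speech...' and the next 'VAD detected speech'."""
    delays = []
    waiting_since = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip INFO banners and anything else that is not a device log line
        stamp, component, message = match.groups()
        if component != "voice_assistant":
            continue
        when = datetime.strptime(stamp, "%H:%M:%S")
        if message.startswith("Waiting for speech"):
            waiting_since = when
        elif message.startswith("VAD detected speech") and waiting_since is not None:
            delays.append((when - waiting_since).total_seconds())
            waiting_since = None
    return delays


sample = [
    "[23:24:23][D][voice_assistant:176]: Waiting for speech...",
    "[23:24:23][D][voice_assistant:422]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD",
    "[23:24:23][D][voice_assistant:189]: VAD detected speech",
    "[23:24:28][D][voice_assistant:176]: Waiting for speech...",
    "[23:24:28][D][voice_assistant:189]: VAD detected speech",
]
print(vad_delays(sample))
```

Saving the output of `esphome logs` to a file and passing its lines to `vad_delays` should show whether every cycle behaves the same way.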

I’m seeing similar behavior on a generic ESP32-WROOM-32U dev board running the configuration from “My Local Voice Assistant Device With Wake Word In Home Assistant | Smart Home Circle”.

This is my first hardware-assistant attempt, so I wasn’t sure if this was the intended / normal behavior (guessing not, based on the apparent reliability and usability seen in various implementations / YouTube videos).

I found your post looking for more info about the “Event Types” logged throughout the loop.

INFO ESPHome 2023.11.6
INFO Reading configuration /config/esphome/mikes-office-voice-assistant.yaml...
INFO Starting log output from mikes-office-voice-assistant.local using esphome API
WARNING Can't connect to ESPHome API for mikes-office-voice-assistant: Error connecting to ('<redacted>', 6053): [Errno 111] Connect call failed ('<redacted>', 6053) (SocketAPIError)
INFO Trying to connect to mikes-office-voice-assistant in the background
INFO Successfully connected to mikes-office-voice-assistant in 0.020s
INFO Successful handshake with mikes-office-voice-assistant in 0.113s
[18:04:11][I][app:102]: ESPHome version 2023.11.6 compiled on Dec 18 2023, 07:40:26
[18:04:11][C][wifi:559]: WiFi:
[18:04:11][C][wifi:391]:   Local MAC: <redacted>
[18:04:11][C][wifi:396]:   SSID: <redacted>
[18:04:11][C][wifi:397]:   IP Address: <redacted>
[18:04:11][C][wifi:399]:   BSSID: <redacted>
[18:04:11][C][wifi:400]:   Hostname: 'mikes-office-voice-assistant'
[18:04:11][C][wifi:402]:   Signal strength: -65 dB ▂▄▆█
[18:04:11][C][wifi:406]:   Channel: 6
[18:04:11][C][wifi:407]:   Subnet: 255.255.255.0
[18:04:11][C][wifi:408]:   Gateway: <redacted>
[18:04:11][C][wifi:409]:   DNS1: 8.8.8.8
[18:04:11][C][wifi:410]:   DNS2: 0.0.0.0
[18:04:11][C][logger:416]: Logger:
[18:04:11][C][logger:417]:   Level: DEBUG
[18:04:11][C][logger:418]:   Log Baud Rate: 115200
[18:04:11][C][logger:420]:   Hardware UART: UART0
[18:04:11][C][light:103]: Light 'Light'
[18:04:11][C][light:105]:   Default Transition Length: 0.5s
[18:04:11][C][light:106]:   Gamma Correct: 2.80
[18:04:11][C][template.switch:068]: Template Switch 'Use wake word'
[18:04:11][C][template.switch:091]:   Restore Mode: restore defaults to ON
[18:04:11][C][template.switch:057]:   Optimistic: YES
[18:04:11][C][status:034]: Status Binary Sensor 'API Connection'
[18:04:11][C][status:034]:   Device Class: 'connectivity'
[18:04:12][C][captive_portal:088]: Captive Portal:
[18:04:12][C][web_server:168]: Web Server:
[18:04:12][C][web_server:169]:   Address: mikes-office-voice-assistant.local:80
[18:04:12][C][mdns:115]: mDNS:
[18:04:12][C][mdns:116]:   Hostname: mikes-office-voice-assistant
[18:04:12][C][ota:097]: Over-The-Air Updates:
[18:04:12][C][ota:098]:   Address: mikes-office-voice-assistant.local:3232
[18:04:12][C][ota:101]:   Using Password.
[18:04:12][W][ota:107]: Last Boot was an unhandled reset, will proceed to safe mode in 8 restarts
[18:04:12][C][api:139]: API Server:
[18:04:12][C][api:140]:   Address: mikes-office-voice-assistant.local:6053
[18:04:12][C][api:142]:   Using noise encryption: YES
[18:04:12][C][audio:203]: Audio:
[18:04:12][C][audio:225]:   External DAC channels: 1
[18:04:12][C][audio:226]:   I2S DOUT Pin: 27
[18:04:12][D][binary_sensor:036]: 'API Connection': Sending state ON
[18:04:12][E][voice_assistant:468]: No API client connected
[18:04:12][D][voice_assistant:422]: State changed from IDLE to IDLE
[18:04:12][D][voice_assistant:428]: Desired state set to IDLE
[18:04:14][D][api:102]: Accepted <redacted>
[18:04:14][W][component:214]: Component api took a long time for an operation (0.05 s).
[18:04:14][W][component:215]: Components should block for at most 20-30ms.
[18:04:14][D][api.connection:1089]: Home Assistant 2023.12.3 (<redacted>): Connected successfully
[18:04:15][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:15][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:15][D][voice_assistant:124]: microphone not running
[18:04:15][D][voice_assistant:206]: Requesting start...
[18:04:15][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:15][D][voice_assistant:124]: microphone not running
[18:04:15][D][voice_assistant:124]: microphone not running
[18:04:15][D][voice_assistant:124]: microphone not running
[18:04:15][D][voice_assistant:443]: Client started, streaming microphone
[18:04:15][D][voice_assistant:422]: State changed from STARTING_PIPELINE to START_MICROPHONE
[18:04:15][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:15][D][voice_assistant:159]: Starting Microphone
[18:04:15][D][voice_assistant:422]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:04:15][D][voice_assistant:529]: Event Type: 1
[18:04:15][D][voice_assistant:532]: Assist Pipeline running
[18:04:15][D][voice_assistant:422]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[18:04:15][D][voice_assistant:529]: Event Type: 9
[18:04:22][D][voice_assistant:529]: Event Type: 0
[18:04:22][D][voice_assistant:529]: Event Type: 2
[18:04:22][D][voice_assistant:619]: Assist Pipeline ended
[18:04:22][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:04:22][D][voice_assistant:428]: Desired state set to IDLE
[18:04:22][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:22][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:22][D][light:036]: 'Light' Setting:
[18:04:22][D][light:085]:   Transition length: 0.5s
[18:04:22][D][voice_assistant:206]: Requesting start...
[18:04:22][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:22][D][voice_assistant:443]: Client started, streaming microphone
[18:04:22][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:04:22][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:22][D][voice_assistant:529]: Event Type: 1
[18:04:22][D][voice_assistant:532]: Assist Pipeline running
[18:04:22][D][voice_assistant:529]: Event Type: 9
[18:04:27][D][voice_assistant:529]: Event Type: 0
[18:04:27][D][voice_assistant:529]: Event Type: 2
[18:04:27][D][voice_assistant:619]: Assist Pipeline ended
[18:04:27][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:04:27][D][voice_assistant:428]: Desired state set to IDLE
[18:04:27][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:27][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:27][D][light:036]: 'Light' Setting:
[18:04:27][D][light:085]:   Transition length: 0.5s
[18:04:27][D][voice_assistant:206]: Requesting start...
[18:04:27][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:27][D][voice_assistant:443]: Client started, streaming microphone
[18:04:27][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:04:27][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:27][D][voice_assistant:529]: Event Type: 1
[18:04:27][D][voice_assistant:532]: Assist Pipeline running
[18:04:27][D][voice_assistant:529]: Event Type: 9
[18:04:32][D][voice_assistant:529]: Event Type: 0
[18:04:32][D][voice_assistant:529]: Event Type: 2
[18:04:32][D][voice_assistant:619]: Assist Pipeline ended
[18:04:32][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:04:32][D][voice_assistant:428]: Desired state set to IDLE
[18:04:32][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:32][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:32][D][light:036]: 'Light' Setting:
[18:04:32][D][light:085]:   Transition length: 0.5s
[18:04:32][D][voice_assistant:206]: Requesting start...
[18:04:32][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:32][D][voice_assistant:443]: Client started, streaming microphone
[18:04:32][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:04:32][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:32][D][voice_assistant:529]: Event Type: 1
[18:04:32][D][voice_assistant:532]: Assist Pipeline running
[18:04:32][D][voice_assistant:529]: Event Type: 9
[18:04:37][D][voice_assistant:529]: Event Type: 0
[18:04:37][D][voice_assistant:529]: Event Type: 2
[18:04:37][D][voice_assistant:619]: Assist Pipeline ended
[18:04:37][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:04:37][D][voice_assistant:428]: Desired state set to IDLE
[18:04:37][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:37][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:37][D][light:036]: 'Light' Setting:
[18:04:37][D][light:085]:   Transition length: 0.5s
[18:04:37][D][voice_assistant:206]: Requesting start...
[18:04:37][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:38][D][voice_assistant:443]: Client started, streaming microphone
[18:04:38][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:04:38][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:38][D][voice_assistant:529]: Event Type: 1
[18:04:38][D][voice_assistant:532]: Assist Pipeline running
[18:04:38][D][voice_assistant:529]: Event Type: 9
[18:04:43][D][voice_assistant:529]: Event Type: 0
[18:04:43][D][voice_assistant:529]: Event Type: 2
[18:04:43][D][voice_assistant:619]: Assist Pipeline ended
[18:04:43][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:04:43][D][voice_assistant:428]: Desired state set to IDLE
[18:04:43][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:43][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:43][D][light:036]: 'Light' Setting:
[18:04:43][D][light:085]:   Transition length: 0.5s
[18:04:43][D][voice_assistant:206]: Requesting start...
[18:04:43][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:43][D][voice_assistant:443]: Client started, streaming microphone
[18:04:43][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:04:43][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:43][D][voice_assistant:529]: Event Type: 1
[18:04:43][D][voice_assistant:532]: Assist Pipeline running
[18:04:43][D][voice_assistant:529]: Event Type: 9
[18:04:48][D][voice_assistant:529]: Event Type: 0
[18:04:48][D][voice_assistant:529]: Event Type: 2
[18:04:48][D][voice_assistant:619]: Assist Pipeline ended
[18:04:48][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:04:48][D][voice_assistant:428]: Desired state set to IDLE
[18:04:48][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:48][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:48][D][light:036]: 'Light' Setting:
[18:04:48][D][light:085]:   Transition length: 0.5s
[18:04:48][D][voice_assistant:206]: Requesting start...
[18:04:48][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:48][D][voice_assistant:443]: Client started, streaming microphone
[18:04:48][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:04:48][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:48][D][voice_assistant:529]: Event Type: 1
[18:04:48][D][voice_assistant:532]: Assist Pipeline running
[18:04:48][D][voice_assistant:529]: Event Type: 9
[18:04:53][D][voice_assistant:529]: Event Type: 0
[18:04:53][D][voice_assistant:529]: Event Type: 2
[18:04:53][D][voice_assistant:619]: Assist Pipeline ended
[18:04:53][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:04:53][D][voice_assistant:428]: Desired state set to IDLE
[18:04:53][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:53][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:53][D][light:036]: 'Light' Setting:
[18:04:53][D][light:085]:   Transition length: 0.5s
[18:04:53][D][voice_assistant:206]: Requesting start...
[18:04:53][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:53][D][voice_assistant:443]: Client started, streaming microphone
[18:04:53][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:04:53][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:53][D][voice_assistant:529]: Event Type: 1
[18:04:53][D][voice_assistant:532]: Assist Pipeline running
[18:04:53][D][voice_assistant:529]: Event Type: 9
[18:04:58][D][voice_assistant:529]: Event Type: 0
[18:04:58][D][voice_assistant:529]: Event Type: 2
[18:04:58][D][voice_assistant:619]: Assist Pipeline ended
[18:04:58][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:04:58][D][voice_assistant:428]: Desired state set to IDLE
[18:04:58][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:04:58][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:04:58][D][light:036]: 'Light' Setting:
[18:04:58][D][light:085]:   Transition length: 0.5s
[18:04:58][D][voice_assistant:206]: Requesting start...
[18:04:58][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:04:58][D][voice_assistant:443]: Client started, streaming microphone
[18:04:58][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:04:58][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:04:58][D][voice_assistant:529]: Event Type: 1
[18:04:58][D][voice_assistant:532]: Assist Pipeline running
[18:04:58][D][voice_assistant:529]: Event Type: 9
[18:05:03][D][voice_assistant:529]: Event Type: 0
[18:05:03][D][voice_assistant:529]: Event Type: 2
[18:05:03][D][voice_assistant:619]: Assist Pipeline ended
[18:05:03][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:05:03][D][voice_assistant:428]: Desired state set to IDLE
[18:05:03][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:05:03][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:05:03][D][light:036]: 'Light' Setting:
[18:05:03][D][light:085]:   Transition length: 0.5s
[18:05:03][D][voice_assistant:206]: Requesting start...
[18:05:03][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:05:03][D][voice_assistant:443]: Client started, streaming microphone
[18:05:03][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:05:03][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:05:03][D][voice_assistant:529]: Event Type: 1
[18:05:03][D][voice_assistant:532]: Assist Pipeline running
[18:05:03][D][voice_assistant:529]: Event Type: 9
[18:05:08][D][voice_assistant:529]: Event Type: 0
[18:05:08][D][voice_assistant:529]: Event Type: 2
[18:05:08][D][voice_assistant:619]: Assist Pipeline ended
[18:05:08][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:05:08][D][voice_assistant:428]: Desired state set to IDLE
[18:05:08][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:05:08][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:05:08][D][light:036]: 'Light' Setting:
[18:05:08][D][light:085]:   Transition length: 0.5s
[18:05:08][D][voice_assistant:206]: Requesting start...
[18:05:08][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:05:08][D][voice_assistant:443]: Client started, streaming microphone
[18:05:08][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:05:08][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:05:08][D][voice_assistant:529]: Event Type: 1
[18:05:08][D][voice_assistant:532]: Assist Pipeline running
[18:05:08][D][voice_assistant:529]: Event Type: 9
[18:05:13][D][voice_assistant:529]: Event Type: 0
[18:05:13][D][voice_assistant:529]: Event Type: 2
[18:05:13][D][voice_assistant:619]: Assist Pipeline ended
[18:05:13][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:05:13][D][voice_assistant:428]: Desired state set to IDLE
[18:05:13][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:05:13][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:05:13][D][light:036]: 'Light' Setting:
[18:05:13][D][light:085]:   Transition length: 0.5s
[18:05:13][D][voice_assistant:206]: Requesting start...
[18:05:13][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:05:13][D][voice_assistant:443]: Client started, streaming microphone
[18:05:13][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:05:13][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:05:13][D][voice_assistant:529]: Event Type: 1
[18:05:14][D][voice_assistant:532]: Assist Pipeline running
[18:05:14][D][voice_assistant:529]: Event Type: 9
[18:05:18][D][voice_assistant:529]: Event Type: 0
[18:05:19][D][voice_assistant:529]: Event Type: 2
[18:05:19][D][voice_assistant:619]: Assist Pipeline ended
[18:05:19][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to IDLE
[18:05:19][D][voice_assistant:428]: Desired state set to IDLE
[18:05:19][D][voice_assistant:422]: State changed from IDLE to START_PIPELINE
[18:05:19][D][voice_assistant:428]: Desired state set to START_MICROPHONE
[18:05:19][D][light:036]: 'Light' Setting:
[18:05:19][D][light:085]:   Transition length: 0.5s
[18:05:19][D][voice_assistant:206]: Requesting start...
[18:05:19][D][voice_assistant:422]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:05:19][D][voice_assistant:443]: Client started, streaming microphone
[18:05:19][D][voice_assistant:422]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[18:05:19][D][voice_assistant:428]: Desired state set to STREAMING_MICROPHONE
[18:05:19][D][voice_assistant:529]: Event Type: 1
[18:05:19][D][voice_assistant:532]: Assist Pipeline running
[18:05:19][D][voice_assistant:529]: Event Type: 9
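
For anyone else wondering about the “Event Type” numbers: here is a minimal Python sketch that labels them, assuming they map onto the `VoiceAssistantEvent` enum in ESPHome's `api.proto`. The mapping below is my recollection of that enum, not something confirmed in this thread, so check it against the ESPHome version you are running. The helper name `annotate` is made up for illustration.

```python
import re

# ASSUMPTION: numbering follows the VoiceAssistantEvent enum in ESPHome's
# api.proto as recalled; verify against your ESPHome version before relying on it.
EVENT_NAMES = {
    0: "ERROR",
    1: "RUN_START",
    2: "RUN_END",
    3: "STT_START",
    4: "STT_END",
    5: "INTENT_START",
    6: "INTENT_END",
    7: "TTS_START",
    8: "TTS_END",
    9: "WAKE_WORD_START",
    10: "WAKE_WORD_END",
}

EVENT_RE = re.compile(r"Event Type: (\d+)")


def annotate(line):
    """Append the assumed event name to an 'Event Type: N' log line."""
    match = EVENT_RE.search(line)
    if match is None:
        return line  # not an event line, return unchanged
    name = EVENT_NAMES.get(int(match.group(1)), "UNKNOWN")
    return f"{line}  <- {name}"


cycle = [
    "[18:05:18][D][voice_assistant:529]: Event Type: 0",
    "[18:05:19][D][voice_assistant:529]: Event Type: 2",
    "[18:05:19][D][voice_assistant:529]: Event Type: 1",
    "[18:05:19][D][voice_assistant:529]: Event Type: 9",
]
for line in cycle:
    print(annotate(line))
```

If that mapping holds, each five-second loop in the logs above reads as an error, the run ending, a new run starting, and wake word detection starting again.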

Interesting - that config is quite similar to mine, which is just the stock m5stack-atom-echo config from upstream.

One difference is that mine has the vad_threshold: 3 setting, which is undocumented but presumably enables the ESP32 to do some basic onboard silence detection and only start streaming audio to openWakeWord when it hears a voice, leaving OWW idle most of the time. On my setup, though, OWW is processing continuously.

I noticed that your output doesn’t have the vad_threshold: 3 - I wonder if omitting it disables VAD entirely, and what would happen if you added it to your config? I’d assume that without the ESP doing some kind of voice activity or silence detection, it would be rather normal for it to continuously stream to openWakeWord…

I raised a ticket here to ask about the config entry: Docs: voice_assistant vad_threshold configuration parameter · Issue #5228 · esphome/issues · GitHub
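
Since the setting is undocumented, the following is only a guess at what a value like vad_threshold: 3 could mean: a debounce counter that requires several speech-flagged audio frames before the detector fires. This Python sketch is hypothetical and is not taken from the ESPHome source; the function name `frames_until_trigger` and the counting rule are assumptions for illustration. It shows why such a threshold would not help if the underlying detector flags every frame as speech, even in a quiet room.

```python
def frames_until_trigger(speech_flags, threshold=3):
    """Return the 1-based frame index at which the detector fires, or None.

    HYPOTHETICAL model: the counter goes up on a speech frame, down (not below
    zero) on a silent frame, and the detector fires once it reaches `threshold`.
    """
    counter = 0
    for index, is_speech in enumerate(speech_flags, start=1):
        if is_speech:
            counter += 1
            if counter >= threshold:
                return index
        else:
            counter = max(0, counter - 1)
    return None


# A detector that is stuck reporting speech fires after just three frames.
stuck = [True] * 10
print(frames_until_trigger(stuck))

# Isolated blips in an otherwise quiet room never reach the threshold.
quiet = [False, True, False, False, True, False]
print(frames_until_trigger(quiet))
```

Under this model, raising the threshold only delays the trigger by a few frames when the speech flag never clears, which would match seeing “VAD detected speech” in the same second as “Waiting for speech...”.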

Unfortunately, VAD does not currently function on anything other than the S3-BOX-3 (although I am still waiting to test this personally), regardless of whether it is in the config or not. You may get lucky and have it work on rare occasions, but certainly not consistently.

Oh interesting - good to know @robgough1970. I take it that’s a hardware capability of the s3?

I can confirm that the S3-BOX-3 has functioning VAD, having now tested it. It looks like the issue with all other ESP32 variants, including the S3 dev boards, is that VAD does run but is constantly in a ‘speech is detected’ state, even when there is silence. I have asked Jesse to look into this, but he is currently away, so it will be a while until he can.

I am running into the EXACT same problem, but with a plain ESP32-WROOM with an INMP441 microphone.

Has anyone found a solution to this???

I can’t get TTS working because the unit essentially reconnects before the TTS is sent back to the ESP.

Did you ever get this to work?

I posted this on another thread with zero responses:

With all the noise about “Year of the Voice” and my desire to start moving away from Google Assistants, I decided to order three Echo Atoms. It has been a challenge to say the least.
I am relatively well versed with ESPHome, as I have been using ESP32s for over a year for BT Proxy and room presence. For that application, ESPHome works flawlessly.

For voice detection and actions, it’s a very different story. Very buggy. I have tried several configurations, from barebones to all sorts of LED light settings, binary sensors, switches, delay settings, etc. It simply sucks. It works OK for a while, until it doesn’t at all. I am very frustrated with the whole voice thing.

If anyone has a simple config that works, please share!

I just posted a new topic about this with a generic ESP32-WROOM with an INMP441 mic: every five seconds the pipeline restarts. I haven’t received any replies yet, but I’ll try playing with the VAD settings to see if that changes anything.
