Help requested: voice assistant only responded once

I’ve spent a few hours watching YouTube videos, reading articles, assembling my first prototype, and now testing it. About 10 minutes ago, I got my first response from Nabu! However, I can’t get it to respond again.

Here is my hardware:
ESP32-WROOM-32
INMP441 I2S microphone
MAX98357A I2S amplifier
4-ohm speaker
24-LED light ring (WS2812)

Everything is mounted on a circuit board for testing.

Here is my ESPHome configuration:

esphome:
  name: esphome-voice-main-bedroom
  friendly_name: Voice main bedroom
  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

# Enable logging
logger:

# Enable Home Assistant API
api:

ota:


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.0.220
    gateway: 192.168.0.1
    subnet: 255.255.255.0

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esphome-voice-main-bedroom"
    password: "" # VGiTqrtnA52n


i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO22

microphone:
  - platform: i2s_audio
    id: mic_i2s
    adc_type: external
    i2s_din_pin: GPIO21
    pdm: false

speaker:
  - platform: i2s_audio
    id: speaker_i2s
    dac_type: external
    i2s_dout_pin: GPIO18
    mode: mono

voice_assistant:
  microphone: mic_i2s
  speaker: speaker_i2s
  use_wake_word: true
  noise_suppression_level: 4
  auto_gain: 15dBFS
  volume_multiplier: 1.0
  id: assist
  on_end:
  - light.turn_off:
      id: led_ring
  on_wake_word_detected:
  - light.addressable_set:
      id: led_ring
      range_from: 16
      range_to: 17
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 15
      range_to: 18
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 14
      range_to: 19
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 13
      range_to: 20
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 12
      range_to: 21
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 11
      range_to: 22
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 10
      range_to: 23
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 9
      range_to: 1
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 9
      range_to: 23
      red: 0%
      green: 0%
      blue: 100%
  - delay: 0.03s
  - light.addressable_set:
      id: led_ring
      range_from: 7
      range_to: 8
      red: 0%
      green: 50%
      blue: 50%
  - light.addressable_set:
      id: led_ring
      range_from: 0
      range_to: 1
      red: 0%
      green: 50%
      blue: 50%
  - delay: 0.1s
  - light.addressable_set:
      id: led_ring
      range_from: 2
      range_to: 6
      red: 0%
      green: 100%
      blue: 0%

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);

light:
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: GPIO23
    num_leds: 24
    rmt_channel: 0
    chipset: ws2812
    name: "led_ring"
    id: led_ring

captive_portal:
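A variation often seen in DIY ESP32 voice builds is to give the microphone and the amplifier separate I2S buses (the ESP32 has two I2S peripherals) instead of sharing GPIO27/GPIO22 through one `i2s_audio` bus, where, as far as I understand, the mic and speaker have to take turns. It is also common to set the INMP441's channel and sample width explicitly. The sketch below is untested on this hardware: GPIO26/GPIO25 for the speaker clocks are hypothetical choices that would require rewiring the MAX98357A's LRC/BCLK, `channel: left` assumes the INMP441's L/R pin is tied to GND, and the option names should be checked against the ESPHome version in use (2023.12.9 here). The `voice_assistant`, `light`, and other sections stay as they are.

```yaml
# Sketch: separate I2S buses for the mic and the amp (untested on this build)
i2s_audio:
  - id: i2s_in                # microphone bus, keeps the current clock pins
    i2s_lrclk_pin: GPIO27
    i2s_bclk_pin: GPIO22
  - id: i2s_out               # speaker bus; GPIO26/GPIO25 are hypothetical, rewire to match
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25

microphone:
  - platform: i2s_audio
    id: mic_i2s
    i2s_audio_id: i2s_in
    adc_type: external
    i2s_din_pin: GPIO21
    pdm: false
    channel: left             # assumes INMP441 L/R pin tied to GND; try right if silent
    bits_per_sample: 32bit    # INMP441 sends 24-bit samples in 32-bit slots

speaker:
  - platform: i2s_audio
    id: speaker_i2s
    i2s_audio_id: i2s_out
    dac_type: external
    i2s_dout_pin: GPIO18
    mode: mono
```

The ids `mic_i2s` and `speaker_i2s` are kept so the existing `voice_assistant` section still finds them.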

Here are the logs from around the time of that successful test:

[08:12:31][D][voice_assistant:200]: Requesting start...
[08:12:31][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:12:31][D][voice_assistant:433]: Client started, streaming microphone
[08:12:31][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[08:12:31][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[08:12:31][D][voice_assistant:519]: Event Type: 1
[08:12:31][D][voice_assistant:522]: Assist Pipeline running
[08:12:31][D][voice_assistant:519]: Event Type: 9
[08:12:37][D][voice_assistant:519]: Event Type: 0
[08:12:37][D][voice_assistant:519]: Event Type: 2
[08:12:37][D][voice_assistant:609]: Assist Pipeline ended
[08:12:37][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to IDLE
[08:12:37][D][voice_assistant:418]: Desired state set to IDLE
[08:12:37][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[08:12:37][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[08:12:37][D][light:036]: 'led_ring' Setting:
[08:12:37][D][light:085]:   Transition length: 1.0s
[08:12:37][D][voice_assistant:200]: Requesting start...
[08:12:37][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:12:37][D][voice_assistant:433]: Client started, streaming microphone
[08:12:37][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[08:12:37][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[08:12:37][D][voice_assistant:519]: Event Type: 1
[08:12:37][D][voice_assistant:522]: Assist Pipeline running
[08:12:37][D][voice_assistant:519]: Event Type: 9
[08:12:40][D][voice_assistant:519]: Event Type: 10
[08:12:40][D][voice_assistant:528]: Wake word detected
[08:12:40][D][voice_assistant:519]: Event Type: 3
[08:12:40][D][voice_assistant:533]: STT started
[08:12:41][D][voice_assistant:519]: Event Type: 11
[08:12:41][D][voice_assistant:670]: Starting STT by VAD
[08:12:41][D][voice_assistant:519]: Event Type: 12
[08:12:41][D][voice_assistant:674]: STT by VAD end
[08:12:41][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[08:12:41][D][voice_assistant:418]: Desired state set to AWAITING_RESPONSE
[08:12:41][D][voice_assistant:412]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[08:12:41][D][esp-idf:000]: I (37184) I2S: DMA queue destroyed

[08:12:41][D][voice_assistant:412]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[08:12:42][D][voice_assistant:519]: Event Type: 4
[08:12:42][D][voice_assistant:547]: Speech recognised as: "Look like."
[08:12:42][D][voice_assistant:519]: Event Type: 5
[08:12:42][D][voice_assistant:552]: Intent started
[08:12:42][D][voice_assistant:519]: Event Type: 6
[08:12:42][D][voice_assistant:519]: Event Type: 7
[08:12:42][D][voice_assistant:575]: Response: "Sorry, I couldn't understand that"
[08:12:42][D][voice_assistant:519]: Event Type: 8
[08:12:42][D][voice_assistant:595]: Response URL: "http://192.168.0.124:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-ca_dba8942832_cloud.wav"
[08:12:42][D][voice_assistant:412]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[08:12:42][D][voice_assistant:418]: Desired state set to STREAMING_RESPONSE
[08:12:42][D][esp-idf:000]: I (38181) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[08:12:42][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:12:43][D][esp-idf:000]: I (38288) I2S: DMA queue destroyed

[08:12:43][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:12:43][D][voice_assistant:519]: Event Type: 98
[08:12:43][D][voice_assistant:657]: TTS stream start
[08:12:43][D][esp-idf:000]: I (38734) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[08:12:43][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:12:43][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:43][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:43][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:519]: Event Type: 99
[08:12:46][D][voice_assistant:665]: TTS stream end
[08:12:46][D][voice_assistant:283]: End of audio stream received
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:13:53][D][voice_assistant:315]: Speaker has finished outputting all audio
[08:13:53][D][voice_assistant:412]: State changed from RESPONSE_FINISHED to IDLE
[08:13:53][D][voice_assistant:418]: Desired state set to IDLE
[08:13:53][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[08:13:53][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[08:13:53][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:200]: Requesting start...
[08:13:53][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:13:53][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:433]: Client started, streaming microphone
[08:13:53][D][voice_assistant:412]: State changed from STARTING_PIPELINE to START_MICROPHONE
[08:13:53][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[08:13:53][D][voice_assistant:153]: Starting Microphone
[08:13:53][D][voice_assistant:412]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:13:53][D][esp-idf:000]: I (108844) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[08:13:53][D][voice_assistant:519]: Event Type: 1
[08:13:53][D][voice_assistant:522]: Assist Pipeline running
[08:13:53][D][voice_assistant:412]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[08:13:53][D][voice_assistant:519]: Event Type: 9
[08:14:53][D][voice_assistant:519]: Event Type: 0
[08:14:53][D][voice_assistant:519]: Event Type: 2
[08:14:53][D][voice_assistant:609]: Assist Pipeline ended
[08:14:53][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to IDLE
[08:14:53][D][voice_assistant:418]: Desired state set to IDLE
[08:14:53][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[08:14:53][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[08:14:53][D][light:036]: 'led_ring' Setting:
[08:14:53][D][light:085]:   Transition length: 1.0s

Now I can’t get another response. I’m trying the wake word every few seconds. What am I missing?
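For reading the log above: the `Event Type` numbers are ESPHome's voice-assistant pipeline events, and they line up with the messages printed next to them: 0 error, 1 run start, 2 run end, 3/4 speech-to-text start/end, 5/6 intent start/end, 7/8 text-to-speech start/end, 9/10 wake-word start/end, 11/12 voice-activity start/end, 98/99 TTS stream start/end. Read that way, the last cycle is a wake-word stage that starts at 08:13:53 (type 9) and ends a minute later with an error (type 0) and a run end (type 2), after which the pipeline restarts; the log never says what the error was. A minimal, untested sketch that prints the error code and message through the `on_error` trigger (whose `code` and `message` variables are strings) follows. Depending on the ESPHome version, a plain wake-word timeout may be handled internally without firing this trigger, so seeing nothing printed is not conclusive.

```yaml
voice_assistant:
  # ...keep all existing options (microphone, speaker, on_end, etc.) as they are...
  on_error:
    - logger.log:
        level: WARN
        format: "Voice assistant error: %s - %s"
        args: ['code.c_str()', 'message.c_str()']
```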


I have the same problem. I bought two Atom Echos to start playing with a local voice assistant, and the whole thing is very frustrating: it works sometimes, stops working randomly, then starts working again. I've spent many hours tweaking and looking for suggestions online.
As for your ESPHome settings, they look fine. I have tried many iterations to figure out why it stops responding, with no luck.


Not sure whether this is a clue, but here are the logs this morning after I cycled power on the ESP32:

INFO ESPHome 2023.12.9
INFO Reading configuration /config/esphome/voice-main-bedroom.yaml...
INFO Starting log output from 192.168.0.220 using esphome API
INFO Successfully connected to esphome-voice-main-bedroom @ 192.168.0.220 in 0.095s
INFO Successful handshake with esphome-voice-main-bedroom @ 192.168.0.220 in 0.042s
[08:14:29][I][app:102]: ESPHome version 2023.12.9 compiled on Feb  7 2024, 12:13:35
[08:14:29][C][wifi:573]: WiFi:
[08:14:29][C][wifi:405]:   Local MAC: [redacted]
[08:14:29][C][wifi:410]:   SSID: [redacted]
[08:14:29][C][wifi:411]:   IP Address: 192.168.0.220
[08:14:29][C][wifi:413]:   BSSID: [redacted]
[08:14:29][C][wifi:414]:   Hostname: 'esphome-voice-main-bedroom'
[08:14:29][C][wifi:416]:   Signal strength: -46 dB ▂▄▆█
[08:14:29][C][wifi:420]:   Channel: 1
[08:14:29][C][wifi:421]:   Subnet: 255.255.255.0
[08:14:29][C][wifi:422]:   Gateway: 192.168.0.1
[08:14:29][C][wifi:423]:   DNS1: 0.0.0.0
[08:14:29][C][wifi:424]:   DNS2: 0.0.0.0
[08:14:29][C][logger:439]: Logger:
[08:14:29][C][logger:440]:   Level: DEBUG
[08:14:29][C][logger:441]:   Log Baud Rate: 115200
[08:14:29][C][logger:443]:   Hardware UART: UART0
[08:14:29][C][esp32_rmt_led_strip:173]: ESP32 RMT LED Strip:
[08:14:29][C][esp32_rmt_led_strip:174]:   Pin: 23
[08:14:29][C][esp32_rmt_led_strip:175]:   Channel: 0
[08:14:29][C][esp32_rmt_led_strip:200]:   RGB Order: GRB
[08:14:29][C][esp32_rmt_led_strip:201]:   Max refresh rate: 0
[08:14:29][C][esp32_rmt_led_strip:202]:   Number of LEDs: 24
[08:14:29][C][light:103]: Light 'led_ring'
[08:14:29][C][light:105]:   Default Transition Length: 1.0s
[08:14:29][C][light:106]:   Gamma Correct: 2.80
[08:14:29][C][template.switch:068]: Template Switch 'Use wake word'
[08:14:29][C][template.switch:091]:   Restore Mode: restore defaults to ON
[08:14:29][C][template.switch:057]:   Optimistic: YES
[08:14:29][C][captive_portal:088]: Captive Portal:
[08:14:29][C][mdns:115]: mDNS:
[08:14:29][C][mdns:116]:   Hostname: esphome-voice-main-bedroom
[08:14:29][C][ota:097]: Over-The-Air Updates:
[08:14:29][C][ota:098]:   Address: 192.168.0.220:3232
[08:14:29][C][api:139]: API Server:
[08:14:29][C][api:140]:   Address: 192.168.0.220:6053
[08:14:29][C][api:144]:   Using noise encryption: NO
[08:19:14][I][ota:117]: Boot seems successful, resetting boot loop counter.
[08:19:14][D][esp32.preferences:114]: Saving 1 preferences to flash...
[08:19:14][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed

Twelve minutes later, the device is still not responding to voice commands. The LED ring does light up if I toggle the switch on the device page.

Anyone able to help a fellow enthusiast?