Strange ESPHome behavior w/ Voice and ESP32

My setup: ESP32 WROOM, ICS43434 I2S microphone, MAX98357 I2S 3W Class D amplifier, a small speaker, and ESPHome. Everything is hooked up and works “sometimes”… I’ve had it working once, and after a few minutes the log went nuts with messages and locked up my browser session.

No errors in the logs for Piper, Whisper or OpenWakeWord.

But, I’m getting some odd behavior in the ESPHome logs.

[13:03:36][D][voice_assistant:395]: State changed from IDLE to IDLE
[13:03:36][D][voice_assistant:401]: Desired state set to IDLE
[13:03:59][D][api:102]: Accepted 192.168.1.250
[13:03:59][W][component:214]: Component api took a long time for an operation (0.06 s).
[13:03:59][W][component:215]: Components should block for at most 20-30ms.

I’m not sure what ‘API’ Voice is trying to connect to?
[voice_assistant:441]: No API client connected

Sometimes I get this:

[13:25:00][D][api.connection:1089]: Home Assistant 2023.11.2 (192.168.1.250): Connected successfully
[13:25:11][D][voice_assistant:502]: Event Type: 0
[13:25:11][E][voice_assistant:624]: Error: stt-stream-failed - speech-to-text failed
[13:25:11][D][voice_assistant:502]: Event Type: 2
[13:25:11][D][voice_assistant:589]: Assist Pipeline ended
[13:25:11][D][light:036]: 'Voice Awake Word Light' Setting:
[13:25:11][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF

ESPHome config

esphome:
  name: shop-voice
  friendly_name: shop voice
  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

# Enable logging
logger:

api:
  encryption:
    key: "xx"

ota:
  password: "xx"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Shop-Voice Fallback Hotspot"
    password: "K0rTDePNa3nu"

# sck >> bclk  
# WS >> lrclk  
# SD data pin

i2s_audio:
    i2s_lrclk_pin: GPIO22 # MIC-WS  SPK- LRC
    i2s_bclk_pin: GPIO21  # MIC-SCK SPK- BCLK

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO17 # SD
    pdm: false

speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin: GPIO26 #DIN
    mode: mono

voice_assistant:
  microphone: mic
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 15.0
  speaker: big_speaker
  id: assist
  on_listening:
    - light.turn_on: led
  on_end: 
    - light.turn_off: led
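  on_tts_start:
    - text_sensor.template.publish:
        id: tts_response
        # x is the response text that this trigger passes to the lambda
        state: !lambda 'return x;'
  on_stt_end:
    - text_sensor.template.publish:
        id: stt_result
        # x is the recognised text that this trigger passes to the lambda
        state: !lambda 'return x;'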

light:
    - platform: status_led
      id: led
      name: "Voice Awake Word Light"
      pin: GPIO27
      disabled_by_default: true

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);
    
text_sensor:
  - platform: template
    name: STT result
    id: stt_result
  - platform: template
    name: TTS response
    id: tts_response
  - platform: template
    name: x
    id: x

The REAL strangeness is when this stream finally gets spit out after a few ESP reboots - it comes so fast it locks up the ESPHome browser session. This all came out in the logs within 3 seconds…

Logs shop-voice.yaml
INFO ESPHome 2023.11.1
INFO Reading configuration /config/esphome/shop-voice.yaml...
INFO Starting log output from shop-voice.local using esphome API
INFO Successfully connected to shop-voice in 0.446s
INFO Successful handshake with shop-voice in 0.137s
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][I][app:102]: ESPHome version 2023.11.1 compiled on Nov 17 2023, 13:03:01
[13:21:24][C][status_led:065]: Status Led Light:
[13:21:24][C][status_led:066]:   Pin: GPIO27
[13:21:24][D][voice_assistant:502]: Event Type: 9
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][C][wifi:559]: WiFi:
[13:21:24][C][wifi:391]:   Local MAC: C8:F0:9E:F3:4D:C4
[13:21:24][C][wifi:396]:   SSID: [redacted]
[13:21:24][C][wifi:397]:   IP Address: 192.168.1.189
[13:21:24][C][wifi:399]:   BSSID: [redacted]
[13:21:24][C][wifi:400]:   Hostname: 'shop-voice'
[13:21:24][C][wifi:402]:   Signal strength: -52 dB ▂▄▆█
[13:21:24][C][wifi:406]:   Channel: 11
[13:21:24][C][wifi:407]:   Subnet: 255.255.255.0
[13:21:24][C][wifi:408]:   Gateway: 192.168.1.1
[13:21:24][C][wifi:409]:   DNS1: 192.168.1.1
[13:21:24][C][wifi:410]:   DNS2: 0.0.0.0
[13:21:24][D][voice_assistant:502]: Event Type: 0
[13:21:24][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:24][D][voice_assistant:495]: Signaling stop...
[13:21:24][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:24][D][voice_assistant:401]: Desired state set to IDLE
[13:21:24][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:24][C][logger:416]: Logger:
[13:21:24][C][logger:417]:   Level: DEBUG
[13:21:24][C][logger:418]:   Log Baud Rate: 115200
[13:21:24][C][logger:420]:   Hardware UART: UART0
[13:21:24][D][voice_assistant:502]: Event Type: 2
[13:21:24][D][voice_assistant:589]: Assist Pipeline ended
[13:21:24][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:24][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:24][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:24][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:24][D][voice_assistant:495]: Signaling stop...
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][D][voice_assistant:206]: Requesting start...
[13:21:24][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:24][D][voice_assistant:502]: Event Type: 1
[13:21:24][D][voice_assistant:505]: Assist Pipeline running
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][C][template.text_sensor:020]: Template Sensor 'STT result'
[13:21:24][D][voice_assistant:502]: Event Type: 9
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][C][template.text_sensor:020]: Template Sensor 'TTS response'
[13:21:24][D][voice_assistant:502]: Event Type: 0
[13:21:24][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:24][D][voice_assistant:495]: Signaling stop...
[13:21:24][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:24][D][voice_assistant:401]: Desired state set to IDLE
[13:21:24][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:24][C][template.text_sensor:020]: Template Sensor 'x'
[13:21:24][D][voice_assistant:502]: Event Type: 2
[13:21:24][D][voice_assistant:589]: Assist Pipeline ended
[13:21:24][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:24][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:24][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:24][C][light:103]: Light 'Voice Awake Word Light'
[13:21:24][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:24][D][voice_assistant:495]: Signaling stop...
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][D][voice_assistant:206]: Requesting start...
[13:21:24][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:24][C][template.switch:068]: Template Switch 'Use wake word'
[13:21:24][C][template.switch:091]:   Restore Mode: restore defaults to ON
[13:21:24][C][template.switch:057]:   Optimistic: YES
[13:21:24][D][voice_assistant:502]: Event Type: 1
[13:21:24][D][voice_assistant:505]: Assist Pipeline running
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][D][voice_assistant:502]: Event Type: 9
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][D][voice_assistant:502]: Event Type: 0
[13:21:24][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:24][D][voice_assistant:495]: Signaling stop...
[13:21:24][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:24][D][voice_assistant:401]: Desired state set to IDLE
[13:21:24][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:24][D][voice_assistant:502]: Event Type: 2
[13:21:24][D][voice_assistant:589]: Assist Pipeline ended
[13:21:24][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:24][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:24][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:24][C][mdns:115]: mDNS:
[13:21:24][C][ota:098]:   Address: shop-voice.local:3232
[13:21:24][C][ota:101]:   Using Password.
[13:21:24][D][voice_assistant:502]: Event Type: 1
[13:21:24][D][voice_assistant:505]: Assist Pipeline running
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][C][api:139]: API Server:
[13:21:24][C][api:140]:   Address: shop-voice.local:6053
[13:21:24][C][api:142]:   Using noise encryption: YES
[13:21:24][D][voice_assistant:502]: Event Type: 9
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][D][voice_assistant:502]: Event Type: 0
[13:21:24][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:24][D][voice_assistant:495]: Signaling stop...
[13:21:24][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:24][D][voice_assistant:401]: Desired state set to IDLE
[13:21:24][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:24][D][voice_assistant:502]: Event Type: 2
[13:21:24][D][voice_assistant:589]: Assist Pipeline ended
[13:21:24][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:24][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:24][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:24][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:24][D][voice_assistant:495]: Signaling stop...
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][D][voice_assistant:206]: Requesting start...
[13:21:24][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:24][D][voice_assistant:502]: Event Type: 1
[13:21:24][D][voice_assistant:505]: Assist Pipeline running
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][D][voice_assistant:502]: Event Type: 9
[13:21:24][D][voice_assistant:124]: microphone not running
[13:21:24][D][voice_assistant:502]: Event Type: 0
[13:21:24][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:24][D][voice_assistant:495]: Signaling stop...
[13:21:24][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:206]: Requesting start...
[13:21:25][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:25][D][voice_assistant:502]: Event Type: 1
[13:21:25][D][voice_assistant:505]: Assist Pipeline running
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 9
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 0
[13:21:25][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:206]: Requesting start...
[13:21:25][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:25][D][voice_assistant:502]: Event Type: 1
[13:21:25][D][voice_assistant:505]: Assist Pipeline running
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 9
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 0
[13:21:25][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:206]: Requesting start...
[13:21:25][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:25][D][voice_assistant:502]: Event Type: 1
[13:21:25][D][voice_assistant:505]: Assist Pipeline running
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 9
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 0
[13:21:25][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:206]: Requesting start...
[13:21:25][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:25][D][voice_assistant:502]: Event Type: 1
[13:21:25][D][voice_assistant:505]: Assist Pipeline running
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 9
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 0
[13:21:25][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:206]: Requesting start...
[13:21:25][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:25][D][voice_assistant:502]: Event Type: 1
[13:21:25][D][voice_assistant:505]: Assist Pipeline running
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 9
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 0
[13:21:25][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:206]: Requesting start...
[13:21:25][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:25][D][voice_assistant:502]: Event Type: 1
[13:21:25][D][voice_assistant:505]: Assist Pipeline running
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 9
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 0
[13:21:25][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:206]: Requesting start...
[13:21:25][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:25][D][voice_assistant:502]: Event Type: 1
[13:21:25][D][voice_assistant:505]: Assist Pipeline running
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 9
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 0
[13:21:25][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:206]: Requesting start...
[13:21:25][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:25][D][voice_assistant:502]: Event Type: 1
[13:21:25][D][voice_assistant:505]: Assist Pipeline running
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 9
[13:21:25][D][voice_assistant:124]: microphone not running
[13:21:25][D][voice_assistant:502]: Event Type: 0
[13:21:25][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:25][D][voice_assistant:395]: State changed from STARTING_PIPELINE to STOP_MICROPHONE
[13:21:25][D][voice_assistant:401]: Desired state set to IDLE
[13:21:25][D][voice_assistant:395]: State changed from STOP_MICROPHONE to IDLE
[13:21:25][D][voice_assistant:502]: Event Type: 2
[13:21:25][D][voice_assistant:589]: Assist Pipeline ended
[13:21:25][D][light:036]: 'Voice Awake Word Light' Setting:
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:25][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:21:25][D][voice_assistant:495]: Signaling stop...
[13:21:26][D][voice_assistant:124]: microphone not running
[13:21:26][D][voice_assistant:206]: Requesting start...
[13:21:26][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:21:26][D][voice_assistant:502]: Event Type: 1
[13:21:26][D][voice_assistant:505]: Assist Pipeline running
[13:21:26][D][voice_assistant:124]: microphone not running
[13:21:26][D][voice_assistant:502]: Event Type: 9
[13:21:26][D][voice_assistant:124]: microphone not running
[13:21:26][D][voice_assistant:502]: Event Type: 0
[13:21:26][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:26][D][voice_assistant:495]: Signaling stop...
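
To put a number on how fast the pipeline is looping, the capture above can be summarised offline instead of watching it in the browser. This is only a rough sketch in Python: it assumes the log has been saved as plain text in the format shown, and the helper name `restarts_per_second` is made up for this illustration.

```python
import re
from collections import Counter

# Timestamp prefix of an ESPHome log line, e.g. "[13:21:24]".
TIMESTAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]")
# Line that appears once per pipeline start attempt in the log above.
START_MARKER = "State changed from IDLE to START_PIPELINE"


def restarts_per_second(log_text):
    """Count pipeline start attempts under each one-second timestamp."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match and START_MARKER in line:
            counts[match.group(1)] += 1
    return dict(counts)


# Short excerpt copied from the log above; paste the full capture to analyse it.
sample = """\
[13:21:24][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:24][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:21:24][E][voice_assistant:624]: Error: no_wake_word - No wake word detected
[13:21:24][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:21:25][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
"""

for second, count in sorted(restarts_per_second(sample).items()):
    print(second, count)
```

For comparison, in the working run further down the same marker shows up only at 13:39:29 and 13:39:44, so several start attempts under a single timestamp is a reasonable sign of the runaway loop.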

So I just rebooted the ESP again (no changes), and it’s working, and I got this after saying “Hey Jarvis, turn on toolbox light”:

[13:39:09][D][voice_assistant:502]: Event Type: 10
[13:39:09][D][voice_assistant:511]: Wake word detected
[13:39:09][D][voice_assistant:502]: Event Type: 3
[13:39:09][D][voice_assistant:516]: STT started
[13:39:09][D][light:036]: 'Voice Awake Word Light' Setting:
[13:39:09][D][light:047]:   State: ON
[13:39:09][D][status_led:050]: 'Voice Awake Word Light': Setting state ON
[13:39:10][D][voice_assistant:502]: Event Type: 11
[13:39:10][D][voice_assistant:643]: Starting STT by VAD
[13:39:12][D][voice_assistant:502]: Event Type: 12
[13:39:12][D][voice_assistant:647]: STT by VAD end
[13:39:28][D][voice_assistant:502]: Event Type: 4
[13:39:28][D][voice_assistant:395]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[13:39:28][D][voice_assistant:401]: Desired state set to AWAITING_RESPONSE
[13:39:28][D][voice_assistant:531]: Speech recognised as: " Turn toolbox on."
[13:39:28][D][text_sensor:064]: 'STT result': Sending state ' Turn toolbox on.'
[13:39:28][D][voice_assistant:395]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[13:39:28][D][esp-idf:000]: I (236957) I2S: DMA queue destroyed

[13:39:28][D][voice_assistant:502]: Event Type: 5
[13:39:28][D][voice_assistant:536]: Intent started
[13:39:28][D][voice_assistant:395]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[13:39:28][D][voice_assistant:502]: Event Type: 6
[13:39:28][D][voice_assistant:502]: Event Type: 7
[13:39:28][D][voice_assistant:559]: Response: "Turned on switch"
[13:39:28][D][text_sensor:064]: 'TTS response': Sending state 'Turned on switch'
[13:39:28][D][voice_assistant:502]: Event Type: 8
[13:39:28][D][voice_assistant:577]: Response URL: "https://rc2o77cp1gdll61vz9yjzvjd63kt5fab.ui.nabu.casa/api/tts_proxy/093290dcda7b2989879ced0ee701d7978ed0c34a_en-us_fdd41fd0ec_tts.piper.raw"
[13:39:28][D][voice_assistant:395]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[13:39:28][D][voice_assistant:401]: Desired state set to STREAMING_RESPONSE
[13:39:28][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[13:39:28][D][voice_assistant:502]: Event Type: 2
[13:39:28][D][voice_assistant:589]: Assist Pipeline ended
[13:39:28][D][light:036]: 'Voice Awake Word Light' Setting:
[13:39:28][D][light:047]:   State: OFF
[13:39:28][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[13:39:28][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:39:28][D][voice_assistant:502]: Event Type: 98
[13:39:29][D][voice_assistant:502]: Event Type: 99
[13:39:29][D][voice_assistant:395]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[13:39:29][D][voice_assistant:401]: Desired state set to IDLE
[13:39:29][D][voice_assistant:395]: State changed from RESPONSE_FINISHED to IDLE
[13:39:29][D][voice_assistant:401]: Desired state set to IDLE
[13:39:29][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:39:29][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:39:29][D][voice_assistant:124]: microphone not running
[13:39:29][D][voice_assistant:206]: Requesting start...
[13:39:29][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:39:29][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[13:39:29][D][voice_assistant:416]: Client started, streaming microphone
[13:39:29][D][voice_assistant:395]: State changed from STARTING_PIPELINE to START_MICROPHONE
[13:39:29][D][voice_assistant:401]: Desired state set to STREAMING_MICROPHONE
[13:39:29][D][voice_assistant:159]: Starting Microphone
[13:39:29][D][voice_assistant:395]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[13:39:29][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[13:39:29][D][esp-idf:000]: I (238584) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[13:39:29][D][voice_assistant:502]: Event Type: 1
[13:39:29][D][voice_assistant:505]: Assist Pipeline running
[13:39:29][D][voice_assistant:395]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[13:39:29][D][voice_assistant:502]: Event Type: 9
[13:39:44][D][voice_assistant:502]: Event Type: 0
[13:39:44][D][voice_assistant:502]: Event Type: 2
[13:39:44][D][voice_assistant:589]: Assist Pipeline ended
[13:39:44][D][voice_assistant:395]: State changed from STREAMING_MICROPHONE to IDLE
[13:39:44][D][voice_assistant:401]: Desired state set to IDLE
[13:39:44][D][light:036]: 'Voice Awake Word Light' Setting:
[13:39:44][D][voice_assistant:395]: State changed from IDLE to START_PIPELINE
[13:39:44][D][voice_assistant:401]: Desired state set to START_MICROPHONE
[13:39:44][D][status_led:050]: 'Voice Awake Word Light': Setting state OFF
[13:39:44][D][voice_assistant:206]: Requesting start...
[13:39:44][D][voice_assistant:395]: State changed from START_PIPELINE to STARTING_PIPELINE
[13:39:44][D][voice_assistant:416]: Client started, streaming microphone
[13:39:44][D][voice_assistant:395]: State changed from STARTING_PIPELINE to 

So, after 8-10 ESP reboots, it worked and behaved… it’s painfully slow, but the light went on. From this
[13:43:59][D][voice_assistant:511]: Wake word detected
to this
[13:44:18][D][voice_assistant:531]: Speech recognised as: " turn toolbox on."

is 19 seconds…
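
To see where those seconds go, the timestamps can be split by stage. The sketch below is only an illustration in Python: it uses three lines from the earlier 13:39 run (which has the same overall delay), and the function names and stage labels are made up here. It assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime

# Timestamp prefix of an ESPHome log line, e.g. "[13:39:09]".
TIMESTAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]")
WAKE = "Wake word detected"
VAD_END = "STT by VAD end"
STT_DONE = "Speech recognised as"


def first_seen(log_text):
    """Map each marker to the time of the first log line that contains it."""
    seen = {}
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if not match:
            continue
        for marker in (WAKE, VAD_END, STT_DONE):
            if marker in line and marker not in seen:
                seen[marker] = datetime.strptime(match.group(1), "%H:%M:%S")
    return seen


def stage_seconds(log_text):
    """Seconds from wake word to VAD end, and from VAD end to the transcript."""
    t = first_seen(log_text)
    return {
        "speaking": (t[VAD_END] - t[WAKE]).total_seconds(),
        "waiting_for_stt": (t[STT_DONE] - t[VAD_END]).total_seconds(),
    }


# Three lines copied from the 13:39 run above.
sample = """\
[13:39:09][D][voice_assistant:511]: Wake word detected
[13:39:12][D][voice_assistant:647]: STT by VAD end
[13:39:28][D][voice_assistant:531]: Speech recognised as: " Turn toolbox on."
"""

print(stage_seconds(sample))
```

On that run the split is 3 seconds of speaking and 16 seconds between the end of speech and the transcript arriving, which suggests most of the delay is on the speech-to-text side rather than in the ESP32 audio capture.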

There are so many moving parts to this I’m not even sure where to begin!!!

Jeff

I have found that most intermittent issues were caused by the power supply when used with the MAX98357 board. The MAX98357 can easily use 1A at 5V. Once I switched to a 5V 2.5A supply, most of the problems went away.

This includes:

  • Very slow response times
  • Client connection errors
  • Noise from speaker before first wake word
  • Partial/cutoff responses

I notice you also have an LED. This caused problems for me when I had a resistor under 470 Ohm as it was drawing too much power from GPIO and causing instability that caused client disconnects and chopped off responses.

I also replaced the cheap breadboard jumper wires on +5V/+3.3V and GND with higher-quality wire and soldered connections.

PS is 5A, so that’s not the problem. The LED… I drive these same LEDs in dozens of other ESP units I have using the same LED configuration, all work well and I use a 680 ohm resistor on it. Even so, it’s not even on at boot up, it only goes on if & when a wake word is detected. I also removed the MAX98357 from the picture (output responses to my media player) with really no change.
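
As a rough sanity check on the LED question, Ohm's law gives a ballpark for the GPIO current at different resistor values. This is a minimal sketch in Python under stated assumptions: a 3.3V GPIO high level and a 2.0V LED forward voltage, neither of which is given in this thread, so substitute the values for your own parts.

```python
def led_current_ma(resistor_ohms, supply_v=3.3, forward_v=2.0):
    """Estimate LED current in mA for a series resistor (Ohm's law).

    supply_v and forward_v are assumed values, not measurements from this build.
    """
    return (supply_v - forward_v) / resistor_ohms * 1000.0


# 680 and 470 ohm are the values mentioned above; 220 ohm is an extra example.
for ohms in (680, 470, 220):
    print(f"{ohms} ohm -> {led_current_ma(ohms):.1f} mA")
```

Under those assumptions a 680 ohm resistor works out to roughly 2 mA, which is small; compare the result against the per-pin limit in the ESP32 datasheet before ruling the LED in or out.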

None of this addresses the incredibly slow inference times, nor the difficulty of getting a reliable & accurate remote mic / speaker. After researching this topic deeply (I’m a formally trained EE and was an audio component design engineer) I now understand how futile an effort this was for practical use. It’s a REALLY hard problem to solve, and it won’t be solved with any of this hardware. Yes, it’s cool and it sort of works, and it will teach you a LOT about how the HA team is trying to tie all this stuff together. But for practical use today, forget about it.

I’m now using a $49 Espressif Box3 + Open Source Willow Inference Server (http://heywillio.io) and I get sub-500ms STT response times and rock solid inference and accuracy of converting speech to text approaching 100%, even from across the room. That is beyond impressive on so many levels…

So, it’s good to see a comparison of the different approaches, especially as we start seeing improvements to the conversation agent. For getting remote STT into HA, my bet is on Willow and the Box3, at least for now. For $49, it’s something literally anyone can get working in 15 minutes after opening the box (a real 15 min including installing Willow) and it is frustration free. To think an RPi4 (or any non-GPU based engine) will be a decent & practical local inference server is wishful thinking.

Once the STT gets into the HA conversation engine, it’s only then that the severe limitations of the conversation agent in HA really rear up. It’s easy to turn a light on and off, or turn on a climate control, but much beyond that is going to take many iterations of the conversation agent & HA Assistant before we can all consider unplugging our Alexa or other units. For me, it’s saying simple intents like:

“ESP, set a timer called eggs for 10 minutes”
“ESP, play AC/DC on Apple Music (or Pandora)”
“ESP who is this artist?”
“ESP, what is the weather forecast?”

Once we can EASILY get those kinds of everyday intents working and working well, adoption will be swift. Until then… the complexity of implementing that with the state of where we are right now is a show stopper for 99% of folks who will try it.

Do NOT read this as bashing the monumental efforts of the HA Voice team. The progress they have made so far is quite impressive. I just see a lot of chatter here, and especially on Discord, from folks who have the wrong impressions and expectations of what the state of the art wrt HA Voice really is.

So I’m now getting this with 3x ESP32 voice assistants connected:

[15:43:43][W][component:215]: Components should block for at most 20-30ms.
[15:43:43][D][api.connection:1089]: Home Assistant 2023.12.3 (192.168.1.162): Connected successfully
[15:43:43][E][voice_assistant:375]: Multiple API Clients attempting to connect to Voice Assistant
[15:43:43][E][voice_assistant:376]: Current client: Home Assistant 2023.12.3 (192.168.1.162)
[15:43:43][E][voice_assistant:377]: New client: Home Assistant 2023.12.3 (192.168.1.162)
[15:44:12][W][api.connection:102]: Home Assistant 2023.12.3 (192.168.1.162): Connection reset
[15:44:12][D][voice_assistant:422]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[15:44:12][D][voice_assistant:428]: Desired state set to IDLE
[15:44:12][D][voice_assistant:422]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[15:44:12][D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to IDLE

After that, the ESP32 voice assistant stops responding.

I’m getting similar issues as @jazzmonger with my ESP32 WROOM with INMP441 mic and no speaker. Initial boot up is a fast loop of starting and stopping the mic stream (seems like the i2s driver doesn’t find the mic on the first few boots.) Then it settles down into a 5 second loop of listening then restarting the pipeline. Unclear if this is normal/expected behavior.

I will say that running HA in a Proxmox VM on an i5-6500T with one core, I’m getting sub 500ms STT response times. Very usable. But I agree that the basics for replacing Alexa aren’t there yet.
“five minute egg timer”
“turn on the water heater for 15 minutes”
“remind me to take my pill at 8pm”

When those things get added and are plug and play, I’ll be able to move my family over.
