I can't get the wake word to work

LutzDe · October 19, 2025, 2:47pm

So, I’m at my wits’ end. I haven’t been able to get the wake word to work for days.
This is the next test step before I move on to experimenting with speech-to-phrase.

My first step was to test the microphone. I was able to stream it to my PC via UDP, so the microphone hardware and I2S connection are OK.

The next step was to test the speaker. Streaming internet radio to the “Marvin Media Player” works.

But the wake word won’t work!! (Although the name “Marvin” appears everywhere, everything was tested with the wake word “hey jarvis”.)
What am I doing wrong?

ESPHome code:

esphome:
  name: marvin
  friendly_name: Marvin
  on_boot:
   - priority: -100
     then:
       - wait_until: api.connected
       - if:
           condition:
             switch.is_on: use_wake_word
           then:
             - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended


# Enable logging
logger:
#  level: DEBUG

# Enable Home Assistant API
api:
  encryption:
    key: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
#    key: !secret XXXXXX
#  reboot_timeout: 0s

ota:
  - platform: esphome    # Your existing OTA method
    password: !secret XXXXXXXXXXXXXXXX

wifi:
  ssid: !secret XXXXXXXXX
  password: !secret XXXXXXXXXXXXX
  fast_connect: true
 
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: !secret XXXXXXXXXXXXX
    password: !secret XXXXXXXXXXXXXXXXXX

captive_portal:

i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: GPIO22 #LRC on MAX98357A
    i2s_bclk_pin: GPIO23  #BCL on MAX98357A
  - id: i2s_in
    i2s_lrclk_pin: GPIO27  #WS on microphone INMP441
    i2s_bclk_pin: GPIO26   #SCK on microphone INMP441

microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_in
    id: marvin_mic
    adc_type: external
    i2s_din_pin: GPIO25
    pdm: false
    channel: left  #L/R Pin INMP441 => GND
    sample_rate: 16000
    bits_per_sample: 16bit


speaker:
  - platform: i2s_audio
    id: marvin_speaker
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO18
    channel: mono
    sample_rate: 16000  #Defaults to 16000
    bits_per_sample: 16bit #One of 8bit, 16bit, 24bit, or 32bit. Defaults to 16bit. 
 
media_player:
  - platform: speaker
    name: "Marvin Media Player"
    id: marvin_media_player
    buffer_size: 10000 #Must be between 4000 and 4000000. Defaults to 1000000
    codec_support_enabled: true # set to false and specify format to save resources
    announcement_pipeline:
        speaker: marvin_speaker
  #      format: MP3 # One of FLAC, MP3, WAV, or NONE.
        num_channels: 1


voice_assistant:
  id: marvin_va
  microphone: marvin_mic
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS #Between 0dBFS and 31dBFS inclusive. Defaults to 0 (disabled).
  volume_multiplier: 4.0
  speaker: marvin_speaker

  on_start: 
    then:
      - logger.log: "=>> on_start: Voice assist pipeline is started"    

  on_listening: 
    then:
      - logger.log: "=>> on_listening: Voice assistant is listening..."    

  on_wake_word_detected:
    then: 
      - logger.log: "=>> on_wake_word_detected: Voice assistant has detected the wakeword!!"    

  on_end:
    then: 
      - logger.log: "=>> on_end: Voice assistant has finished all tasks."    

  on_error:
    then: 
      - logger.log: "=>> on_error: Voice assistant error occurred!"    


switch:
  - platform: template
    name: "Use Marvin wake word"
    id: use_wake_word
    optimistic: true
    #restore_mode: RESTORE_DEFAULT_ON
    entity_category: config

    on_turn_on:
      - lambda: id(marvin_va).set_use_wake_word(true);
      - logger.log: "=>> set_use_wake_word(true)"    
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
            - logger.log: "=>> Voice assistant has been switched on."    

    on_turn_off:
      - voice_assistant.stop
      - lambda: id(marvin_va).set_use_wake_word(false);
      - logger.log: "=>> set_use_wake_word(false)"    
      - logger.log: "=>> Voice assistant switched off."

Log file:

[16:08:19.708][I][app:185]: ESPHome version 2025.10.1 compiled on Oct 19 2025, 16:07:13
[16:08:19.717][C][wifi:679]: WiFi:
[16:08:19.717][C][wifi:458]:   Local MAC: AA:BB:CC:DD:EE:FF
[16:08:19.717][C][wifi:465]:   IP Address: 192.168.178.52
[16:08:19.724][C][wifi:469]:   SSID: 'XXXXXXX'[redacted]
[16:08:19.724][C][wifi:469]:   BSSID: YY:YY:ZZ:ZZ:XX:XX[redacted]
[16:08:19.724][C][wifi:469]:   Hostname: 'marvin'
[16:08:19.724][C][wifi:469]:   Signal strength: -62 dB ▂▄▆█
[16:08:19.724][C][wifi:469]:   Channel: 6
[16:08:19.724][C][wifi:469]:   Subnet: 255.255.255.0
[16:08:19.724][C][wifi:469]:   Gateway: 192.168.178.1
[16:08:19.724][C][wifi:469]:   DNS1: 192.168.178.1
[16:08:19.724][C][wifi:469]:   DNS2: 0.0.0.0
[16:08:19.728][C][logger:261]: Logger:
[16:08:19.728][C][logger:261]:   Max Level: DEBUG
[16:08:19.728][C][logger:261]:   Initial Level: DEBUG
[16:08:19.738][C][logger:267]:   Log Baud Rate: 115200
[16:08:19.738][C][logger:267]:   Hardware UART: UART0
[16:08:19.742][C][logger:274]:   Task Log Buffer Size: 768
[16:08:19.769][C][template.switch:087]: Template Switch 'Use Marvin wake word'
[16:08:19.769][C][template.switch:087]:   Restore Mode: always OFF
[16:08:19.770][C][template.switch:057]:   Optimistic: YES
[16:08:19.817][C][i2s_audio.microphone:079]: Microphone:
[16:08:19.817][C][i2s_audio.microphone:079]:   Pin: 25
[16:08:19.817][C][i2s_audio.microphone:079]:   PDM: NO
[16:08:19.817][C][i2s_audio.microphone:079]:   DC offset correction: NO
[16:08:19.818][C][psram:016]: PSRAM:
[16:08:19.819][C][psram:019]:   Available: NO
[16:08:19.833][C][i2s_audio.speaker:074]: Speaker:
[16:08:19.833][C][i2s_audio.speaker:074]:   Pin: 18
[16:08:19.833][C][i2s_audio.speaker:074]:   Buffer duration: 500
[16:08:19.836][C][i2s_audio.speaker:080]:   Timeout: 500 ms
[16:08:19.836][C][i2s_audio.speaker:088]:   Communication format: std
[16:08:19.855][C][captive_portal:116]: Captive Portal:
[16:08:19.876][C][esphome.ota:093]: Over-The-Air updates:
[16:08:19.876][C][esphome.ota:093]:   Address: marvin.local:3232
[16:08:19.876][C][esphome.ota:093]:   Version: 2
[16:08:19.884][C][esphome.ota:100]:   Password configured
[16:08:19.885][C][safe_mode:018]: Safe Mode:
[16:08:19.885][C][safe_mode:018]:   Successful after: 60s
[16:08:19.885][C][safe_mode:018]:   Invoke after: 10 attempts
[16:08:19.885][C][safe_mode:018]:   Duration: 300s
[16:08:19.904][C][web_server.ota:241]: Web Server OTA
[16:08:19.908][C][api:222]: Server:
[16:08:19.908][C][api:222]:   Address: marvin.local:6053
[16:08:19.908][C][api:222]:   Listen backlog: 4
[16:08:19.908][C][api:222]:   Max connections: 8
[16:08:19.908][C][api:229]:   Noise encryption: YES
[16:08:19.918][C][mdns:179]: mDNS:
[16:08:19.918][C][mdns:179]:   Hostname: marvin
[16:09:14.241][I][safe_mode:042]: Boot seems successful; resetting boot loop counter
[16:09:14.262][D][esp32.preferences:149]: Writing 2 items: 0 cached, 2 written, 0 failed
[16:10:01.118][D][switch:020]: 'Use Marvin wake word' Turning ON.
[16:10:01.124][D][switch:063]: 'Use Marvin wake word': Sending state ON
[16:10:01.125][D][main:223]: =>> set_use_wake_word(true)
[16:10:01.125][D][voice_assistant:478]: State changed from IDLE to START_MICROPHONE
[16:10:01.125][D][voice_assistant:485]: Desired state set to START_PIPELINE
[16:10:01.125][D][main:233]: =>> Voice assistant has been switched on.
[16:10:01.126][D][voice_assistant:207]: Starting Microphone
[16:10:01.126][D][ring_buffer:034]: Created ring buffer with size 16384
[16:10:01.133][D][voice_assistant:478]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[16:10:01.165][D][voice_assistant:478]: State changed from STARTING_MICROPHONE to START_PIPELINE
[16:10:01.166][D][voice_assistant:228]: Requesting start
[16:10:01.167][D][voice_assistant:478]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:10:01.186][D][voice_assistant:500]: Client started, streaming microphone
[16:10:01.196][D][voice_assistant:478]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:10:01.196][D][voice_assistant:485]: Desired state set to STREAMING_MICROPHONE
[16:10:01.241][D][voice_assistant:624]: Event Type: 1
[16:10:01.241][D][voice_assistant:627]: Assist Pipeline running
[16:10:01.242][D][voice_assistant:624]: Event Type: 9
[16:10:01.242][D][main:490]: =>> on_start: Voice assist pipeline is started

Arh · October 19, 2025, 3:55pm

Try changing use_wake_word to true.
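
Applied to the config above, that would look roughly like this in the voice_assistant block. This is only a sketch: the other options are assumed to stay as posted, and the template switch would still override the value at runtime through set_use_wake_word().

```yaml
voice_assistant:
  id: marvin_va
  microphone: marvin_mic
  use_wake_word: true   # was false in the posted config
  speaker: marvin_speaker
```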

LutzDe · October 19, 2025, 5:21pm

I finally got it working. The cause was the bits_per_sample setting under “microphone:”. It has to be “bits_per_sample: 32bit” for my INMP441 microphone; with 16bit, the wake word is never detected and there is no response at all. Since the 16-bit stream sounded relatively good on the PC speakers, I wouldn’t have expected this result. In any case, it’s not the audio quality that decides whether the wake word works here, but the configuration value.
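
For anyone landing here with the same problem, here is a minimal sketch of the microphone block with that one change applied, assuming everything else stays as in the config posted at the top. The INMP441 sends 24-bit samples in 32-bit slots, which may be why the 16-bit setting caused trouble, but that explanation is a guess and was not verified in this thread.

```yaml
microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_in
    id: marvin_mic
    adc_type: external
    i2s_din_pin: GPIO25
    pdm: false
    channel: left            # L/R pin of the INMP441 tied to GND
    sample_rate: 16000
    bits_per_sample: 32bit   # was 16bit; this is the only changed line
```

The i2s_audio, speaker and voice_assistant blocks are unchanged from the original post.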