Voice assistant: ESP32-S3 N16R8 + MAX98357A + INMP441

Good evening,
I have the setup described in the title.
The microphone works and the commands are executed, but I can't get any spoken confirmation out of the speaker.
Does anyone have a YAML config and wiring that work for this setup?
The SD and GAIN pins of the MAX98357A are connected to GND.
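
For reference, here is the wiring implied by the pin numbers in the ESPHome YAML below. It is written as a YAML-style summary, not as an ESPHome section to paste into the config.

- The INMP441 L/R connection is not stated in the post. Tying it to GND is an assumption based on `channel: left`.
- The GAIN and SD comments come from the MAX98357A datasheet.

```yaml
# Wiring summary derived from the pin numbers in the config (description only,
# not an ESPHome component; do not paste into the device YAML).
inmp441:          # microphone, on bus i2s_in
  SCK: GPIO2      # i2s_bclk_pin
  WS: GPIO3       # i2s_lrclk_pin
  SD: GPIO4       # i2s_din_pin
  L/R: GND        # assumption: tied low so the mic outputs on the left channel (channel: left)
max98357a:        # amplifier, on bus i2s_out
  BCLK: GPIO7     # i2s_bclk_pin
  LRC: GPIO6      # i2s_lrclk_pin
  DIN: GPIO8      # i2s_dout_pin
  GAIN: GND       # as stated in the post; GAIN tied directly to GND selects 12 dB gain
  SD: GND         # as stated in the post; per the datasheet, SD held low puts the amp in shutdown
```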

Here's my ESPHome YAML:

esphome:
  name: esp32-psram16-r8-voice-3
  friendly_name: Esp32 Psram16-r8 Voice 3
  on_boot:
    priority: -10
    then:
      - light.turn_on:
          id: status_led
          blue: 100%
          brightness: 40%
      - delay: 1s
      - micro_wake_word.start
      - delay: 3s
      - voice_assistant.start

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

psram:
  mode: octal
  speed: 80MHz

#logger:
#  level: INFO
logger:
  level: VERY_VERBOSE

# Enable Home Assistant API
api:
  encryption:
    key: "+ez3sbE7GwM6JTDj/orcPFMtCzcK3H2dW3z2Iq3JVQ2="
  on_client_connected:
    then:
      - delay: 50ms
      - light.turn_off: status_led
      - micro_wake_word.start
  on_client_disconnected:
    then:
      - voice_assistant.stop

ota:
  - platform: esphome
    password: "36c193329a9d3621cc4a0704724316a7"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.1.15
    gateway: 192.168.1.254
    subnet: 255.255.255.0 

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-Psram16-R8-Voice-3"
    password: "IHAMRbks4Y7s"

captive_portal:

button:
  - platform: restart
    id: reboot
    name: "Reboot V3"

# =====================
# RGB LED (status)
# =====================
light:
  - platform: esp32_rmt_led_strip
    id: status_led
    name: "Voice Assistant LED"
    pin: GPIO48
    num_leds: 1
    chipset: ws2812
    rgb_order: GRB
    effects:
      - pulse:
      - pulse:
          name: fast_pulse
          transition_length: 0.4s
          update_interval: 0.4s

# =====================
# I2S BUS
# =====================
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO3
    i2s_bclk_pin: GPIO2

  - id: i2s_out
    i2s_lrclk_pin: GPIO6
    i2s_bclk_pin: GPIO7
    #i2s_lrclk_pin: GPIO15
    #i2s_bclk_pin: GPIO16

# =====================
# MICROPHONE
# =====================
microphone:
  - platform: i2s_audio
    id: mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO4
    adc_type: external
    bits_per_sample: 32bit
    channel: left   # INMP441 mono

# =====================
# SPEAKER
# =====================
speaker:
  - platform: i2s_audio
    id: speaker3
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO8 #GPIO17
    dac_type: external
    bits_per_sample: 16bit
    sample_rate: 16000
    use_apll: true          # 🔴 CRITICAL FOR MAX98357A + ESP32-S3
    buffer_duration: 300ms

# =====================
# WAKE WORD
# =====================
micro_wake_word:
  models:
    - model: okay_nabu

  on_wake_word_detected:
    - light.turn_on:
        id: status_led
        green: 100%
        brightness: 60%
        effect: fast_pulse
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
        silence_detection: true

# =====================
# VOICE ASSISTANT
# =====================
voice_assistant:
  id: va
  microphone: mic
  speaker: speaker3
  auto_gain: 31dBFS
  noise_suppression_level: 2
  volume_multiplier: 8.0

  on_listening:
    - light.turn_on:
        id: status_led
        green: 100%
        brightness: 60%

  on_stt_end:
    - light.turn_on:
        id: status_led
        blue: 100%
        brightness: 60%

  on_tts_start:
    - light.turn_on:
        id: status_led
        blue: 100%
        brightness: 60%

  on_end:
    - light.turn_on:
        id: status_led
        blue: 100%
        brightness: 30%
    - delay: 500ms
    - micro_wake_word.start
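
One way to check the I2S-to-MAX98357A path on its own is a test button that plays a short tune on `speaker3`. This bypasses the voice pipeline entirely.

This is a minimal sketch, assuming an ESPHome version whose `rtttl` component accepts a `speaker:` output. The button name "Speaker test" is hypothetical, and the tune is the siren example from the ESPHome `rtttl` documentation.

YAML does not allow a second top-level `button:` key, so the existing restart button is repeated in the same list.

```yaml
# Hypothetical test sketch: plays a short RTTTL tune on speaker3 without
# going through voice_assistant, to check the speaker wiring on its own.
rtttl:
  speaker: speaker3

button:
  - platform: restart
    id: reboot
    name: "Reboot V3"
  - platform: template
    name: "Speaker test"        # hypothetical name
    on_press:
      - rtttl.play:
          rtttl: 'siren:d=8,o=5,b=100:d,e,d,e,d,e,d,e'
```

If the tune is audible, the wiring and bus are working and the problem lies in the voice pipeline. If it is silent too, the hardware side is the place to look.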

EDIT: here is the log:

[19:59:16.743][C][micro_wake_word:066]:   models:
[19:59:16.748][C][micro_wake_word:014]:     - Wake Word: Okay Nabu
[19:59:16.748][C][micro_wake_word:014]:       Probability cutoff: 0.97
[19:59:16.748][C][micro_wake_word:014]:       Sliding window size: 5
[19:59:16.985][E][voice_assistant:542]: No API client connected
[19:59:16.985][D][voice_assistant:478]: State changed from IDLE to IDLE
[19:59:16.985][D][voice_assistant:485]: Desired state set to IDLE
[19:59:17.198][D][api:161]: Accept 192.168.1.32
[19:59:17.251][D][api.connection:1386]: Home Assistant 2025.12.3 (192.168.1.32) connected
[19:59:17.310][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:17.310][D][light:142]:   Transition length: 1.0s
[19:59:17.316][W][micro_wake_word:354]: Wake word detection is already running
[19:59:47.849][D][micro_wake_word:323]: Detected 'Okay Nabu' with sliding average probability is 0.98 and max probability is 1.00
[19:59:47.849][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:47.852][D][light:104]:   State: ON
[19:59:47.855][D][light:079]:   Brightness: 60%
[19:59:47.860][D][light:115]:   Red: 100%, Green: 100%, Blue: 100%
[19:59:47.860][D][light:165]:   Effect: 'fast_pulse'
[19:59:47.864][D][voice_assistant:478]: State changed from IDLE to START_MICROPHONE
[19:59:47.867][D][voice_assistant:485]: Desired state set to START_PIPELINE
[19:59:47.869][D][micro_wake_word:368]: Stopping wake word detection
[19:59:47.873][D][voice_assistant:207]: Starting Microphone
[19:59:47.876][D][ring_buffer:034]: Created ring buffer with size 16384
[19:59:47.881][D][voice_assistant:478]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[19:59:47.886][D][micro_wake_word:376]: State changed from DETECTING_WAKE_WORD to STOPPING
[19:59:47.893][D][voice_assistant:478]: State changed from STARTING_MICROPHONE to START_PIPELINE
[19:59:47.908][D][voice_assistant:228]: Requesting start
[19:59:47.911][D][voice_assistant:478]: State changed from START_PIPELINE to STARTING_PIPELINE
[19:59:47.915][D][micro_wake_word:271]: Inference task is stopping, deallocating buffers
[19:59:47.918][D][micro_wake_word:276]: Inference task is finished, freeing task resources
[19:59:47.921][D][micro_wake_word:376]: State changed from STOPPING to STOPPED
[19:59:47.927][D][voice_assistant:500]: Client started, streaming microphone
[19:59:47.932][D][voice_assistant:478]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[19:59:47.932][D][voice_assistant:485]: Desired state set to STREAMING_MICROPHONE
[19:59:47.935][D][voice_assistant:624]: Event Type: 1
[19:59:47.938][D][voice_assistant:627]: Assist Pipeline running
[19:59:47.941][D][voice_assistant:624]: Event Type: 3
[19:59:47.944][D][voice_assistant:646]: STT started
[19:59:47.948][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:47.951][D][light:079]:   Brightness: 60%
[19:59:47.956][D][light:115]:   Red: 100%, Green: 100%, Blue: 100%
[19:59:47.958][D][light:142]:   Transition length: 1.0s
[19:59:50.156][D][voice_assistant:624]: Event Type: 11
[19:59:50.157][D][voice_assistant:827]: Starting STT by VAD
[19:59:52.218][D][voice_assistant:624]: Event Type: 12
[19:59:52.220][D][voice_assistant:831]: STT by VAD end
[19:59:52.223][D][voice_assistant:478]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[19:59:52.226][D][voice_assistant:485]: Desired state set to AWAITING_RESPONSE
[19:59:52.236][D][voice_assistant:478]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[19:59:52.239][D][voice_assistant:478]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[19:59:52.244][D][voice_assistant:624]: Event Type: 4
[19:59:52.246][D][voice_assistant:663]: Speech recognised as: "allume la lumière marine"
[19:59:52.251][D][voice_assistant:624]: Event Type: 5
[19:59:52.253][D][voice_assistant:668]: Intent started
[19:59:52.258][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:52.262][D][light:079]:   Brightness: 60%
[19:59:52.266][D][light:115]:   Red: 100%, Green: 100%, Blue: 100%
[19:59:52.276][D][light:142]:   Transition length: 1.0s
[19:59:52.277][D][voice_assistant:624]: Event Type: 6
[19:59:52.280][D][voice_assistant:624]: Event Type: 7
[19:59:52.282][D][voice_assistant:721]: Response: "Allumé"
[19:59:52.286][D][voice_assistant:624]: Event Type: 98
[19:59:52.288][D][voice_assistant:811]: TTS stream start
[19:59:52.293][D][voice_assistant:624]: Event Type: 8
[19:59:52.296][D][voice_assistant:743]: Response URL: "http://192.168.1.32:8123/api/tts_proxy/7-AiWxEbfP-0shZD-y8Wuw.wav"
[19:59:52.299][D][voice_assistant:478]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[19:59:52.304][D][voice_assistant:485]: Desired state set to STREAMING_RESPONSE
[19:59:52.309][D][voice_assistant:624]: Event Type: 2
[19:59:52.311][D][voice_assistant:766]: Assist Pipeline ended
[19:59:52.314][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:52.317][D][light:079]:   Brightness: 60%
[19:59:52.321][D][light:115]:   Red: 100%, Green: 100%, Blue: 100%
[19:59:52.324][D][light:142]:   Transition length: 1.0s
[19:59:52.329][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:52.331][D][light:079]:   Brightness: 30%
[19:59:52.334][D][light:115]:   Red: 100%, Green: 100%, Blue: 100%
[19:59:52.338][D][light:142]:   Transition length: 1.0s
[19:59:52.354][D][i2s_audio.speaker:102]: Starting
[19:59:52.355][D][i2s_audio.speaker:106]: Started
[19:59:52.359][D][ring_buffer:034][speaker_task]: Created ring buffer with size 9600
[19:59:52.841][D][micro_wake_word:358]: Starting wake word detection
[19:59:52.844][D][micro_wake_word:376]: State changed from STOPPED to STARTING
[19:59:52.868][D][micro_wake_word:259]: Inference task has started, attempting to allocate memory for buffers
[19:59:52.874][D][micro_wake_word:264]: Inference task is running
[19:59:52.875][D][micro_wake_word:376]: State changed from STARTING to DETECTING_WAKE_WORD
[19:59:52.886][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[19:59:53.005][D][voice_assistant:624]: Event Type: 99
[19:59:53.010][D][voice_assistant:821]: TTS stream end
[19:59:53.024][D][voice_assistant:334]: End of audio stream received
[19:59:53.024][D][voice_assistant:478]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[19:59:53.024][D][voice_assistant:485]: Desired state set to RESPONSE_FINISHED
[19:59:53.645][D][i2s_audio.speaker:111]: Stopping
[19:59:53.645][D][i2s_audio.speaker:116]: Stopped
[19:59:53.651][D][voice_assistant:375]: Speaker has finished outputting all audio
[19:59:53.656][D][voice_assistant:478]: State changed from RESPONSE_FINISHED to IDLE
[19:59:53.660][D][voice_assistant:485]: Desired state set to IDLE
[20:00:12.838][I][safe_mode:042]: Boot seems successful; resetting boot loop counter
[20:00:12.850][D][esp32.preferences:149]: Writing 1 items: 0 cached, 1 written, 0 failed
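
The first error in this log, `No API client connected`, most likely comes from the `voice_assistant.start` in `on_boot`. That action runs before Home Assistant has connected; the connection is accepted about 0.2 s later.

A possible variant of the `esphome:` block, sketched under the assumption that start-up should wait for the API:
- It waits for the `api.connected` condition before starting wake word detection.
- It leaves `voice_assistant.start` to the `on_wake_word_detected` handler.
- The 60 s timeout is a hypothetical value.

```yaml
esphome:
  name: esp32-psram16-r8-voice-3
  friendly_name: Esp32 Psram16-r8 Voice 3
  on_boot:
    priority: -10
    then:
      - light.turn_on:
          id: status_led
          blue: 100%
          brightness: 40%
      # Block until Home Assistant is connected to the native API,
      # or continue anyway after the (hypothetical) timeout.
      - wait_until:
          condition:
            api.connected:
          timeout: 60s
      - micro_wake_word.start
      # voice_assistant.start is left to on_wake_word_detected.
```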

Thanks in advance.
Bob

Did you search here?

I’ll look at this as soon as possible.
Thank you @Arh
Bob