Good evening,
I have the configuration mentioned in the subject.
The microphone works and the actions are executed, but I cannot get any spoken confirmation from the speaker.
Does anyone have a YAML and a wiring diagram that work for this configuration?
The SD and GAIN pins of the MAX98357A are connected to GND.
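For reference, the wiring implied by the pin assignments in the YAML below can be summarised as the sketch that follows. It is documentation only, not an ESPHome config: the pin labels (WS, SCK, LRC, and so on) are the usual breakout-board markings and are an assumption, and the state of the INMP441 L/R pin is not given in this post.

```yaml
# Wiring summary derived from the GPIO numbers in the config (illustrative only).
inmp441:            # I2S microphone, bus i2s_in
  WS: GPIO3         # i2s_lrclk_pin
  SCK: GPIO2        # i2s_bclk_pin
  SD: GPIO4         # i2s_din_pin
  # L/R: not stated in the post; "channel: left" usually means L/R tied to GND (assumption)
max98357a:          # I2S amplifier, bus i2s_out
  LRC: GPIO6        # i2s_lrclk_pin
  BCLK: GPIO7       # i2s_bclk_pin
  DIN: GPIO8        # i2s_dout_pin
  GAIN: GND         # as described above
  SD: GND           # as described above; the MAX98357A datasheet describes SD_MODE
                    # held low as shutdown, so this connection is worth double-checking
```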
Here is my ESPHome YAML:
esphome:
  name: esp32-psram16-r8-voice-3
  friendly_name: Esp32 Psram16-r8 Voice 3
  on_boot:
    priority: -10
    then:
      - light.turn_on:
          id: status_led
          blue: 100%
          brightness: 40%
      - delay: 1s
      - micro_wake_word.start
      - delay: 3s
      - voice_assistant.start
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
psram:
  mode: octal
  speed: 80MHz
#logger:
#  level: INFO
logger:
  level: VERY_VERBOSE
# Enable Home Assistant API
api:
  encryption:
    key: "+ez3sbE7GwM6JTDj/orcPFMtCzcK3H2dW3z2Iq3JVQ2="
  on_client_connected:
    then:
      - delay: 50ms
      - light.turn_off: status_led
      - micro_wake_word.start:
  on_client_disconnected:
    then:
      - voice_assistant.stop:
ota:
  - platform: esphome
    password: "36c193329a9d3621cc4a0704724316a7"
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.1.15
    gateway: 192.168.1.254
    subnet: 255.255.255.0
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-Psram16-R8-Voice-3"
    password: "IHAMRbks4Y7s"
captive_portal:
button:
  - platform: restart
    id: reboot
    name: "Reboot V3"
# =====================
# RGB LED (status)
# =====================
light:
  - platform: esp32_rmt_led_strip
    id: status_led
    name: "Voice Assistant LED"
    pin: GPIO48
    num_leds: 1
    chipset: ws2812
    rgb_order: GRB
    effects:
      - pulse:
      - pulse:
          name: fast_pulse
          transition_length: 0.4s
          update_interval: 0.4s
# =====================
# I2S BUS
# =====================
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO3
    i2s_bclk_pin: GPIO2
  - id: i2s_out
    i2s_lrclk_pin: GPIO6
    i2s_bclk_pin: GPIO7
    #i2s_lrclk_pin: GPIO15
    #i2s_bclk_pin: GPIO16
# =====================
# MICROPHONE
# =====================
microphone:
  - platform: i2s_audio
    id: mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO4
    adc_type: external
    bits_per_sample: 32bit
    channel: left # INMP441 mono
# =====================
# SPEAKER
# =====================
speaker:
  - platform: i2s_audio
    id: speaker3
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO8 #GPIO17
    dac_type: external
    bits_per_sample: 16bit
    sample_rate: 16000
    use_apll: true # 🔴 CRITICAL FOR MAX98357A + ESP32-S3
    buffer_duration: 300ms
# =====================
# WAKE WORD
# =====================
micro_wake_word:
  models:
    - model: okay_nabu
  on_wake_word_detected:
    - light.turn_on:
        id: status_led
        green: 100%
        brightness: 60%
        effect: fast_pulse
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
        silence_detection: true
# =====================
# VOICE ASSISTANT
# =====================
voice_assistant:
  id: va
  microphone: mic
  speaker: speaker3
  auto_gain: 31dBFS
  noise_suppression_level: 2
  volume_multiplier: 8.0
  on_listening:
    - light.turn_on:
        id: status_led
        green: 100%
        brightness: 60%
  on_stt_end:
    - light.turn_on:
        id: status_led
        blue: 100%
        brightness: 60%
  on_tts_start:
    - light.turn_on:
        id: status_led
        blue: 100%
        brightness: 60%
  on_end:
    - light.turn_on:
        id: status_led
        blue: 100%
        brightness: 30%
    - delay: 500ms
    - micro_wake_word.start
EDIT - LOG:
[19:59:16.743][C][micro_wake_word:066]: models:
[19:59:16.748][C][micro_wake_word:014]: - Wake Word: Okay Nabu
[19:59:16.748][C][micro_wake_word:014]: Probability cutoff: 0.97
[19:59:16.748][C][micro_wake_word:014]: Sliding window size: 5
[19:59:16.985][E][voice_assistant:542]: No API client connected
[19:59:16.985][D][voice_assistant:478]: State changed from IDLE to IDLE
[19:59:16.985][D][voice_assistant:485]: Desired state set to IDLE
[19:59:17.198][D][api:161]: Accept 192.168.1.32
[19:59:17.251][D][api.connection:1386]: Home Assistant 2025.12.3 (192.168.1.32) connected
[19:59:17.310][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:17.310][D][light:142]: Transition length: 1.0s
[19:59:17.316][W][micro_wake_word:354]: Wake word detection is already running
[19:59:47.849][D][micro_wake_word:323]: Detected 'Okay Nabu' with sliding average probability is 0.98 and max probability is 1.00
[19:59:47.849][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:47.852][D][light:104]: State: ON
[19:59:47.855][D][light:079]: Brightness: 60%
[19:59:47.860][D][light:115]: Red: 100%, Green: 100%, Blue: 100%
[19:59:47.860][D][light:165]: Effect: 'fast_pulse'
[19:59:47.864][D][voice_assistant:478]: State changed from IDLE to START_MICROPHONE
[19:59:47.867][D][voice_assistant:485]: Desired state set to START_PIPELINE
[19:59:47.869][D][micro_wake_word:368]: Stopping wake word detection
[19:59:47.873][D][voice_assistant:207]: Starting Microphone
[19:59:47.876][D][ring_buffer:034]: Created ring buffer with size 16384
[19:59:47.881][D][voice_assistant:478]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[19:59:47.886][D][micro_wake_word:376]: State changed from DETECTING_WAKE_WORD to STOPPING
[19:59:47.893][D][voice_assistant:478]: State changed from STARTING_MICROPHONE to START_PIPELINE
[19:59:47.908][D][voice_assistant:228]: Requesting start
[19:59:47.911][D][voice_assistant:478]: State changed from START_PIPELINE to STARTING_PIPELINE
[19:59:47.915][D][micro_wake_word:271]: Inference task is stopping, deallocating buffers
[19:59:47.918][D][micro_wake_word:276]: Inference task is finished, freeing task resources
[19:59:47.921][D][micro_wake_word:376]: State changed from STOPPING to STOPPED
[19:59:47.927][D][voice_assistant:500]: Client started, streaming microphone
[19:59:47.932][D][voice_assistant:478]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[19:59:47.932][D][voice_assistant:485]: Desired state set to STREAMING_MICROPHONE
[19:59:47.935][D][voice_assistant:624]: Event Type: 1
[19:59:47.938][D][voice_assistant:627]: Assist Pipeline running
[19:59:47.941][D][voice_assistant:624]: Event Type: 3
[19:59:47.944][D][voice_assistant:646]: STT started
[19:59:47.948][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:47.951][D][light:079]: Brightness: 60%
[19:59:47.956][D][light:115]: Red: 100%, Green: 100%, Blue: 100%
[19:59:47.958][D][light:142]: Transition length: 1.0s
[19:59:50.156][D][voice_assistant:624]: Event Type: 11
[19:59:50.157][D][voice_assistant:827]: Starting STT by VAD
[19:59:52.218][D][voice_assistant:624]: Event Type: 12
[19:59:52.220][D][voice_assistant:831]: STT by VAD end
[19:59:52.223][D][voice_assistant:478]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[19:59:52.226][D][voice_assistant:485]: Desired state set to AWAITING_RESPONSE
[19:59:52.236][D][voice_assistant:478]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[19:59:52.239][D][voice_assistant:478]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[19:59:52.244][D][voice_assistant:624]: Event Type: 4
[19:59:52.246][D][voice_assistant:663]: Speech recognised as: "allume la lumière marine"
[19:59:52.251][D][voice_assistant:624]: Event Type: 5
[19:59:52.253][D][voice_assistant:668]: Intent started
[19:59:52.258][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:52.262][D][light:079]: Brightness: 60%
[19:59:52.266][D][light:115]: Red: 100%, Green: 100%, Blue: 100%
[19:59:52.276][D][light:142]: Transition length: 1.0s
[19:59:52.277][D][voice_assistant:624]: Event Type: 6
[19:59:52.280][D][voice_assistant:624]: Event Type: 7
[19:59:52.282][D][voice_assistant:721]: Response: "Allumé"
[19:59:52.286][D][voice_assistant:624]: Event Type: 98
[19:59:52.288][D][voice_assistant:811]: TTS stream start
[19:59:52.293][D][voice_assistant:624]: Event Type: 8
[19:59:52.296][D][voice_assistant:743]: Response URL: "http://192.168.1.32:8123/api/tts_proxy/7-AiWxEbfP-0shZD-y8Wuw.wav"
[19:59:52.299][D][voice_assistant:478]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[19:59:52.304][D][voice_assistant:485]: Desired state set to STREAMING_RESPONSE
[19:59:52.309][D][voice_assistant:624]: Event Type: 2
[19:59:52.311][D][voice_assistant:766]: Assist Pipeline ended
[19:59:52.314][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:52.317][D][light:079]: Brightness: 60%
[19:59:52.321][D][light:115]: Red: 100%, Green: 100%, Blue: 100%
[19:59:52.324][D][light:142]: Transition length: 1.0s
[19:59:52.329][D][light:091]: 'Voice Assistant LED' Setting:
[19:59:52.331][D][light:079]: Brightness: 30%
[19:59:52.334][D][light:115]: Red: 100%, Green: 100%, Blue: 100%
[19:59:52.338][D][light:142]: Transition length: 1.0s
[19:59:52.354][D][i2s_audio.speaker:102]: Starting
[19:59:52.355][D][i2s_audio.speaker:106]: Started
[19:59:52.359][D][ring_buffer:034][speaker_task]: Created ring buffer with size 9600
[19:59:52.841][D][micro_wake_word:358]: Starting wake word detection
[19:59:52.844][D][micro_wake_word:376]: State changed from STOPPED to STARTING
[19:59:52.868][D][micro_wake_word:259]: Inference task has started, attempting to allocate memory for buffers
[19:59:52.874][D][micro_wake_word:264]: Inference task is running
[19:59:52.875][D][micro_wake_word:376]: State changed from STARTING to DETECTING_WAKE_WORD
[19:59:52.886][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[19:59:53.005][D][voice_assistant:624]: Event Type: 99
[19:59:53.010][D][voice_assistant:821]: TTS stream end
[19:59:53.024][D][voice_assistant:334]: End of audio stream received
[19:59:53.024][D][voice_assistant:478]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[19:59:53.024][D][voice_assistant:485]: Desired state set to RESPONSE_FINISHED
[19:59:53.645][D][i2s_audio.speaker:111]: Stopping
[19:59:53.645][D][i2s_audio.speaker:116]: Stopped
[19:59:53.651][D][voice_assistant:375]: Speaker has finished outputting all audio
[19:59:53.656][D][voice_assistant:478]: State changed from RESPONSE_FINISHED to IDLE
[19:59:53.660][D][voice_assistant:485]: Desired state set to IDLE
[20:00:12.838][I][safe_mode:042]: Boot seems successful; resetting boot loop counter
[20:00:12.850][D][esp32.preferences:149]: Writing 1 items: 0 cached, 1 written, 0 failed
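The log shows the I2S speaker starting at 19:59:52.354 and stopping at 19:59:53.645, which suggests audio data does reach the speaker component. To check the amplifier and wiring independently of the voice pipeline, a standalone tone test might look like the sketch below. It assumes the ESPHome rtttl component is driving the existing speaker3; the test_tone id and the button name are hypothetical, and the button entry would need to be merged into the existing button: list rather than added as a second button: key.

```yaml
# Hypothetical test fragment, to be merged into the config above.
rtttl:
  id: test_tone              # hypothetical id
  speaker: speaker3          # existing I2S speaker from the config

button:
  - platform: template
    name: "Speaker test"     # hypothetical name
    on_press:
      # Short RTTTL melody; any valid RTTTL string would do.
      - rtttl.play: 'siren:d=8,o=5,b=100:d,e,d,e,d,e,d,e'
```

If pressing the button produces no sound either, the problem is more likely in the amplifier wiring (for example the SD pin) than in the voice assistant settings.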
Thank you in advance.
Bob