VoiceAssistant reboot

I am new to ESP32 and ESPHome in Home Assistant, but not to programming. I’m trying to build a voice assistant that tells my daughter stories (the idea is to 3D print a robot-styled body once it works). Following a YouTube tutorial, I copied part of the configuration below and wrote the rest. According to the logs, it responds to my request “raccontami una storia” (“tell me a story”): the first time after boot it reads the story, but when it finishes and I ask again, I only hear a sound like a pop and the device reboots. Sometimes it reboots immediately, on the first attempt after the first answer.

The board is an ESP32-S3-N16R8, and I’m using an INMP441 microphone and a MAX98357A amplifier. I have several microphones, 3 MAX98357A modules and 2 ESP32-S3 boards, and I’ve tried various combinations, always with the same result. I’ve already tried decreasing the sample rate from 48000 to 32000 and then to 16000, tried both FLAC and MP3, and even tried removing this part of the code:

sdkconfig_options:
  CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
  CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
  CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
  CONFIG_AUDIO_BOARD_CUSTOM: "y"

But nothing changed.
I believe it is something related to the RAM of the ESP32 or to the sdkconfig part, but I’m not sure.
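
One way to test the RAM suspicion might be to log the free heap and the last reset reason. The following is a minimal sketch, assuming the standard ESPHome debug component; the entity names ("Heap Free", "Heap Max Block", "Reset Reason") and the 5s interval are placeholders and are not part of the configuration below.

```yaml
# Hypothetical diagnostic addition, not part of the original config.
# Publishes heap statistics and the reason for the last reset.
debug:
  update_interval: 5s

sensor:
  - platform: debug
    free:
      name: "Heap Free"        # total free heap in bytes
    block:
      name: "Heap Max Block"   # largest contiguous free block in bytes

text_sensor:
  - platform: debug
    reset_reason:
      name: "Reset Reason"     # reported after the next boot
```

If the free heap drops sharply just before the reboot, or the reset reason after the reboot points to a panic or a brownout, that would help narrow down whether memory or the power supply is involved.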

esphome:
  name: robot
  friendly_name: Robot
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    - light.turn_on:
        id: led_ww
        blue: 100%
        brightness: 60%
        effect: fast pulse

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
   
psram:
  mode: octal # Please change this to quad for N8R2 and octal for N16R8
  speed: 80MHz

# Enable logging
logger:
  level: VERBOSE
  # hardware_uart: UART0

api:
  encryption:
    key: "XXXXXXXXXXXXXXXXXXXXXXXXX"
  on_client_connected:
        then:
          - delay: 50ms
          - light.turn_off: led_ww
          - micro_wake_word.start:
  on_client_disconnected:
        then:
          - voice_assistant.stop: 



ota:
  - platform: esphome

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Robot Fallback Hotspot"
    password: "lua-robot"

captive_portal:


button:
  - platform: restart
    name: "Restart"
    id: but_rest

switch:
  - platform: template
    id: mute
    name: mute
    optimistic: true
    on_turn_on: 
      - micro_wake_word.stop:
      - voice_assistant.stop:
      - light.turn_on:
          id: led_ww           
          red: 100%
          green: 0%
          blue: 0%
          brightness: 60%

      - delay: 2s
      - light.turn_off:
          id: led_ww

      - light.turn_on:
          id: led_ww          
          red: 100%
          green: 0%
          blue: 0%
          brightness: 30%

    on_turn_off:
      - micro_wake_word.start:
      - light.turn_on:
          id: led_ww           
          red: 0%
          green: 100%
          blue: 0%
          brightness: 60%
      - delay: 2s
      - light.turn_off:
          id: led_ww
  - platform: template
    id: timer_ringing
    optimistic: true
    internal: False
    name: "Timer Ringing"
    restore_mode: ALWAYS_OFF


light:
  - platform: esp32_rmt_led_strip
    id: led_ww
    rgb_order: GRB
    pin: GPIO48
    num_leds: 1
    rmt_symbols: 96
    chipset: ws2812
    name: "On board light"
    effects:
      - pulse:
      - pulse:
          name: "Fast Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
          min_brightness: 0%
          max_brightness: 100%
          
          
 # Audio and Voice Assistant Config          
i2s_audio:
  - id: i2s_in # For microphone
    i2s_lrclk_pin: GPIO9  #WS 
    i2s_bclk_pin: GPIO2 #SCK

  - id: i2s_speaker #For Speaker
    i2s_lrclk_pin: GPIO6  #LRC 
    i2s_bclk_pin: GPIO7 #BLCK

microphone:
  - platform: i2s_audio
    id: va_mic
    adc_type: external
    i2s_din_pin: GPIO4 #SD
    channel: left
    pdm: false
    i2s_audio_id: i2s_in
    bits_per_sample: 16bit
    
speaker:
  - platform: i2s_audio
    id: i2s_audio_speaker
    sample_rate: 16000
    bits_per_sample: 16bit
    i2s_audio_id: i2s_speaker
    i2s_dout_pin: GPIO8   #  DIN Pin of the MAX98357A Audio Amplifier
    dac_type: external
    channel: stereo
    timeout: never
    buffer_duration: 100ms



media_player:
  - platform: speaker
    id: external_media_player
    name: Media Player
    internal: False
    volume_increment: 0.05
    volume_min: 0.4
    volume_max: 1
    announcement_pipeline:
      speaker: i2s_audio_speaker
      format: MP3     # FLAC is the least processor intensive codec
      num_channels: 1  # Stereo audio is unnecessary for announcements
      sample_rate: 16000
    files:
      - id: timer_finished_sound
        file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/timer_finished.flac
      

micro_wake_word:
  on_wake_word_detected:
    
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
        silence_detection: true
    - light.turn_on:
        id: led_ww           
        red: 30%
        green: 30%
        blue: 70%
        brightness: 60%
  models:
    - model: alexa
    
voice_assistant:
  id: va
  microphone: va_mic
  auto_gain: 31dBFS
  noise_suppression_level: 2
  volume_multiplier: 4.0
  media_player: external_media_player
  on_stt_end:
       then: 
         - light.turn_off: led_ww
  on_error:
          - micro_wake_word.start:  
  on_end:
        then:
          - light.turn_off: led_ww
          - wait_until:
              not:
                voice_assistant.is_running:
          - micro_wake_word.start: 
  
  
  on_timer_finished:
    - micro_wake_word.stop:
    - voice_assistant.stop:
    - switch.turn_on: timer_ringing
    - wait_until:
        not:
          microphone.is_capturing:
    
    - wait_until:
        not:
          micro_wake_word.is_running:
    
    - media_player.speaker.play_on_device_media_file:
          media_file: timer_finished_sound
    - micro_wake_word.start:
    - wait_until:
        and:
          - micro_wake_word.is_running:
    #       - microphone.is_capturing:
    - while:
        condition:
          switch.is_on: timer_ringing
        then:
          - media_player.speaker.play_on_device_media_file:
              media_file: timer_finished_sound
          - delay: 2s
    - wait_until:
        not:
          speaker.is_playing:
    
    - micro_wake_word.start:

The log

INFO ESPHome 2025.8.1
INFO Reading configuration /config/esphome/spk-test-3.yaml...
INFO Starting log output from 192.168.178.32 using esphome API
INFO Successfully resolved robot @ 192.168.178.32 in 0.000s
INFO Successfully connected to robot @ 192.168.178.32 in 0.065s
INFO Successful handshake with robot @ 192.168.178.32 in 0.072s
[19:13:22][I][app:200]: ESPHome version 2025.8.1 compiled on Aug 29 2025, 18:39:17
[19:13:22][C][wifi:659]: WiFi:
[19:13:22][C][wifi:442]: Local MAC: XXXXXXXXXXXX
[19:13:22][C][wifi:447]: SSID: 'XXXXXXXXX'[redacted]
[19:13:22][C][wifi:450]: IP Address: 192.168.178.32
[19:13:22][C][wifi:454]: BSSID: XXXXXXXXX[redacted]
[19:13:22][C][wifi:454]:   Hostname: 'robot'
[19:13:22][C][wifi:454]:   Signal strength: -77 dB ▂▄▆█
[19:13:22][V][wifi:462]:   Priority: -1.0
[19:13:22][C][wifi:465]:   Channel: 1
[19:13:22][C][wifi:465]:   Subnet: 255.255.255.0
[19:13:22][C][wifi:465]:   Gateway: 192.168.178.1
[19:13:22][C][wifi:465]:   DNS1: 192.168.178.1
[19:13:22][C][wifi:465]:   DNS2: 0.0.0.0
[19:13:22][D][light:084]: 'On board light' Setting:
[19:13:22][D][light:135]:   Transition length: 1.0s
[19:13:22][W][micro_wake_word:353]: Wake word detection is already running
[19:13:22][C][logger:252]: Logger:
[19:13:22][C][logger:252]:   Max Level: VERBOSE
[19:13:22][C][logger:252]:   Initial Level: VERBOSE
[19:13:22][C][logger:258]:   Log Baud Rate: 115200
[19:13:22][C][logger:258]:   Hardware UART: USB_SERIAL_JTAG
[19:13:22][C][logger:265]:   Task Log Buffer Size: 768
[19:13:22][C][esp32_rmt_led_strip:263]: ESP32 RMT LED Strip:
[19:13:22][C][esp32_rmt_led_strip:263]:   Pin: 48
[19:13:22][C][esp32_rmt_led_strip:267]:   RMT Symbols: 96
[19:13:22][C][esp32_rmt_led_strip:292]:   RGB Order: GRB
[19:13:22][C][esp32_rmt_led_strip:292]:   Max refresh rate: 0
[19:13:22][C][esp32_rmt_led_strip:292]:   Number of LEDs: 1
[19:13:22][C][light:088]: Light 'On board light'
[19:13:22][C][light:091]:   Default Transition Length: 1.0s
[19:13:22][C][light:091]:   Gamma Correct: 2.80
[19:13:22][C][template.switch:087]: Template Switch 'Timer Ringing'
[19:13:22][C][template.switch:087]:   Restore Mode: always OFF
[19:13:22][C][template.switch:057]:   Optimistic: YES
[19:13:22][C][template.switch:087]: Template Switch 'mute'
[19:13:22][C][template.switch:087]:   Restore Mode: always OFF
[19:13:22][C][template.switch:057]:   Optimistic: YES
[19:13:22][C][psram:016]: PSRAM:
[19:13:22][C][psram:019]:   Available: YES
[19:13:22][C][psram:021]:   Size: 8192 KB
[19:13:22][C][restart.button:017]: Restart Button 'Restart'
[19:13:22][C][restart.button:017]:   Icon: 'mdi:restart'
[19:13:22][C][i2s_audio.microphone:079]: Microphone:
[19:13:22][C][i2s_audio.microphone:079]:   Pin: 4
[19:13:22][C][i2s_audio.microphone:079]:   PDM: NO
[19:13:22][C][i2s_audio.microphone:079]:   DC offset correction: NO
[19:13:22][C][i2s_audio.speaker:074]: Speaker:
[19:13:22][C][i2s_audio.speaker:074]:   Pin: 8
[19:13:22][C][i2s_audio.speaker:074]:   Buffer duration: 100
[19:13:22][C][i2s_audio.speaker:088]:   Communication format: std
[19:13:22][C][captive_portal:099]: Captive Portal:
[19:13:22][C][esphome.ota:075]: Over-The-Air updates:
[19:13:22][C][esphome.ota:075]:   Address: robot.local:3232
[19:13:22][C][esphome.ota:075]:   Version: 2
[19:13:22][C][safe_mode:018]: Safe Mode:
[19:13:22][C][safe_mode:019]:   Boot considered successful after 60 seconds
[19:13:22][C][safe_mode:019]:   Invoke after 10 boot attempts
[19:13:22][C][safe_mode:019]:   Remain for 300 seconds
[19:13:22][C][web_server.ota:224]: Web Server OTA
[19:13:23][C][api:205]: Server:
[19:13:23][C][api:205]:   Address: robot.local:6053
[19:13:23][C][api:210]:   Noise encryption: YES
[19:13:23][C][mdns:124]: mDNS:
[19:13:23][C][mdns:124]:   Hostname: robot
[19:13:23][V][mdns:128]:   Services:
[19:13:23][V][mdns:130]:   - _esphomelib, _tcp, 6053
[19:13:23][V][mdns:133]:     TXT: friendly_name = Robot
[19:13:23][V][mdns:133]:     TXT: version = 2025.8.1
[19:13:23][V][mdns:133]:     TXT: mac = 1020ba4bb8d4
[19:13:23][V][mdns:133]:     TXT: platform = ESP32
[19:13:23][V][mdns:133]:     TXT: board = esp32-s3-devkitc-1
[19:13:23][V][mdns:133]:     TXT: network = wifi
[19:13:23][V][mdns:133]:     TXT: api_encryption = Noise_NNpsk0_25519_ChaChaPoly_SHA256
[19:13:23][C][micro_wake_word:064]: microWakeWord:
[19:13:23][C][micro_wake_word:065]:   models:
[19:13:23][C][micro_wake_word:014]:     - Wake Word: Alexa
[19:13:23][C][micro_wake_word:014]:       Probability cutoff: 0.90
[19:13:23][C][micro_wake_word:014]:       Sliding window size: 5
[19:13:27][D][micro_wake_word:322]: Detected 'Alexa' with sliding average probability is 0.91 and max probability is 0.94
[19:13:27][D][voice_assistant:478]: State changed from IDLE to START_MICROPHONE
[19:13:27][D][voice_assistant:485]: Desired state set to START_PIPELINE
[19:13:27][D][light:084]: 'On board light' Setting:
[19:13:27][D][light:097]:   State: ON
[19:13:27][D][light:072]:   Brightness: 60%
[19:13:27][D][light:108]:   Red: 43%, Green: 43%, Blue: 100%
[19:13:27][D][light:135]:   Transition length: 1.0s
[19:13:27][D][micro_wake_word:367]: Stopping wake word detection
[19:13:27][D][voice_assistant:207]: Starting Microphone
[19:13:27][D][ring_buffer:034]: Created ring buffer with size 16384
[19:13:27][D][voice_assistant:478]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[19:13:27][D][micro_wake_word:375]: State changed from DETECTING_WAKE_WORD to STOPPING
[19:13:27][D][voice_assistant:478]: State changed from STARTING_MICROPHONE to START_PIPELINE
[19:13:27][D][micro_wake_word:270]: Inference task is stopping, deallocating buffers
[19:13:27][D][micro_wake_word:275]: Inference task is finished, freeing task resources
[19:13:27][D][micro_wake_word:375]: State changed from STOPPING to STOPPED
[19:13:27][D][voice_assistant:228]: Requesting start
[19:13:27][D][voice_assistant:478]: State changed from START_PIPELINE to STARTING_PIPELINE
[19:13:27][D][voice_assistant:500]: Client started, streaming microphone
[19:13:27][D][voice_assistant:478]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[19:13:27][D][voice_assistant:485]: Desired state set to STREAMING_MICROPHONE
[19:13:27][D][voice_assistant:624]: Event Type: 1
[19:13:27][D][voice_assistant:627]: Assist Pipeline running
[19:13:27][D][voice_assistant:624]: Event Type: 3
[19:13:27][D][voice_assistant:646]: STT started
[19:13:29][D][voice_assistant:624]: Event Type: 11
[19:13:29][D][voice_assistant:825]: Starting STT by VAD
[19:13:31][D][voice_assistant:624]: Event Type: 12
[19:13:31][D][voice_assistant:829]: STT by VAD end
[19:13:31][D][voice_assistant:478]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[19:13:31][D][voice_assistant:485]: Desired state set to AWAITING_RESPONSE
[19:13:31][D][voice_assistant:478]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[19:13:31][D][voice_assistant:478]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[19:13:31][V][i2s_audio.microphone:486]: Task finished, freeing resources and uninstalling driver
[19:13:36][D][voice_assistant:624]: Event Type: 4
[19:13:36][D][voice_assistant:662]: Speech recognised as: " che ora sono"
[19:13:36][D][voice_assistant:624]: Event Type: 5
[19:13:36][D][voice_assistant:667]: Intent started
[19:13:36][D][light:084]: 'On board light' Setting:
[19:13:36][D][light:097]:   State: OFF
[19:13:36][D][light:135]:   Transition length: 1.0s
[19:13:36][D][voice_assistant:624]: Event Type: 6
[19:13:36][D][voice_assistant:624]: Event Type: 7
[19:13:36][D][voice_assistant:719]: Response: "Sono le 19:13."
[19:13:36][D][voice_assistant:624]: Event Type: 8
[19:13:36][D][voice_assistant:741]: Response URL: "http://192.168.178.179:8123/api/tts_proxy/tdQc9Wz3-RxxiPdmDTNpnw.mp3"
[19:13:36][D][voice_assistant:478]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[19:13:36][D][voice_assistant:485]: Desired state set to STREAMING_RESPONSE
[19:13:36][D][voice_assistant:624]: Event Type: 2
[19:13:36][D][voice_assistant:764]: Assist Pipeline ended
[19:13:36][D][media_player:083]: 'Media Player' - Setting
[19:13:36][D][media_player:090]:   Media URL: http://192.168.178.179:8123/api/tts_proxy/tdQc9Wz3-RxxiPdmDTNpnw.mp3
[19:13:36][D][media_player:096]:  Announcement: yes
[19:13:36][D][light:084]: 'On board light' Setting:
[19:13:36][D][light:135]:   Transition length: 1.0s
[19:13:36][D][speaker_media_player:406]: State changed to ANNOUNCING
[19:13:36][D][speaker_media_player.pipeline:114]: Reading MP3 file type
[19:13:36][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000
[19:13:36][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample
[19:13:36][D][i2s_audio.speaker:102]: Starting
[19:13:36][D][i2s_audio.speaker:106]: Started
[19:13:36][D][ring_buffer:034][speaker_task]: Created ring buffer with size 3200
[19:13:39][D][i2s_audio.speaker:111]: Stopping
[19:13:39][D][i2s_audio.speaker:116]: Stopped
[19:13:39][D][speaker_media_player:406]: State changed to IDLE
[19:13:39][D][voice_assistant:351]: Announcement finished playing
[19:13:39][D][voice_assistant:478]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[19:13:39][D][voice_assistant:485]: Desired state set to RESPONSE_FINISHED
[19:13:39][D][voice_assistant:478]: State changed from RESPONSE_FINISHED to IDLE
[19:13:39][D][voice_assistant:485]: Desired state set to IDLE
[19:13:39][D][micro_wake_word:357]: Starting wake word detection
[19:13:39][D][micro_wake_word:375]: State changed from STOPPED to STARTING
[19:13:39][D][micro_wake_word:258]: Inference task has started, attempting to allocate memory for buffers
[19:13:39][D][micro_wake_word:263]: Inference task is running
[19:13:39][D][micro_wake_word:375]: State changed from STARTING to DETECTING_WAKE_WORD
[19:13:39][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[19:13:39][V][i2s_audio.microphone:474]: Task started, attempting to allocate buffer
[19:13:39][V][i2s_audio.microphone:479]: Task is running and reading data
[19:13:49][D][micro_wake_word:322]: Detected 'Alexa' with sliding average probability is 0.99 and max probability is 1.00
[19:13:49][D][voice_assistant:478]: State changed from IDLE to START_MICROPHONE
[19:13:49][D][voice_assistant:485]: Desired state set to START_PIPELINE
[19:13:49][D][light:084]: 'On board light' Setting:
[19:13:49][D][light:097]:   State: ON
[19:13:49][D][light:072]:   Brightness: 60%
[19:13:49][D][light:108]:   Red: 43%, Green: 43%, Blue: 100%
[19:13:49][D][light:135]:   Transition length: 1.0s
[19:13:49][D][micro_wake_word:367]: Stopping wake word detection
[19:13:49][D][voice_assistant:207]: Starting Microphone
[19:13:49][D][ring_buffer:034]: Created ring buffer with size 16384
[19:13:49][D][voice_assistant:478]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[19:13:49][D][micro_wake_word:375]: State changed from DETECTING_WAKE_WORD to STOPPING
[19:13:49][D][voice_assistant:478]: State changed from STARTING_MICROPHONE to START_PIPELINE
[19:13:49][D][micro_wake_word:270]: Inference task is stopping, deallocating buffers
[19:13:49][D][micro_wake_word:275]: Inference task is finished, freeing task resources
[19:13:49][D][micro_wake_word:375]: State changed from STOPPING to STOPPED
[19:13:49][D][voice_assistant:228]: Requesting start
[19:13:49][D][voice_assistant:478]: State changed from START_PIPELINE to STARTING_PIPELINE
[19:13:49][D][voice_assistant:500]: Client started, streaming microphone
[19:13:49][D][voice_assistant:478]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[19:13:49][D][voice_assistant:485]: Desired state set to STREAMING_MICROPHONE
[19:13:49][D][voice_assistant:624]: Event Type: 1
[19:13:49][D][voice_assistant:627]: Assist Pipeline running
[19:13:49][D][voice_assistant:624]: Event Type: 3
[19:13:49][D][voice_assistant:646]: STT started
[19:13:52][D][voice_assistant:624]: Event Type: 11
[19:13:52][D][voice_assistant:825]: Starting STT by VAD
[19:13:54][D][voice_assistant:624]: Event Type: 12
[19:13:54][D][voice_assistant:829]: STT by VAD end
[19:13:54][D][voice_assistant:478]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[19:13:54][D][voice_assistant:485]: Desired state set to AWAITING_RESPONSE
[19:13:54][D][voice_assistant:478]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[19:13:54][D][voice_assistant:478]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[19:13:54][V][i2s_audio.microphone:486]: Task finished, freeing resources and uninstalling driver
[19:13:56][D][voice_assistant:624]: Event Type: 4
[19:13:56][D][voice_assistant:662]: Speech recognised as: " Raccuntami una storia"
[19:13:56][D][voice_assistant:624]: Event Type: 5
[19:13:56][D][voice_assistant:667]: Intent started
[19:13:56][D][light:084]: 'On board light' Setting:
[19:13:56][D][light:097]:   State: OFF
[19:13:56][D][light:135]:   Transition length: 1.0s
[19:13:57][D][voice_assistant:624]: Event Type: 6
[19:13:57][D][voice_assistant:624]: Event Type: 7
[19:13:57][D][voice_assistant:719]: Response: "Certo! Allora, ti racconto una storia.

[19:13:57]In un piccolo villaggio di pescatori, affacciato su un mare scintillante, viveva una bambina di nome Sofia. Ogni giorno, al sorgere del sole, correva sulla spiaggia a raccogliere conchiglie. Sofia sognava di trovare una conchiglia magica che potesse esaudire un desiderio."
[19:13:57][D][voice_assistant:624]: Event Type: 8
[19:13:57][D][voice_assistant:741]: Response URL: "http://192.168.178.179:8123/api/tts_proxy/FNBkq3yEjd7i68HWLpkKtA.mp3"
[19:13:57][D][voice_assistant:478]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[19:13:57][D][voice_assistant:485]: Desired state set to STREAMING_RESPONSE
[19:13:57][D][voice_assistant:624]: Event Type: 2
[19:13:57][D][voice_assistant:764]: Assist Pipeline ended
[19:13:57][D][media_player:083]: 'Media Player' - Setting
[19:13:57][D][media_player:090]:   Media URL: http://192.168.178.179:8123/api/tts_proxy/FNBkq3yEjd7i68HWLpkKtA.mp3
[19:13:57][D][media_player:096]:  Announcement: yes
[19:13:57][D][light:084]: 'On board light' Setting:
[19:13:57][D][light:135]:   Transition length: 1.0s
[19:13:57][D][speaker_media_player:406]: State changed to ANNOUNCING
[19:13:57][D][speaker_media_player.pipeline:114]: Reading MP3 file type
[19:13:57][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000
[19:13:57][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample
[19:13:57][D][i2s_audio.speaker:102]: Starting
[19:13:57][D][i2s_audio.speaker:106]: Started
[19:13:57][D][ring_buffer:034][speaker_task]: Created ring buffer with size 3200
INFO Processing unexpected disconnect from ESPHome API for robot @ 192.168.178.32
WARNING Disconnected from API
INFO Successfully resolved robot @ 192.168.178.32 in 0.000s
INFO Successfully connected to robot @ 192.168.178.32 in 0.006s
INFO Successful handshake with robot @ 192.168.178.32 in 0.058s
[19:14:22][D][light:084]: 'On board light' Setting:
[19:14:22][D][light:135]:   Transition length: 1.0s
[19:14:22][D][micro_wake_word:357]: Starting wake word detection
[19:14:22][D][micro_wake_word:375]: State changed from STOPPED to STARTING
[19:14:22][D][micro_wake_word:258]: Inference task has started, attempting to allocate memory for buffers
[19:14:22][D][micro_wake_word:263]: Inference task is running
[19:14:22][D][micro_wake_word:375]: State changed from STARTING to DETECTING_WAKE_WORD
[19:14:22][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[19:14:22][V][i2s_audio.microphone:474]: Task started, attempting to allocate buffer
[19:14:22][V][i2s_audio.microphone:479]: Task is running and reading data
[19:14:27][D][api:144]: Accept 192.168.178.179
[19:14:27][V][api.connection:1357]: Hello from client: 'Home Assistant 2025.8.1' | 192.168.178.179 | API Version 1.12
[19:14:27][D][api.connection:1341]: Home Assistant 2025.8.1 (192.168.178.179) connected
[19:14:27][D][light:084]: 'On board light' Setting:
[19:14:27][D][light:135]:   Transition length: 1.0s
[19:14:27][W][micro_wake_word:353]: Wake word detection is already running
[19:14:56][D][micro_wake_word:322]: Detected 'Alexa' with sliding average probability is 0.96 and max probability is 1.00
[19:14:56][D][voice_assistant:478]: State changed from IDLE to START_MICROPHONE
[19:14:56][D][voice_assistant:485]: Desired state set to START_PIPELINE
[19:14:56][D][light:084]: 'On board light' Setting:
[19:14:56][D][light:097]:   State: ON
[19:14:56][D][light:072]:   Brightness: 60%
[19:14:56][D][light:108]:   Red: 43%, Green: 43%, Blue: 100%
[19:14:56][D][light:135]:   Transition length: 1.0s
[19:14:56][D][micro_wake_word:367]: Stopping wake word detection
[19:14:56][D][voice_assistant:207]: Starting Microphone
[19:14:56][D][ring_buffer:034]: Created ring buffer with size 16384
[19:14:56][D][voice_assistant:478]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[19:14:56][D][micro_wake_word:375]: State changed from DETECTING_WAKE_WORD to STOPPING
[19:14:56][D][voice_assistant:478]: State changed from STARTING_MICROPHONE to START_PIPELINE
[19:14:56][D][micro_wake_word:270]: Inference task is stopping, deallocating buffers
[19:14:56][D][micro_wake_word:275]: Inference task is finished, freeing task resources
[19:14:56][D][micro_wake_word:375]: State changed from STOPPING to STOPPED
[19:14:56][D][voice_assistant:228]: Requesting start
[19:14:56][D][voice_assistant:478]: State changed from START_PIPELINE to STARTING_PIPELINE
[19:14:56][D][voice_assistant:500]: Client started, streaming microphone
[19:14:56][D][voice_assistant:478]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE

The log, part 2


[19:14:56][D][voice_assistant:485]: Desired state set to STREAMING_MICROPHONE

[19:14:56][D][voice_assistant:624]: Event Type: 1

[19:14:56][D][voice_assistant:627]: Assist Pipeline running

[19:14:56][D][voice_assistant:624]: Event Type: 3

[19:14:56][D][voice_assistant:646]: STT started

[19:14:58][I][safe_mode:042]: Boot seems successful; resetting boot loop counter

[19:14:58][V][esp32.preferences:114]: Saving 1 items...

[19:14:58][V][esp32.preferences:126]: sync: key: 233825507, len: 4

[19:14:58][D][esp32.preferences:142]: Writing 1 items: 0 cached, 1 written, 0 failed

[19:14:59][D][voice_assistant:624]: Event Type: 11

[19:14:59][D][voice_assistant:825]: Starting STT by VAD

[19:15:02][D][voice_assistant:624]: Event Type: 12

[19:15:02][D][voice_assistant:829]: STT by VAD end

[19:15:02][D][voice_assistant:478]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE

[19:15:02][D][voice_assistant:485]: Desired state set to AWAITING_RESPONSE

[19:15:02][D][voice_assistant:478]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE

[19:15:02][D][voice_assistant:478]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE

[19:15:02][V][i2s_audio.microphone:486]: Task finished, freeing resources and uninstalling driver

[19:15:03][D][voice_assistant:624]: Event Type: 4

[19:15:03][D][voice_assistant:662]: Speech recognised as: " in posta un timer di 4 secondi"

[19:15:03][D][voice_assistant:624]: Event Type: 5

[19:15:03][D][voice_assistant:667]: Intent started

[19:15:03][D][light:084]: 'On board light' Setting:

[19:15:03][D][light:097]: State: OFF

[19:15:03][D][light:135]: Transition length: 1.0s

[19:15:05][D][voice_assistant:624]: Event Type: 6

[19:15:05][D][voice_assistant:624]: Event Type: 7

[19:15:05][D][voice_assistant:719]: Response: "Timer di 4 secondi impostato!"

[19:15:05][D][voice_assistant:624]: Event Type: 8

[19:15:05][D][voice_assistant:741]: Response URL: "http://192.168.178.179:8123/api/tts_proxy/kTl7FfpADhWrrinYxGkLNA.mp3"

[19:15:05][D][voice_assistant:478]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE

[19:15:05][D][voice_assistant:485]: Desired state set to STREAMING_RESPONSE

[19:15:05][D][voice_assistant:624]: Event Type: 2

[19:15:05][D][voice_assistant:764]: Assist Pipeline ended

[19:15:05][D][media_player:083]: 'Media Player' - Setting

[19:15:05][D][media_player:090]: Media URL: http://192.168.178.179:8123/api/tts_proxy/kTl7FfpADhWrrinYxGkLNA.mp3

[19:15:05][D][media_player:096]: Announcement: yes

[19:15:05][D][light:084]: 'On board light' Setting:

[19:15:05][D][light:135]: Transition length: 1.0s

[19:15:05][D][speaker_media_player:406]: State changed to ANNOUNCING

[19:15:06][D][speaker_media_player.pipeline:114]: Reading MP3 file type

[19:15:06][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000

[19:15:06][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample

[19:15:06][D][i2s_audio.speaker:102]: Starting

[19:15:06][D][i2s_audio.speaker:106]: Started

[19:15:06][D][ring_buffer:034][speaker_task]: Created ring buffer with size 3200

[19:15:09][D][i2s_audio.speaker:111]: Stopping

[19:15:09][D][i2s_audio.speaker:116]: Stopped

[19:15:09][D][speaker_media_player:406]: State changed to IDLE

[19:15:09][D][voice_assistant:351]: Announcement finished playing

[19:15:09][D][voice_assistant:478]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED

[19:15:09][D][voice_assistant:485]: Desired state set to RESPONSE_FINISHED

[19:15:09][D][voice_assistant:478]: State changed from RESPONSE_FINISHED to IDLE

[19:15:09][D][voice_assistant:485]: Desired state set to IDLE

[19:15:09][D][micro_wake_word:357]: Starting wake word detection

[19:15:09][D][micro_wake_word:375]: State changed from STOPPED to STARTING

INFO Processing unexpected disconnect from ESPHome API for robot @ 192.168.178.32

WARNING Disconnected from API

INFO Successfully resolved robot @ 192.168.178.32 in 0.000s

INFO Successfully connected to robot @ 192.168.178.32 in 0.004s

INFO Successful handshake with robot @ 192.168.178.32 in 0.075s

[19:15:43][D][voice_assistant:228]: Requesting start

[19:15:43][D][voice_assistant:478]: State changed from START_PIPELINE to STARTING_PIPELINE

[19:15:43][D][micro_wake_word:270]: Inference task is stopping, deallocating buffers

[19:15:43][D][micro_wake_word:275]: Inference task is finished, freeing task resources

[19:15:43][D][micro_wake_word:375]: State changed from STOPPING to STOPPED

[19:15:43][D][voice_assistant:500]: Client started, streaming microphone

[19:15:43][D][voice_assistant:478]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE

[19:15:43][D][voice_assistant:485]: Desired state set to STREAMING_MICROPHONE

[19:15:43][D][voice_assistant:624]: Event Type: 1

[19:15:43][D][voice_assistant:627]: Assist Pipeline running

[19:15:43][D][voice_assistant:624]: Event Type: 3

[19:15:43][D][voice_assistant:646]: STT started

[19:15:43][D][light:084]: 'On board light' Setting:

[19:15:43][D][light:097]: State: OFF

[19:15:43][D][light:135]: Transition length: 1.0s

[19:15:43][D][micro_wake_word:357]: Starting wake word detection

[19:15:43][D][micro_wake_word:375]: State changed from STOPPED to STARTING

[19:15:43][D][micro_wake_word:258]: Inference task has started, attempting to allocate memory for buffers

[19:15:43][D][micro_wake_word:263]: Inference task is running

[19:15:43][D][micro_wake_word:375]: State changed from STARTING to DETECTING_WAKE_WORD

[19:15:43][D][ring_buffer:034][mww]: Created ring buffer with size 3840

[19:15:52][D][voice_assistant:624]: Event Type: 11

[19:15:52][D][voice_assistant:825]: Starting STT by VAD

[19:15:54][D][voice_assistant:624]: Event Type: 12

[19:15:54][D][voice_assistant:829]: STT by VAD end

[19:15:54][D][voice_assistant:478]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE

[19:15:54][D][voice_assistant:485]: Desired state set to AWAITING_RESPONSE

[19:15:54][D][voice_assistant:478]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE

[19:15:54][D][voice_assistant:478]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE

[19:15:57][D][voice_assistant:624]: Event Type: 4

[19:15:57][D][voice_assistant:662]: Speech recognised as: " All'exha All'exha"

[19:15:57][D][voice_assistant:624]: Event Type: 5

[19:15:57][D][voice_assistant:667]: Intent started

[19:15:57][D][light:084]: 'On board light' Setting:

[19:15:57][D][light:135]: Transition length: 1.0s

[19:15:57][D][voice_assistant:624]: Event Type: 6

[19:15:57][D][voice_assistant:624]: Event Type: 7

[19:15:57][D][voice_assistant:719]: Response: "Hmm, non ho capito bene. Potresti ripetere la tua richiesta?"

[19:15:57][D][voice_assistant:624]: Event Type: 8

[19:15:57][D][voice_assistant:741]: Response URL: "http://192.168.178.179:8123/api/tts_proxy/r8fUz_SLf-csdVPdNi8wAQ.mp3"

[19:15:57][D][voice_assistant:478]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE

[19:15:57][D][voice_assistant:485]: Desired state set to STREAMING_RESPONSE

[19:15:57][D][voice_assistant:624]: Event Type: 2

[19:15:57][D][voice_assistant:764]: Assist Pipeline ended

[19:15:57][D][media_player:083]: 'Media Player' - Setting

[19:15:57][D][media_player:090]: Media URL: http://192.168.178.179:8123/api/tts_proxy/r8fUz_SLf-csdVPdNi8wAQ.mp3

[19:15:57][D][media_player:096]: Announcement: yes

[19:15:57][D][light:084]: 'On board light' Setting:

[19:15:57][D][light:135]: Transition length: 1.0s

[19:15:57][D][speaker_media_player:406]: State changed to ANNOUNCING

[19:15:57][D][speaker_media_player.pipeline:114]: Reading MP3 file type

[19:15:57][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000

[19:15:58][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample

[19:15:58][D][i2s_audio.speaker:102]: Starting

[19:15:58][D][i2s_audio.speaker:106]: Started

[19:15:58][D][ring_buffer:034][speaker_task]: Created ring buffer with size 3200

INFO Processing unexpected disconnect from ESPHome API for robot @ 192.168.178.32

WARNING Disconnected from API

INFO Successfully resolved robot @ 192.168.178.32 in 0.000s

INFO Successfully connected to robot @ 192.168.178.32 in 0.003s

INFO Successful handshake with robot @ 192.168.178.32 in 0.054s

[19:16:23][D][light:084]: 'On board light' Setting:

[19:16:23][D][light:135]: Transition length: 1.0s

[19:16:23][D][micro_wake_word:357]: Starting wake word detection

[19:16:23][D][micro_wake_word:375]: State changed from STOPPED to STARTING

[19:16:23][D][micro_wake_word:258]: Inference task has started, attempting to allocate memory for buffers

[19:16:23][D][micro_wake_word:263]: Inference task is running

[19:16:23][D][micro_wake_word:375]: State changed from STARTING to DETECTING_WAKE_WORD

[19:16:23][D][ring_buffer:034][mww]: Created ring buffer with size 3840

[19:16:23][V][i2s_audio.microphone:474]: Task started, attempting to allocate buffer

[19:16:23][V][i2s_audio.microphone:479]: Task is running and reading data

[19:16:37][D][api:144]: Accept 192.168.178.179

[19:16:37][V][api.connection:1357]: Hello from client: 'Home Assistant 2025.8.1' | 192.168.178.179 | API Version 1.12

[19:16:37][D][api.connection:1341]: Home Assistant 2025.8.1 (192.168.178.179) connected

[19:16:37][D][light:084]: 'On board light' Setting:

[19:16:37][D][light:135]: Transition length: 1.0s

[19:16:37][W][micro_wake_word:353]: Wake word detection is already running

[19:16:47][D][micro_wake_word:322]: Detected 'Alexa' with sliding average probability is 0.93 and max probability is 0.96

[19:16:47][D][voice_assistant:478]: State changed from IDLE to START_MICROPHONE

[19:16:47][D][voice_assistant:485]: Desired state set to START_PIPELINE

[19:16:47][D][light:084]: 'On board light' Setting:

[19:16:47][D][light:097]: State: ON

[19:16:47][D][light:072]: Brightness: 60%

[19:16:47][D][light:108]: Red: 43%, Green: 43%, Blue: 100%

[19:16:47][D][light:135]: Transition length: 1.0s

[19:16:47][D][micro_wake_word:367]: Stopping wake word detection

[19:16:47][D][voice_assistant:207]: Starting Microphone

[19:16:47][D][ring_buffer:034]: Created ring buffer with size 16384

[19:16:47][D][voice_assistant:478]: State changed from START_MICROPHONE to STARTING_MICROPHONE

[19:16:47][D][micro_wake_word:375]: State changed from DETECTING_WAKE_WORD to STOPPING

[19:16:47][D][voice_assistant:478]: State changed from STARTING_MICROPHONE to START_PIPELINE

[19:16:47][D][voice_assistant:228]: Requesting start

[19:16:47][D][voice_assistant:478]: State changed from START_PIPELINE to STARTING_PIPELINE

[19:16:47][D][micro_wake_word:270]: Inference task is stopping, deallocating buffers

[19:16:47][D][micro_wake_word:275]: Inference task is finished, freeing task resources

[19:16:47][D][micro_wake_word:375]: State changed from STOPPING to STOPPED

[19:16:47][D][voice_assistant:500]: Client started, streaming microphone

[19:16:47][D][voice_assistant:478]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE

[19:16:47][D][voice_assistant:485]: Desired state set to STREAMING_MICROPHONE

[19:16:47][D][voice_assistant:624]: Event Type: 1

[19:16:47][D][voice_assistant:627]: Assist Pipeline running

[19:16:47][D][voice_assistant:624]: Event Type: 3

[19:16:47][D][voice_assistant:646]: STT started

[19:16:51][D][voice_assistant:624]: Event Type: 11

[19:16:51][D][voice_assistant:825]: Starting STT by VAD

[19:16:53][D][voice_assistant:624]: Event Type: 12

[19:16:53][D][voice_assistant:829]: STT by VAD end

[19:16:53][D][voice_assistant:478]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE

[19:16:53][D][voice_assistant:485]: Desired state set to AWAITING_RESPONSE

[19:16:53][D][voice_assistant:478]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE

[19:16:53][D][voice_assistant:478]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE

[19:16:53][V][i2s_audio.microphone:486]: Task finished, freeing resources and uninstalling driver

[19:16:58][D][voice_assistant:624]: Event Type: 4

[19:16:58][D][voice_assistant:662]: Speech recognised as: " che ora è solo"

[19:16:58][D][voice_assistant:624]: Event Type: 5

[19:16:58][D][voice_assistant:667]: Intent started

[19:16:58][D][light:084]: 'On board light' Setting:

[19:16:58][D][light:097]: State: OFF

[19:16:58][D][light:135]: Transition length: 1.0s

[19:16:58][D][voice_assistant:624]: Event Type: 6

[19:16:58][D][voice_assistant:624]: Event Type: 7

[19:16:58][D][voice_assistant:719]: Response: "Sono le 19:16."

[19:16:58][D][voice_assistant:624]: Event Type: 8

[19:16:58][D][voice_assistant:741]: Response URL: "http://192.168.178.179:8123/api/tts_proxy/fq020U8blfcoz-ivIq3edA.mp3"

[19:16:58][D][voice_assistant:478]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE

[19:16:58][D][voice_assistant:485]: Desired state set to STREAMING_RESPONSE

[19:16:58][D][voice_assistant:624]: Event Type: 2

[19:16:58][D][voice_assistant:764]: Assist Pipeline ended

[19:16:58][D][media_player:083]: 'Media Player' - Setting

[19:16:58][D][media_player:090]: Media URL: http://192.168.178.179:8123/api/tts_proxy/fq020U8blfcoz-ivIq3edA.mp3

[19:16:58][D][media_player:096]: Announcement: yes

[19:16:58][D][light:084]: 'On board light' Setting:

[19:16:58][D][light:135]: Transition length: 1.0s

[19:16:58][D][speaker_media_player:406]: State changed to ANNOUNCING

[19:16:58][D][speaker_media_player.pipeline:114]: Reading MP3 file type

[19:16:58][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000

[19:16:58][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample

[19:16:58][D][i2s_audio.speaker:102]: Starting

[19:16:58][D][i2s_audio.speaker:106]: Started

[19:16:58][D][ring_buffer:034][speaker_task]: Created ring buffer with size 3200

INFO Processing unexpected disconnect from ESPHome API for robot @ 192.168.178.32

WARNING Disconnected from API

INFO Successfully resolved robot @ 192.168.178.32 in 0.000s

INFO Successfully connected to robot @ 192.168.178.32 in 0.004s

INFO Successful handshake with robot @ 192.168.178.32 in 0.054s

[19:17:23][D][light:084]: 'On board light' Setting:

[19:17:23][D][light:135]: Transition length: 1.0s

[19:17:23][D][micro_wake_word:357]: Starting wake word detection

[19:17:23][D][micro_wake_word:375]: State changed from STOPPED to STARTING

[19:17:23][D][micro_wake_word:258]: Inference task has started, attempting to allocate memory for buffers

[19:17:23][D][micro_wake_word:263]: Inference task is running

[19:17:23][D][micro_wake_word:375]: State changed from STARTING to DETECTING_WAKE_WORD

[19:17:23][D][ring_buffer:034][mww]: Created ring buffer with size 3840

[19:17:23][V][i2s_audio.microphone:474]: Task started, attempting to allocate buffer

[19:17:23][V][i2s_audio.microphone:479]: Task is running and reading data

[19:17:37][D][api:144]: Accept 192.168.178.179

[19:17:37][V][api.connection:1357]: Hello from client: 'Home Assistant 2025.8.1' | 192.168.178.179 | API Version 1.12

[19:17:37][D][api.connection:1341]: Home Assistant 2025.8.1 (192.168.178.179) connected

[19:17:37][D][light:084]: 'On board light' Setting:

[19:17:37][D][light:135]: Transition length: 1.0s

[19:17:37][W][micro_wake_word:353]: Wake word detection is already running

[19:17:59][I][safe_mode:042]: Boot seems successful; resetting boot loop counter

[19:17:59][V][esp32.preferences:114]: Saving 1 items...

[19:18:00][V][esp32.preferences:126]: sync: key: 233825507, len: 4

[19:18:00][D][esp32.preferences:142]: Writing 1 items: 0 cached, 1 written, 0 failed

First, make sure the YouTube tutorial is very recent, as the voice assistant components change with almost every ESPHome release.

My working voice assistant, built with the same components, uses far more complex code.

See here for some code that works.

I will try this, thank you so much.