Voice Assistant - only response audio not playing

Hi there,

I want to turn my grandpa’s old W-48 phone into a voice assistant. To fit all the components inside the phone's housing, I had to configure the individual parts myself.

:no_entry_sign: My problem: Everything works, except that the voice assistant’s answers are not played on the speaker. I just can’t hear them… :thinking:

:white_check_mark: What I have so far:

  • Voice commands are recognized correctly
  • Voice commands are executed
  • Responses are generated and displayed in the log
  • The log shows links to the audio files for the answers
  • The speaker plays music when started via the Home Assistant UI (e.g. web radio)
  • The device is allowed to make service calls
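
One way to narrow this down would be to send the TTS file from the log to the speaker by hand, the same way the web radio is started. This is a minimal sketch for Developer Tools > Actions in Home Assistant. The entity ID `media_player.esp_speaker` is an assumption (the real ID may carry the device name as a prefix), and older Home Assistant releases use the key `service:` instead of `action:`.

```yaml
# Manual playback test: sends the TTS file from the log to the ESPHome media player.
# entity_id is a placeholder; check the real ID under Settings > Devices & Services.
action: media_player.play_media
target:
  entity_id: media_player.esp_speaker
data:
  # URL copied from the "Response URL" line in the log below
  media_content_id: "http://192.168.178.40:8123/api/tts_proxy/58d00e49f0293161477a8e7ec9e514f7b36eae50_de-de_1c68e49545_tts.microsoft.mp3"
  media_content_type: music
```

If this plays, the file and the speaker path are probably fine and the problem is more likely in the timing between the voice assistant and the media player. If it stays silent, the TTS file or its format may be the issue.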

Thank you very much for your help!
GPSM

YAML:

substitutions:
  name: esphome-web-c7b550
  friendly_name: esp32-s3-telefonw48

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  platformio_options:
    board_build.flash_mode: dio
  project:
    name: esphome.web
    version: '1.0'

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

# Enable logging
logger:
  level: DEBUG

# Enable Home Assistant API
api:
  encryption: 
    key: SECRET

# Allow Over-The-Air updates
ota:
  platform: esphome

# Allow provisioning Wi-Fi via serial
improv_serial:

wifi:
  on_connect:
    - delay: 5s  # Gives time for improv results to be transmitted
    - ble.disable:
  on_disconnect:
    - ble.enable:
  ap:   # Set up a wifi access point

captive_portal:

dashboard_import:
  package_import_url: github://esphome/firmware/esphome-web/esp32s3.yaml@v2
  import_full_config: true

# Sets up Bluetooth LE (Only on ESP32) to allow the user
# to provision wifi credentials to the device.
esp32_improv:
  authorizer: none

# To have a "next url" for improv serial
web_server:

##########

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO17 #WS Pin from the INMP441 Microphone
    i2s_bclk_pin: GPIO16 #SCK Pin from the INMP441 Microphone

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO13  #SD Pin from the INMP441 Microphone

media_player:
  - platform: i2s_audio
    name: "esp_speaker"
    id: media_player_speaker
    i2s_audio_id: i2s_in
    dac_type: external
    i2s_dout_pin: GPIO09   #  DIN Pin of the MAX98357A Audio Amplifier
    mode: mono
    on_play: # Logging
      then:
        - logger.log: "Audio is playing"

voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 10.0
  use_wake_word: false
  media_player: media_player_speaker # Internal media player via GPIO
  on_listening: # Turns the green LED on while listening
    - switch.turn_on:
        id: Status_LED_Green
  on_end:
    - switch.turn_off:
        id: Status_LED_Green
    - if:
        condition:
          binary_sensor.is_on: Schalter_Hoerer
        then:
          - voice_assistant.start_continuous


binary_sensor:
  - platform: gpio
    pin: 
      number: GPIO40
      inverted: True
      mode: 
        input: True
        pullup: False
    name: Schalter Hörer
    id: Schalter_Hoerer
    device_class: opening
    filters: 
      - delayed_off: 1s
    on_press: 
      then:
        - voice_assistant.start_continuous
        - switch.turn_on: Status_LED_Red
    on_release: 
      then:
        - voice_assistant.stop
        - media_player.stop: # Stops the speaker so the microphone can listen again
        - switch.turn_off: Status_LED_Red

switch:
  - platform: gpio # Test LED, green
    name: Status_LED_Green
    id: Status_LED_Green
    pin: GPIO42
    inverted: True

  - platform: gpio # Test LED, red
    name: Status_LED_Red
    id: Status_LED_Red
    pin: GPIO41
    inverted: True

Logs:


[21:48:12][D][binary_sensor:036]: 'Schalter Hörer': Sending state ON
[21:48:12][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[21:48:12][D][voice_assistant:510]: Desired state set to START_PIPELINE
[21:48:12][D][switch:012]: 'Status_LED_Red' Turning ON.
[21:48:12][D][switch:055]: 'Status_LED_Red': Sending state ON
[21:48:12][D][voice_assistant:221]: Starting Microphone
[21:48:12][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:48:12][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to START_PIPELINE
[21:48:12][D][voice_assistant:275]: Requesting start...
[21:48:12][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:48:12][D][voice_assistant:525]: Client started, streaming microphone
[21:48:12][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:48:12][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[21:48:12][D][voice_assistant:627]: Event Type: 1
[21:48:12][D][voice_assistant:630]: Assist Pipeline running
[21:48:12][D][voice_assistant:627]: Event Type: 3
[21:48:12][D][voice_assistant:641]: STT started
[21:48:12][D][switch:012]: 'Status_LED_Green' Turning ON.
[21:48:12][D][switch:055]: 'Status_LED_Green': Sending state ON
[21:48:13][D][voice_assistant:627]: Event Type: 11
[21:48:13][D][voice_assistant:781]: Starting STT by VAD
[21:48:16][D][voice_assistant:627]: Event Type: 12
[21:48:16][D][voice_assistant:785]: STT by VAD end
[21:48:16][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:48:16][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE
[21:48:16][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:48:16][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:48:17][D][voice_assistant:627]: Event Type: 4
[21:48:17][D][voice_assistant:655]: Speech recognised as: "Test 123."
[21:48:17][D][voice_assistant:627]: Event Type: 5
[21:48:17][D][voice_assistant:660]: Intent started
[21:48:17][D][binary_sensor:036]: 'Schalter Hörer': Sending state OFF
[21:48:17][D][media_player:061]: 'esp_speaker' - Setting
[21:48:17][D][media_player:065]:   Command: STOP
[21:48:17][D][switch:016]: 'Status_LED_Red' Turning OFF.
[21:48:17][D][switch:055]: 'Status_LED_Red': Sending state OFF
[21:48:18][D][voice_assistant:627]: Event Type: 6
[21:48:18][D][voice_assistant:627]: Event Type: 7
[21:48:18][D][voice_assistant:683]: Response: "Test erfolgreich. Wie kann ich helfen?"
[21:48:18][D][voice_assistant:627]: Event Type: 8
[21:48:18][D][voice_assistant:703]: Response URL: "http://192.168.178.40:8123/api/tts_proxy/58d00e49f0293161477a8e7ec9e514f7b36eae50_de-de_1c68e49545_tts.microsoft.mp3"
[21:48:18][D][voice_assistant:504]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:48:18][D][voice_assistant:510]: Desired state set to STREAMING_RESPONSE
[21:48:18][D][media_player:061]: 'esp_speaker' - Setting
[21:48:18][D][media_player:068]:   Media URL: http://192.168.178.40:8123/api/tts_proxy/58d00e49f0293161477a8e7ec9e514f7b36eae50_de-de_1c68e49545_tts.microsoft.mp3
[21:48:18][D][media_player:074]:  Announcement: yes
[21:48:18][D][voice_assistant:627]: Event Type: 2
[21:48:18][D][voice_assistant:717]: Assist Pipeline ended
[21:48:19][W][component:237]: Component i2s_audio.media_player took a long time for an operation (531 ms).
[21:48:19][W][component:238]: Components should block for at most 30 ms.
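
One detail in the log above: the handset switch reports OFF at 21:48:17, and the `on_release` actions (`voice_assistant.stop`, `media_player.stop`) run about one second before the response URL reaches the media player at 21:48:18. The `on_play` log message never appears. Whether this is related to the silence is unclear. The following is an untested sketch of an `on_release` block that only stops the speaker when it is actually playing, assuming the `media_player.is_playing` condition behaves as described in the ESPHome documentation:

```yaml
# Fragment only: this replaces the on_release block of the existing
# 'Schalter_Hoerer' binary_sensor; all other keys stay as in the config above.
    on_release:
      then:
        - voice_assistant.stop
        - if:
            condition:
              media_player.is_playing:
                id: media_player_speaker
            then:
              # Only send STOP when something is actually playing
              - media_player.stop:
                  id: media_player_speaker
        - switch.turn_off: Status_LED_Red
```

This does not change the order of events in the log. It only avoids sending a STOP command to an idle player right before the response arrives.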