Voice Assistant - only response audio not playing

GpsM2 · June 28, 2024, 7:56pm

Hi there,

I want to turn my grandpa’s old W-48 phone into a voice assistant. To fit all the components into the phone's housing, I had to configure the individual parts myself.

My problem: everything works except that the voice assistant’s responses are not played on the speaker. I just can’t hear them.

What I have so far:

Voice commands are recognized correctly
Voice commands are executed
Responses are generated and shown in the log
Links to the audio files for the responses are shown in the log
The speaker plays music when started via the Home Assistant UI (e.g. web radio)
The device is allowed to make service calls

Thank you very much for your help!
GPSM

YAML:

substitutions:
  name: esphome-web-c7b550
  friendly_name: esp32-s3-telefonw48

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  platformio_options:
    board_build.flash_mode: dio
  project:
    name: esphome.web
    version: '1.0'

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

# Enable logging
logger:
  level: DEBUG

# Enable Home Assistant API
api:
  encryption: 
    key: SECRET

# Allow Over-The-Air updates
ota:
  platform: esphome

# Allow provisioning Wi-Fi via serial
improv_serial:

wifi:
  on_connect:
    - delay: 5s  # Gives time for improv results to be transmitted
    - ble.disable:
  on_disconnect:
    - ble.enable:
  ap:   # Set up a wifi access point

captive_portal:

dashboard_import:
  package_import_url: github://esphome/firmware/esphome-web/esp32s3.yaml@v2
  import_full_config: true

# Sets up Bluetooth LE (Only on ESP32) to allow the user
# to provision wifi credentials to the device.
esp32_improv:
  authorizer: none

# To have a "next url" for improv serial
web_server:

##########

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO17 #WS Pin from the INMP441 Microphone
    i2s_bclk_pin: GPIO16 #SCK Pin from the INMP441 Microphone

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO13  #SD Pin from the INMP441 Microphone

media_player:
  - platform: i2s_audio
    name: "esp_speaker"
    id: media_player_speaker
    i2s_audio_id: i2s_in
    dac_type: external
    i2s_dout_pin: GPIO09   #  DIN Pin of the MAX98357A Audio Amplifier
    mode: mono
    on_play:  # Logging
      then:
        - logger.log: "Audio is playing"

voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 10.0
  use_wake_word: false
  media_player: media_player_speaker  # Internal media player via GPIO
  on_listening: #Turns the green LED on while listening
    - switch.turn_on:
        id: Status_LED_Green
  on_end:
    - switch.turn_off:
        id: Status_LED_Green
    - if:
        condition:
          binary_sensor.is_on: Schalter_Hoerer
        then:
          - voice_assistant.start_continuous


binary_sensor:
  - platform: gpio
    pin: 
      number: GPIO40
      inverted: True
      mode: 
        input: True
        pullup: False
    name: Schalter Hörer
    id: Schalter_Hoerer
    device_class: opening
    filters: 
      - delayed_off: 1s
    on_press: 
      then:
        - voice_assistant.start_continuous
        - switch.turn_on: Status_LED_Red
    on_release: 
      then:
        - voice_assistant.stop
        - media_player.stop:  # Stops the speaker so the device can listen again
        - switch.turn_off: Status_LED_Red

switch:
  - platform: gpio  # Test LED, green
    name: Status_LED_Green
    id: Status_LED_Green
    pin: GPIO42
    inverted: True

  - platform: gpio  # Test LED, red
    name: Status_LED_Red
    id: Status_LED_Red
    pin: GPIO41
    inverted: True
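
An untested variant that might be worth trying: giving the speaker its own I2S bus instead of sharing `i2s_in` with the microphone. This is only a sketch. Whether the shared bus has anything to do with the missing responses is an assumption, the bus id `i2s_out` is made up, and GPIO4/GPIO5 are placeholder pins that would require rewiring the LRC and BCLK lines of the MAX98357A.

```yaml
i2s_audio:
  - id: i2s_in               # existing bus, microphone only
    i2s_lrclk_pin: GPIO17    # WS pin of the INMP441 microphone
    i2s_bclk_pin: GPIO16     # SCK pin of the INMP441 microphone
  - id: i2s_out              # hypothetical second bus, speaker only
    i2s_lrclk_pin: GPIO5     # placeholder pin: LRC of the MAX98357A
    i2s_bclk_pin: GPIO4      # placeholder pin: BCLK of the MAX98357A

media_player:
  - platform: i2s_audio
    name: "esp_speaker"
    id: media_player_speaker
    i2s_audio_id: i2s_out    # changed from i2s_in
    dac_type: external
    i2s_dout_pin: GPIO09     # DIN pin of the MAX98357A, unchanged
    mode: mono
```

The microphone block would stay as it is, still pointing at `i2s_in`.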

Logs:


[21:48:12][D][binary_sensor:036]: 'Schalter Hörer': Sending state ON
[21:48:12][D][voice_assistant:504]: State changed from IDLE to START_MICROPHONE
[21:48:12][D][voice_assistant:510]: Desired state set to START_PIPELINE
[21:48:12][D][switch:012]: 'Status_LED_Red' Turning ON.
[21:48:12][D][switch:055]: 'Status_LED_Red': Sending state ON
[21:48:12][D][voice_assistant:221]: Starting Microphone
[21:48:12][D][voice_assistant:504]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:48:12][D][voice_assistant:504]: State changed from STARTING_MICROPHONE to START_PIPELINE
[21:48:12][D][voice_assistant:275]: Requesting start...
[21:48:12][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:48:12][D][voice_assistant:525]: Client started, streaming microphone
[21:48:12][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:48:12][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[21:48:12][D][voice_assistant:627]: Event Type: 1
[21:48:12][D][voice_assistant:630]: Assist Pipeline running
[21:48:12][D][voice_assistant:627]: Event Type: 3
[21:48:12][D][voice_assistant:641]: STT started
[21:48:12][D][switch:012]: 'Status_LED_Green' Turning ON.
[21:48:12][D][switch:055]: 'Status_LED_Green': Sending state ON
[21:48:13][D][voice_assistant:627]: Event Type: 11
[21:48:13][D][voice_assistant:781]: Starting STT by VAD
[21:48:16][D][voice_assistant:627]: Event Type: 12
[21:48:16][D][voice_assistant:785]: STT by VAD end
[21:48:16][D][voice_assistant:504]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:48:16][D][voice_assistant:510]: Desired state set to AWAITING_RESPONSE
[21:48:16][D][voice_assistant:504]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:48:16][D][voice_assistant:504]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:48:17][D][voice_assistant:627]: Event Type: 4
[21:48:17][D][voice_assistant:655]: Speech recognised as: "Test 123."
[21:48:17][D][voice_assistant:627]: Event Type: 5
[21:48:17][D][voice_assistant:660]: Intent started
[21:48:17][D][binary_sensor:036]: 'Schalter Hörer': Sending state OFF
[21:48:17][D][media_player:061]: 'esp_speaker' - Setting
[21:48:17][D][media_player:065]:   Command: STOP
[21:48:17][D][switch:016]: 'Status_LED_Red' Turning OFF.
[21:48:17][D][switch:055]: 'Status_LED_Red': Sending state OFF
[21:48:18][D][voice_assistant:627]: Event Type: 6
[21:48:18][D][voice_assistant:627]: Event Type: 7
[21:48:18][D][voice_assistant:683]: Response: "Test erfolgreich. Wie kann ich helfen?"
[21:48:18][D][voice_assistant:627]: Event Type: 8
[21:48:18][D][voice_assistant:703]: Response URL: "http://192.168.178.40:8123/api/tts_proxy/58d00e49f0293161477a8e7ec9e514f7b36eae50_de-de_1c68e49545_tts.microsoft.mp3"
[21:48:18][D][voice_assistant:504]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:48:18][D][voice_assistant:510]: Desired state set to STREAMING_RESPONSE
[21:48:18][D][media_player:061]: 'esp_speaker' - Setting
[21:48:18][D][media_player:068]:   Media URL: http://192.168.178.40:8123/api/tts_proxy/58d00e49f0293161477a8e7ec9e514f7b36eae50_de-de_1c68e49545_tts.microsoft.mp3
[21:48:18][D][media_player:074]:  Announcement: yes
[21:48:18][D][voice_assistant:627]: Event Type: 2
[21:48:18][D][voice_assistant:717]: Assist Pipeline ended
[21:48:19][W][component:237]: Component i2s_audio.media_player took a long time for an operation (531 ms).
[21:48:19][W][component:238]: Components should block for at most 30 ms.
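
To narrow this down, a test button could ask the media player to play the TTS file directly, outside the Assist pipeline. A minimal sketch, assuming the `media_player.play_media` action is available for this media player; the button name is made up, and the URL is copied from the log above, so the cached file may no longer exist by the time the button is pressed.

```yaml
button:
  - platform: template
    name: "Test TTS playback"    # hypothetical name, for testing only
    on_press:
      - media_player.play_media:
          id: media_player_speaker
          # URL taken from the "Response URL" log line; it may have expired
          media_url: "http://192.168.178.40:8123/api/tts_proxy/58d00e49f0293161477a8e7ec9e514f7b36eae50_de-de_1c68e49545_tts.microsoft.mp3"
```

If the file plays this way, the speaker path and the MP3 format are probably fine and the problem is more likely in how the pipeline hands over the response. If it stays silent, the media player itself would be the next thing to look at.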