Cloud voice assistant: no sound from speaker

Hello,

I’m trying to build a voice assistant based on what JLo did in the Voice Assistant Contest launch video here: https://www.youtube.com/watch?v=99lGuB4J-4o&t=4912s, but I get no sound.

My Home Assistant is version 2024.2.1, running on a Raspberry Pi 4 connected by Ethernet cable.
My local LAN runs OPNsense, a managed switch and two access points. There are no VLANs or anything else unusual.

My YAML:

esphome:
  name: virtual-assistant
  friendly_name: Virtual Assistant

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    

esp_adf:
external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s 

# Enable logging
logger:
  level: DEBUG

# Enable Home Assistant API
api:
  encryption:
    key: "wkA+7LJ1XxlXA4KG/cc8hcvZQa0BZfvJwQC6SKassm8="

ota:
  password: "902ba392051b3c21b93344a940631668"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Virtual-Assistant"
    password: "ZVtsrQXRIjry"

captive_portal:

switch:
  - platform: gpio
    pin:
      number: 22 
      mode: output
    id: led

interval:
  - interval: 300s
    then:
      - switch.toggle: led
      

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25 #ws blue
    i2s_bclk_pin: GPIO19  # bclk green
  # - id: i2s_out
  #   i2s_lrclk_pin: GPIO26 # Green
  #   i2s_bclk_pin: GPIO27  # blue

microphone:
  - platform: i2s_audio
    id: external_microphone
    adc_type: external
    i2s_din_pin: GPIO32
    i2s_audio_id: i2s_in
    pdm: false
    bits_per_sample: 32bit
    channel: right

speaker:
  - platform: i2s_audio
    id: external_speaker
    dac_type: external
    i2s_dout_pin: GPIO27
    i2s_audio_id: i2s_in
    mode: mono

voice_assistant:
  id: va
  microphone: external_microphone
  speaker: external_speaker
  use_wake_word: true
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 8.0
  on_client_connected:
    - voice_assistant.start_continuous: 
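
Not a confirmed cause, but worth ruling out: in the config above the speaker reuses the microphone's i2s_in bus (GPIO25/GPIO19), so the INMP441 and the MAX98357A share one I2S peripheral. A minimal sketch that gives the speaker its own bus instead, assuming the MAX98357A is rewired to match. The GPIO26/GPIO33 clock pins are hypothetical picks, not values from this thread. Note that the commented-out i2s_out block uses GPIO27 for BCLK, which would clash with i2s_dout_pin: GPIO27.

```yaml
i2s_audio:
  # Microphone bus, unchanged from the config above
  - id: i2s_in
    i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO19
  # Separate bus for the MAX98357A (hypothetical pin choices)
  - id: i2s_out
    i2s_lrclk_pin: GPIO26  # MAX98357A LRC
    i2s_bclk_pin: GPIO33   # MAX98357A BCLK; must not reuse GPIO27 (speaker DOUT)

speaker:
  - platform: i2s_audio
    id: external_speaker
    dac_type: external
    i2s_dout_pin: GPIO27   # MAX98357A DIN
    i2s_audio_id: i2s_out  # speaker no longer shares the microphone's bus
    mode: mono
```

The microphone and voice_assistant sections would stay as they are.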

I am using Home Assistant Cloud. My INMP441 mic works: I can turn my test bulb on and off, and when I connect to my ESP32 the logs show that I’m receiving a spoken response from the cloud assistant, but nothing comes out of my speakers. The logs also show the speaker being started.

I have:

  • Verified all of my connections a dozen times
  • Swapped ESP32s
  • Swapped in three different MAX98357As
  • Tested with two different speakers, both verified good
  • Verified that DOUT, BCLK and LRC all go electrically high when the cloud responds

As far as I can tell my setup is correct, but the response WAV file from the cloud seems to be making it only to my Raspberry Pi and not out to the ESP32.

Any thoughts are greatly appreciated.

A copy of my ESP32 log, after successfully turning on the test bulb:

INFO ESPHome 2023.12.9
INFO Reading configuration /config/esphome/virtual-assistant.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
INFO Starting log output from virtual-assistant.local using esphome API
INFO Successfully connected to virtual-assistant @ 192.168.1.154 in 0.163s
INFO Successful handshake with virtual-assistant @ 192.168.1.154 in 0.164s
[21:30:41][I][app:102]: ESPHome version 2023.12.9 compiled on Feb 15 2024, 20:18:49
[21:30:41][C][wifi:573]: WiFi:
[21:30:41][C][wifi:405]:   Local MAC: 7C:9E:BD:06:62:8C
[21:30:41][C][wifi:410]:   SSID: [redacted]
[21:30:41][C][wifi:411]:   IP Address: 192.168.1.154
[21:30:41][C][wifi:413]:   BSSID: [redacted]
[21:30:41][C][wifi:414]:   Hostname: 'virtual-assistant'
[21:30:41][C][wifi:416]:   Signal strength: -69 dB ▂▄▆█
[21:30:41][C][wifi:420]:   Channel: 11
[21:30:41][C][wifi:421]:   Subnet: 255.255.255.0
[21:30:41][C][wifi:422]:   Gateway: 192.168.1.1
[21:30:41][C][wifi:423]:   DNS1: 192.168.1.1
[21:30:41][C][wifi:424]:   DNS2: 0.0.0.0
[21:30:41][C][logger:439]: Logger:
[21:30:41][C][logger:440]:   Level: DEBUG
[21:30:41][C][logger:441]:   Log Baud Rate: 115200
[21:30:41][C][logger:443]:   Hardware UART: UART0
[21:30:41][C][switch.gpio:068]: GPIO Switch 'led'
[21:30:41][C][switch.gpio:091]:   Restore Mode: always OFF
[21:30:41][C][switch.gpio:031]:   Pin: GPIO22
[21:30:41][C][captive_portal:088]: Captive Portal:
[21:30:41][C][mdns:115]: mDNS:
[21:30:41][C][mdns:116]:   Hostname: virtual-assistant
[21:30:41][C][ota:097]: Over-The-Air Updates:
[21:30:41][C][ota:098]:   Address: virtual-assistant.local:3232
[21:30:41][C][ota:101]:   Using Password.
[21:30:41][C][api:139]: API Server:
[21:30:41][C][api:140]:   Address: virtual-assistant.local:6053
[21:30:41][C][api:142]:   Using noise encryption: YES
[21:30:42][D][voice_assistant:519]: Event Type: 0
[21:30:42][D][voice_assistant:519]: Event Type: 2
[21:30:42][D][voice_assistant:609]: Assist Pipeline ended
[21:30:42][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[21:30:42][D][voice_assistant:418]: Desired state set to WAITING_FOR_VAD
[21:30:42][D][voice_assistant:170]: Waiting for speech...
[21:30:42][D][voice_assistant:412]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[21:30:42][D][voice_assistant:183]: VAD detected speech
[21:30:42][D][voice_assistant:412]: State changed from WAITING_FOR_VAD to START_PIPELINE
[21:30:42][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[21:30:42][D][voice_assistant:200]: Requesting start...
[21:30:42][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:30:42][D][voice_assistant:433]: Client started, streaming microphone
[21:30:42][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:30:42][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[21:30:42][D][voice_assistant:519]: Event Type: 1
[21:30:42][D][voice_assistant:522]: Assist Pipeline running
[21:30:42][D][voice_assistant:519]: Event Type: 9
[21:30:44][D][voice_assistant:519]: Event Type: 10
[21:30:44][D][voice_assistant:528]: Wake word detected
[21:30:44][D][voice_assistant:519]: Event Type: 3
[21:30:44][D][voice_assistant:533]: STT started
[21:30:45][D][voice_assistant:519]: Event Type: 11
[21:30:45][D][voice_assistant:670]: Starting STT by VAD
[21:30:47][D][voice_assistant:519]: Event Type: 12
[21:30:47][D][voice_assistant:674]: STT by VAD end
[21:30:47][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:30:47][D][voice_assistant:418]: Desired state set to AWAITING_RESPONSE
[21:30:47][D][voice_assistant:412]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:30:47][D][esp-idf:000]: I (4104329) I2S: DMA queue destroyed
[21:30:47][D][voice_assistant:412]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:30:47][D][voice_assistant:519]: Event Type: 4
[21:30:47][D][voice_assistant:547]: Speech recognised as: "Turn on test bulb."
[21:30:47][D][voice_assistant:519]: Event Type: 5
[21:30:47][D][voice_assistant:552]: Intent started
[21:30:47][D][voice_assistant:519]: Event Type: 6
[21:30:47][D][voice_assistant:519]: Event Type: 7
[21:30:47][D][voice_assistant:575]: Response: "Turned on the light"
[21:30:47][D][voice_assistant:519]: Event Type: 8
[21:30:47][D][voice_assistant:595]: Response URL: "http://192.168.1.103:8123/api/tts_proxy/104c89b5f9053e4751d03002aab527c96124bd77_en-us_03ed9f9845_tts.home_assistant_cloud.wav"
[21:30:47][D][voice_assistant:412]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:30:47][D][voice_assistant:418]: Desired state set to STREAMING_RESPONSE
[21:30:47][D][esp-idf:000]: I (4104485) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[21:30:47][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[21:30:49][D][voice_assistant:519]: Event Type: 99
[21:30:49][D][voice_assistant:665]: TTS stream end
[21:30:49][D][voice_assistant:283]: End of audio stream received
[21:30:49][D][voice_assistant:412]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[21:30:49][D][voice_assistant:418]: Desired state set to RESPONSE_FINISHED
[21:30:49][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[21:30:49][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[21:30:49][D][voice_assistant:315]: Speaker has finished outputting all audio
[21:30:49][D][voice_assistant:412]: State changed from RESPONSE_FINISHED to IDLE
[21:30:49][D][voice_assistant:418]: Desired state set to IDLE
[21:30:49][D][voice_assistant:412]: State changed from IDLE to START_MICROPHONE
[21:30:49][D][voice_assistant:418]: Desired state set to WAIT_FOR_VAD
[21:30:49][D][voice_assistant:153]: Starting Microphone
[21:30:49][D][voice_assistant:412]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:30:49][D][esp-idf:000]: I (4106649) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

Try changing the speaker's DOUT pin (i2s_dout_pin). Mine is set to GPIO12, but boards differ slightly, so you may need to experiment.
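
Applied to the config in the question, that suggestion would look roughly like the fragment below. It is a sketch that keeps the speaker on the existing i2s_in bus and only moves DOUT; the MAX98357A's DIN wire has to be moved to the same GPIO. One caution (general ESP32 behaviour, not something from this thread): GPIO12 is a boot strapping pin that selects the flash voltage, so nothing connected to it should pull it high while the board resets.

```yaml
speaker:
  - platform: i2s_audio
    id: external_speaker
    dac_type: external
    # Pin from the reply; move the MAX98357A DIN wire here as well.
    # GPIO12 is an ESP32 strapping pin: keep it low (not pulled up) at boot.
    i2s_dout_pin: GPIO12
    i2s_audio_id: i2s_in
    mode: mono
```

If GPIO12 does not help on a given board, the same line can point at another free output-capable GPIO (GPIO34 to GPIO39 are input-only on the ESP32 and cannot be used).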