Hello,
I’m trying to put together a voice assistant based on what JLo did in the Voice Assistant Contest Launch Video here: https://www.youtube.com/watch?v=99lGuB4J-4o&t=4912s, but I get no sound.
My Home Assistant is version 2024.2.1, running on a Raspberry Pi 4 connected by Ethernet.
I am running OPNsense, a managed switch, and two access points, with nothing unusual on my LAN: no VLANs or anything special like that.
My YAML:
esphome:
  name: virtual-assistant
  friendly_name: Virtual Assistant

esp32:
  board: esp32dev
  framework:
    type: esp-idf

esp_adf:

external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s

# Enable logging
logger:
  level: DEBUG

# Enable Home Assistant API
api:
  encryption:
    key: "wkA+7LJ1XxlXA4KG/cc8hcvZQa0BZfvJwQC6SKassm8="

ota:
  password: "902ba392051b3c21b93344a940631668"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Virtual-Assistant"
    password: "ZVtsrQXRIjry"

captive_portal:

switch:
  - platform: gpio
    pin:
      number: 22
      mode: output
    id: led

interval:
  - interval: 300s
    then:
      - switch.toggle: led

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25 # ws blue
    i2s_bclk_pin: GPIO19 # bclk green
  # - id: i2s_out
  #   i2s_lrclk_pin: GPIO26 # Green
  #   i2s_bclk_pin: GPIO27 # blue

microphone:
  - platform: i2s_audio
    id: external_microphone
    adc_type: external
    i2s_din_pin: GPIO32
    i2s_audio_id: i2s_in
    pdm: false
    bits_per_sample: 32bit
    channel: right

speaker:
  - platform: i2s_audio
    id: external_speaker
    dac_type: external
    i2s_dout_pin: GPIO27
    i2s_audio_id: i2s_in
    mode: mono

voice_assistant:
  id: va
  microphone: external_microphone
  speaker: external_speaker
  use_wake_word: true
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 8.0
  on_client_connected:
    - voice_assistant.start_continuous:
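For comparison, here is a rough sketch of the variant that the commented-out i2s_out block hints at, where the speaker gets its own I2S bus instead of sharing i2s_in with the microphone. The bclk pin GPIO14 is purely an assumption for illustration (the commented-out value GPIO27 is already used as i2s_dout_pin), and I have no evidence that this changes the no-sound behaviour.

```yaml
# Sketch only: a dedicated output bus for the MAX98357A.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25   # microphone WS
    i2s_bclk_pin: GPIO19    # microphone SCK
  - id: i2s_out
    i2s_lrclk_pin: GPIO26   # amplifier LRC
    i2s_bclk_pin: GPIO14    # ASSUMPTION: any free output-capable pin

speaker:
  - platform: i2s_audio
    id: external_speaker
    dac_type: external
    i2s_dout_pin: GPIO27    # amplifier DIN, unchanged
    i2s_audio_id: i2s_out   # was i2s_in
    mode: mono
```

With this layout the amplifier's LRC and BCLK would have to be moved to the i2s_out pins; with the config I posted they have to sit on GPIO25 and GPIO19.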
I am using Home Assistant Cloud. My INMP441 mic works, I can turn my test bulb on and off, and the logs from my ESP32 show that I’m receiving a verbal response from the cloud assistant, but I get nothing from my speakers. The logs also show the speaker being started.
I have:
- verified all of my connections a dozen times
- swapped ESP32s
- swapped in 3 different MAX98357A boards
- tested with 2 different speakers, both of which are verified good
- verified that DOUT, BCLK and LRC all go electrically high when the cloud responds
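To make the connections easier to check against the config, this is the wiring that the YAML above implies, written out as a small map. It is derived only from the pin settings in the config, not measured on the board, and the top-level key names are just labels made up for this summary.

```yaml
# Wiring implied by the posted config (not a measurement of the real board).
inmp441_microphone:
  WS: GPIO25     # i2s_lrclk_pin of i2s_in
  SCK: GPIO19    # i2s_bclk_pin of i2s_in
  SD: GPIO32     # i2s_din_pin
max98357a_amplifier:
  LRC: GPIO25    # same bus as the mic, because the speaker uses i2s_audio_id: i2s_in
  BCLK: GPIO19   # same bus as the mic
  DIN: GPIO27    # i2s_dout_pin
```

Because the speaker entry points at i2s_in, the amplifier's clock lines share GPIO25 and GPIO19 with the microphone, and only the data line (GPIO27) is its own.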
As far as I can tell my setup is correct, but the response WAV file from the cloud seems to make it only to my Raspberry Pi, and not out to the ESP32.
Any thoughts are greatly appreciated.
A copy of my ESP32 log, captured after successfully turning on the test bulb:
INFO ESPHome 2023.12.9
INFO Reading configuration /config/esphome/virtual-assistant.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
INFO Starting log output from virtual-assistant.local using esphome API
INFO Successfully connected to virtual-assistant @ 192.168.1.154 in 0.163s
INFO Successful handshake with virtual-assistant @ 192.168.1.154 in 0.164s
[21:30:41][I][app:102]: ESPHome version 2023.12.9 compiled on Feb 15 2024, 20:18:49
[21:30:41][C][wifi:573]: WiFi:
[21:30:41][C][wifi:405]: Local MAC: 7C:9E:BD:06:62:8C
[21:30:41][C][wifi:410]: SSID: [redacted]
[21:30:41][C][wifi:411]: IP Address: 192.168.1.154
[21:30:41][C][wifi:413]: BSSID: [redacted]
[21:30:41][C][wifi:414]: Hostname: 'virtual-assistant'
[21:30:41][C][wifi:416]: Signal strength: -69 dB ▂▄▆█
[21:30:41][C][wifi:420]: Channel: 11
[21:30:41][C][wifi:421]: Subnet: 255.255.255.0
[21:30:41][C][wifi:422]: Gateway: 192.168.1.1
[21:30:41][C][wifi:423]: DNS1: 192.168.1.1
[21:30:41][C][wifi:424]: DNS2: 0.0.0.0
[21:30:41][C][logger:439]: Logger:
[21:30:41][C][logger:440]: Level: DEBUG
[21:30:41][C][logger:441]: Log Baud Rate: 115200
[21:30:41][C][logger:443]: Hardware UART: UART0
[21:30:41][C][switch.gpio:068]: GPIO Switch 'led'
[21:30:41][C][switch.gpio:091]: Restore Mode: always OFF
[21:30:41][C][switch.gpio:031]: Pin: GPIO22
[21:30:41][C][captive_portal:088]: Captive Portal:
[21:30:41][C][mdns:115]: mDNS:
[21:30:41][C][mdns:116]: Hostname: virtual-assistant
[21:30:41][C][ota:097]: Over-The-Air Updates:
[21:30:41][C][ota:098]: Address: virtual-assistant.local:3232
[21:30:41][C][ota:101]: Using Password.
[21:30:41][C][api:139]: API Server:
[21:30:41][C][api:140]: Address: virtual-assistant.local:6053
[21:30:41][C][api:142]: Using noise encryption: YES
[21:30:42][D][voice_assistant:519]: Event Type: 0
[21:30:42][D][voice_assistant:519]: Event Type: 2
[21:30:42][D][voice_assistant:609]: Assist Pipeline ended
[21:30:42][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[21:30:42][D][voice_assistant:418]: Desired state set to WAITING_FOR_VAD
[21:30:42][D][voice_assistant:170]: Waiting for speech...
[21:30:42][D][voice_assistant:412]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[21:30:42][D][voice_assistant:183]: VAD detected speech
[21:30:42][D][voice_assistant:412]: State changed from WAITING_FOR_VAD to START_PIPELINE
[21:30:42][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[21:30:42][D][voice_assistant:200]: Requesting start...
[21:30:42][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:30:42][D][voice_assistant:433]: Client started, streaming microphone
[21:30:42][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:30:42][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[21:30:42][D][voice_assistant:519]: Event Type: 1
[21:30:42][D][voice_assistant:522]: Assist Pipeline running
[21:30:42][D][voice_assistant:519]: Event Type: 9
[21:30:44][D][voice_assistant:519]: Event Type: 10
[21:30:44][D][voice_assistant:528]: Wake word detected
[21:30:44][D][voice_assistant:519]: Event Type: 3
[21:30:44][D][voice_assistant:533]: STT started
[21:30:45][D][voice_assistant:519]: Event Type: 11
[21:30:45][D][voice_assistant:670]: Starting STT by VAD
[21:30:47][D][voice_assistant:519]: Event Type: 12
[21:30:47][D][voice_assistant:674]: STT by VAD end
[21:30:47][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:30:47][D][voice_assistant:418]: Desired state set to AWAITING_RESPONSE
[21:30:47][D][voice_assistant:412]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:30:47][D][esp-idf:000]: I (4104329) I2S: DMA queue destroyed
[21:30:47][D][voice_assistant:412]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:30:47][D][voice_assistant:519]: Event Type: 4
[21:30:47][D][voice_assistant:547]: Speech recognised as: "Turn on test bulb."
[21:30:47][D][voice_assistant:519]: Event Type: 5
[21:30:47][D][voice_assistant:552]: Intent started
[21:30:47][D][voice_assistant:519]: Event Type: 6
[21:30:47][D][voice_assistant:519]: Event Type: 7
[21:30:47][D][voice_assistant:575]: Response: "Turned on the light"
[21:30:47][D][voice_assistant:519]: Event Type: 8
[21:30:47][D][voice_assistant:595]: Response URL: "http://192.168.1.103:8123/api/tts_proxy/104c89b5f9053e4751d03002aab527c96124bd77_en-us_03ed9f9845_tts.home_assistant_cloud.wav"
[21:30:47][D][voice_assistant:412]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:30:47][D][voice_assistant:418]: Desired state set to STREAMING_RESPONSE
[21:30:47][D][esp-idf:000]: I (4104485) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[21:30:47][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[21:30:49][D][voice_assistant:519]: Event Type: 99
[21:30:49][D][voice_assistant:665]: TTS stream end
[21:30:49][D][voice_assistant:283]: End of audio stream received
[21:30:49][D][voice_assistant:412]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[21:30:49][D][voice_assistant:418]: Desired state set to RESPONSE_FINISHED
[21:30:49][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[21:30:49][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[21:30:49][D][voice_assistant:315]: Speaker has finished outputting all audio
[21:30:49][D][voice_assistant:412]: State changed from RESPONSE_FINISHED to IDLE
[21:30:49][D][voice_assistant:418]: Desired state set to IDLE
[21:30:49][D][voice_assistant:412]: State changed from IDLE to START_MICROPHONE
[21:30:49][D][voice_assistant:418]: Desired state set to WAIT_FOR_VAD
[21:30:49][D][voice_assistant:153]: Starting Microphone
[21:30:49][D][voice_assistant:412]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:30:49][D][esp-idf:000]: I (4106649) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4