I’ve spent a few hours watching YouTube videos, reading articles, assembling my first prototype, and now testing it. About 10 minutes ago, I got my first response from Nabu! However, I can’t get it to respond a second time.
Here is my hardware:
ESP32-WROOM-32
INMP441 I2S microphone
MAX98357A I2S amplifier
4 ohm speaker
24-LED light ring
Everything is mounted on a circuit board for testing.
Here is my code:
esphome:
  name: esphome-voice-main-bedroom
  friendly_name: Voice main bedroom
  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous:
esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended
# Enable logging
logger:
# Enable Home Assistant API
api:
ota:
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.0.220
    gateway: 192.168.0.1
    subnet: 255.255.255.0
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esphome-voice-main-bedroom"
    password: "" # VGiTqrtnA52n
i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO22
microphone:
  - platform: i2s_audio
    id: mic_i2s
    adc_type: external
    i2s_din_pin: GPIO21
    pdm: false
speaker:
  - platform: i2s_audio
    id: speaker_i2s
    dac_type: external
    i2s_dout_pin: GPIO18
    mode: mono
voice_assistant:
  microphone: mic_i2s
  speaker: speaker_i2s
  use_wake_word: true
  noise_suppression_level: 4
  auto_gain: 15dBFS
  volume_multiplier: 1.0
  id: assist
  on_end:
    - light.turn_off:
        id: led_ring
  on_wake_word_detected:
    - light.addressable_set:
        id: led_ring
        range_from: 16
        range_to: 17
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 15
        range_to: 18
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 14
        range_to: 19
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 13
        range_to: 20
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 12
        range_to: 21
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 11
        range_to: 22
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 10
        range_to: 23
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 9
        range_to: 1
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 9
        range_to: 24
        red: 0%
        green: 0%
        blue: 100%
    - delay: 0.03s
    - light.addressable_set:
        id: led_ring
        range_from: 7
        range_to: 8
        red: 0%
        green: 50%
        blue: 50%
    - light.addressable_set:
        id: led_ring
        range_from: 0
        range_to: 1
        red: 0%
        green: 50%
        blue: 50%
    - delay: 0.1s
    - light.addressable_set:
        id: led_ring
        range_from: 2
        range_to: 6
        red: 0%
        green: 100%
        blue: 00%
switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);
light:
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: GPIO23
    num_leds: 24
    rmt_channel: 0
    chipset: ws2812
    name: "led_ring"
    id: led_ring
captive_portal:
Here are the logs from around the time of that successful test:
[08:12:31][D][voice_assistant:200]: Requesting start...
[08:12:31][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:12:31][D][voice_assistant:433]: Client started, streaming microphone
[08:12:31][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[08:12:31][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[08:12:31][D][voice_assistant:519]: Event Type: 1
[08:12:31][D][voice_assistant:522]: Assist Pipeline running
[08:12:31][D][voice_assistant:519]: Event Type: 9
[08:12:37][D][voice_assistant:519]: Event Type: 0
[08:12:37][D][voice_assistant:519]: Event Type: 2
[08:12:37][D][voice_assistant:609]: Assist Pipeline ended
[08:12:37][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to IDLE
[08:12:37][D][voice_assistant:418]: Desired state set to IDLE
[08:12:37][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[08:12:37][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[08:12:37][D][light:036]: 'led_ring' Setting:
[08:12:37][D][light:085]: Transition length: 1.0s
[08:12:37][D][voice_assistant:200]: Requesting start...
[08:12:37][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:12:37][D][voice_assistant:433]: Client started, streaming microphone
[08:12:37][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[08:12:37][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[08:12:37][D][voice_assistant:519]: Event Type: 1
[08:12:37][D][voice_assistant:522]: Assist Pipeline running
[08:12:37][D][voice_assistant:519]: Event Type: 9
[08:12:40][D][voice_assistant:519]: Event Type: 10
[08:12:40][D][voice_assistant:528]: Wake word detected
[08:12:40][D][voice_assistant:519]: Event Type: 3
[08:12:40][D][voice_assistant:533]: STT started
[08:12:41][D][voice_assistant:519]: Event Type: 11
[08:12:41][D][voice_assistant:670]: Starting STT by VAD
[08:12:41][D][voice_assistant:519]: Event Type: 12
[08:12:41][D][voice_assistant:674]: STT by VAD end
[08:12:41][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[08:12:41][D][voice_assistant:418]: Desired state set to AWAITING_RESPONSE
[08:12:41][D][voice_assistant:412]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[08:12:41][D][esp-idf:000]: I (37184) I2S: DMA queue destroyed
[08:12:41][D][voice_assistant:412]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[08:12:42][D][voice_assistant:519]: Event Type: 4
[08:12:42][D][voice_assistant:547]: Speech recognised as: "Look like."
[08:12:42][D][voice_assistant:519]: Event Type: 5
[08:12:42][D][voice_assistant:552]: Intent started
[08:12:42][D][voice_assistant:519]: Event Type: 6
[08:12:42][D][voice_assistant:519]: Event Type: 7
[08:12:42][D][voice_assistant:575]: Response: "Sorry, I couldn't understand that"
[08:12:42][D][voice_assistant:519]: Event Type: 8
[08:12:42][D][voice_assistant:595]: Response URL: "http://192.168.0.124:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-ca_dba8942832_cloud.wav"
[08:12:42][D][voice_assistant:412]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[08:12:42][D][voice_assistant:418]: Desired state set to STREAMING_RESPONSE
[08:12:42][D][esp-idf:000]: I (38181) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[08:12:42][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:12:43][D][esp-idf:000]: I (38288) I2S: DMA queue destroyed
[08:12:43][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:12:43][D][voice_assistant:519]: Event Type: 98
[08:12:43][D][voice_assistant:657]: TTS stream start
[08:12:43][D][esp-idf:000]: I (38734) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[08:12:43][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:12:43][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:43][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:43][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:519]: Event Type: 99
[08:12:46][D][voice_assistant:665]: TTS stream end
[08:12:46][D][voice_assistant:283]: End of audio stream received
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop
[08:13:53][D][voice_assistant:315]: Speaker has finished outputting all audio
[08:13:53][D][voice_assistant:412]: State changed from RESPONSE_FINISHED to IDLE
[08:13:53][D][voice_assistant:418]: Desired state set to IDLE
[08:13:53][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[08:13:53][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[08:13:53][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:200]: Requesting start...
[08:13:53][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:13:53][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:118]: microphone not running
[08:13:53][D][voice_assistant:433]: Client started, streaming microphone
[08:13:53][D][voice_assistant:412]: State changed from STARTING_PIPELINE to START_MICROPHONE
[08:13:53][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[08:13:53][D][voice_assistant:153]: Starting Microphone
[08:13:53][D][voice_assistant:412]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:13:53][D][esp-idf:000]: I (108844) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[08:13:53][D][voice_assistant:519]: Event Type: 1
[08:13:53][D][voice_assistant:522]: Assist Pipeline running
[08:13:53][D][voice_assistant:412]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[08:13:53][D][voice_assistant:519]: Event Type: 9
[08:14:53][D][voice_assistant:519]: Event Type: 0
[08:14:53][D][voice_assistant:519]: Event Type: 2
[08:14:53][D][voice_assistant:609]: Assist Pipeline ended
[08:14:53][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to IDLE
[08:14:53][D][voice_assistant:418]: Desired state set to IDLE
[08:14:53][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[08:14:53][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[08:14:53][D][light:036]: 'led_ring' Setting:
[08:14:53][D][light:085]: Transition length: 1.0s
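To make the timing in these logs easier to see, here is a minimal Python sketch that reports long pauses between consecutive log lines. It assumes every line starts with an `[HH:MM:SS]` timestamp as in the logs above and that all lines come from the same day. The helper name `find_gaps` and the 30-second threshold are just choices for illustration, not anything from ESPHome.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "[HH:MM:SS]" timestamp used in the log lines above.
TIMESTAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]")


def find_gaps(lines, threshold=timedelta(seconds=30)):
    """Return (seconds, previous_line, next_line) for each gap >= threshold.

    Hypothetical helper; assumes all lines are from the same day.
    """
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip anything without a timestamp
        current = datetime.strptime(match.group(1), "%H:%M:%S")
        if prev_time is not None and current - prev_time >= threshold:
            gaps.append(((current - prev_time).total_seconds(), prev_line, line))
        prev_time, prev_line = current, line
    return gaps


# A few lines copied from the logs above.
sample = [
    "[08:12:46][D][voice_assistant:347]: Speaker buffer full, trying again next loop",
    "[08:13:53][D][voice_assistant:315]: Speaker has finished outputting all audio",
    "[08:13:53][D][voice_assistant:519]: Event Type: 9",
    "[08:14:53][D][voice_assistant:519]: Event Type: 0",
]

for seconds, before, after in find_gaps(sample):
    print(f"{seconds:.0f}s gap before: {after}")
```

On these sample lines it flags two pauses: 67 seconds between the last "Speaker buffer full" message and "Speaker has finished outputting all audio", and 60 seconds between Event Type 9 and Event Type 0.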
Now I can’t get another response. I’m trying the wake word every few seconds. What am I missing?