- You can use a better model for speech recognition, though it will require a better CPU or even a GPU. I run my Whisper on an i7.
- Alternatively, you can pay for a Nabu Casa subscription (it’s the company behind Home Assistant, so you’d be supporting the devs) and use their STT/TTS; they’re great.
- If you use my YAML, you have a media player entity in your ReSpeaker device, and it has a volume slider. Also, my YAML uses an on-device wake word, so you can get rid of the OpenWakeWord add-on (however, on-device microWakeWord only has good recognition for the “Okay, Nabu” wake word).
It’s really pleasing to see the progress with the ReSpeaker. I’ve installed Andrii’s yaml and it’s working well.
Unfortunately, speech recognition still has some way to go. It’s very hit and miss. Frustratingly, there are some things I have to ask 5-6 times, and even when I enunciate slowly and carefully it still may not pick them up correctly.
Yeah, I use Nabu STT, it’s much better than Whisper (at least with the basic model).
Thanks @formatBCE
The volume control in your YAML works perfectly. Is there a way of setting it via voice? Something like “Ok Nabu, set volume to 10%”?
I will check out that Nabu STT as well…
I don’t think there’s a built-in intent for this, but one could write a custom intent.
It’s not exactly easy, but you can try if you’re not afraid of YAML.
I will try to help.
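For anyone who wants to try, here is a minimal sketch of what such a custom intent could look like. The intent name SetRespeakerVolume, the slot name volume_percent and the entity ID media_player.respeaker_media_player are placeholders made up for illustration, so adjust them to your setup. The first file defines the sentence to match:

```yaml
# config/custom_sentences/en/volume.yaml
# Hypothetical intent and slot names; not taken from any existing config.
language: "en"
intents:
  SetRespeakerVolume:
    data:
      - sentences:
          - "set [the] volume to {volume_percent} [percent]"
lists:
  volume_percent:
    range:
      from: 0
      to: 100
```

The second snippet would go into configuration.yaml and maps the intent to a media_player.volume_set call:

```yaml
# configuration.yaml
intent_script:
  SetRespeakerVolume:
    action:
      - service: media_player.volume_set
        target:
          # Placeholder entity ID; use the media player of your own device.
          entity_id: media_player.respeaker_media_player
        data:
          # The slot arrives as a percentage, volume_level expects 0.0-1.0.
          volume_level: "{{ volume_percent | int / 100 }}"
    speech:
      text: "Volume set to {{ volume_percent }} percent"
```

Home Assistant needs a restart after adding these. Note that if the media player config restricts the volume range on the device, the audible result may not line up exactly with the spoken percentage.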
I have to agree: completely local, even on a three-year-old Intel NUC-like mini PC, is still not great beyond the wake word. It’s decent in a quiet room when running completely local, but that isn’t always the typical scenario. Thanks for posting the firmware, my listen light works now. For some reason I get an “Access Denied” when trying the new Seeed link in their wiki.
I think some of the local issues come from everything being CPU-based to some extent. Either that or it just needs more work, but it’s been out for a while, so I’m not sure that’s it either. The Whisper models are confusing to me, as some that are smaller (or sound smaller) can’t even be used while larger ones can, and tiny-int8 is the default. Honestly, playing with the models made zero improvement for me personally.
I actually trust Nabu with my data, and I mostly pay because it’s a little to give back. While it’s far from perfect, it’s slowly getting better… Using their cloud service is noticeably better though. At least to me it is.
I would still really, really like to see how this works locally on a Jetson, which to my knowledge was moved to being GPU-based around four to five months ago. The obvious issue is that Nvidia Jetson units aren’t cheap, I don’t really care for LLMs, and Nvidia prices apparently never go down, only up. So it is impossible to tell whether it would make any difference, but Nvidia/HA and someone from Seeed actually started that thread in the Nvidia developer forums.… It is a full voice Docker package and runs HA Core on the Jetson. From what I could tell, they got frustrated with 2 machines running multiple containers each, so they moved it all to the Jetson. I believe the cheapest model that works has gone up in price since that dev post, even though it’s maybe 3 years old… Nvidia, sigh…
Thanks formatBCE, I will have a look at custom intents…
Mine is working but is this a new firmware?
What am I flashing it with?
Just wanted to post this for anyone else it may help. I just fixed my TV background noise issue. The only downside is that you have to enable the assistant in progress binary sensor for it to work. This gives you a constant repair warning, but according to the notice shown when it was added, the sensor is not going away until 2025; I think the release was 2025.4 or something close. For some reason the new assist sensor doesn’t have the same options for the automation.
All I did was create an automation that turns off my stereo when the binary sensor turns on and turns the stereo back on when the binary sensor turns off. This room uses IR, but if you can set the sound source’s volume directly, you can just lower it to a specific level and set it back to what it was once it’s done… It’s a temporary solution, but it works perfectly. The new assist_satellite (or whatever the new option is called) only has the below, which isn’t useful since it changes a lot. I hope they add a simple turned on/off option like the binary sensor’s at some point.
One voice command with new option
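The automation described above could be sketched roughly like this. The entity IDs binary_sensor.respeaker_assist_in_progress and media_player.living_room_stereo are placeholders, not the actual names from this setup, and since the room in question uses IR, the real actions would be whatever service your IR blaster exposes rather than media_player.turn_off / turn_on.

```yaml
# automations.yaml
# Placeholder entity IDs; replace them with your own.
- alias: "Silence stereo while Assist is listening"
  mode: restart
  trigger:
    - platform: state
      entity_id: binary_sensor.respeaker_assist_in_progress
      to: "on"
      id: assist_started
    - platform: state
      entity_id: binary_sensor.respeaker_assist_in_progress
      to: "off"
      id: assist_finished
  action:
    - choose:
        # Assist started listening: silence the sound source.
        - conditions:
            - condition: trigger
              id: assist_started
          sequence:
            - service: media_player.turn_off
              target:
                entity_id: media_player.living_room_stereo
        # Assist finished: bring the sound source back.
        - conditions:
            - condition: trigger
              id: assist_finished
          sequence:
            - service: media_player.turn_on
              target:
                entity_id: media_player.living_room_stereo
```

If the sound source supports it, the two sequences could call media_player.volume_set with a low and a normal volume_level instead of switching the device off and on.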
After updating to i2s firmware 1.0.7, all my issues went away. My LED works, and the mic works.
I’ve had it up and running with an AUX cord for 2 days and it hasn’t crashed. It works fairly well in my basement without much background noise. It hears the wake word fairly well and understands what I am saying, so the Whisper STT is working decently using a GPU.
The only issue I am having so far is that a response like “turned light off” seems to be cut off, as in, I can only hear the last little bit of it. The same thing happens when playing audio files using the media player: I have a short mp3 of a doorbell ringing, and on the first play I catch only a little bit of the end, while on the second play it works as it should.
Makes me think that there is some sort of timing issue with receiving the data and piping it out, possibly needing to init something to pipe audio out? IDK.
Anyone else experiencing this issue?
Hi all,
Have been following this thread with very keen interest. I have 3 ReSpeaker Lite kits. 2 are set up using their YAML code, with the exception that I have micro wake word working on the device.
I have tried some of the code from here on my 3rd device (which does work with my other code) but the returned audio just generates 2 short static bursts. This is the same if I send it audio via media player.
I have the latest 1.0.7 i2s firmware. Any ideas what I am missing?
Update:
It is crashing when trying to play media. The static audio being heard is the same audio everyone hears when the device initialises.
Now to figure out why playing media files is crashing the device.
Here is the console output:
[16:23:00][I][app:100]: ESPHome version 2024.9.2 compiled on Oct 16 2024, 16:22:08
[16:23:00][C][wifi:600]: WiFi:
[16:23:00][C][wifi:428]: Local MAC: 64:E8:33:7E:02:80
[16:23:00][C][wifi:433]: SSID: [redacted]
[16:23:00][C][wifi:436]: IP Address: 192.168.0.43
[16:23:00][C][wifi:440]: BSSID: [redacted]
[16:23:00][C][wifi:441]: Hostname: 'respeakertesting'
[16:23:00][C][wifi:443]: Signal strength: -63 dB ▂▄▆█
[16:23:00][C][wifi:447]: Channel: 1
[16:23:00][C][wifi:448]: Subnet: 255.255.255.0
[16:23:00][C][wifi:449]: Gateway: 192.168.0.1
[16:23:00][C][wifi:450]: DNS1: 8.8.8.8
[16:23:00][C][wifi:451]: DNS2: 1.1.1.1
[16:23:00][C][logger:185]: Logger:
[16:23:00][C][logger:186]: Level: DEBUG
[16:23:00][C][logger:188]: Log Baud Rate: 115200
[16:23:00][C][logger:189]: Hardware UART: USB_SERIAL_JTAG
[16:23:00][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Mute'
[16:23:00][C][gpio.binary_sensor:016]: Pin: GPIO4
[16:23:00][C][esp32_rmt_led_strip:187]: ESP32 RMT LED Strip:
[16:23:00][C][esp32_rmt_led_strip:188]: Pin: 1
[16:23:00][C][esp32_rmt_led_strip:189]: Channel: 0
[16:23:00][C][esp32_rmt_led_strip:214]: RGB Order: GRB
[16:23:00][C][esp32_rmt_led_strip:215]: Max refresh rate: 0
[16:23:00][C][esp32_rmt_led_strip:216]: Number of LEDs: 1
[16:23:00][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'User button'
[16:23:00][C][gpio.binary_sensor:016]: Pin: GPIO3
[16:23:00][C][light:103]: Light 'RespeakerTesting'
[16:23:00][C][light:105]: Default Transition Length: 0.0s
[16:23:00][C][light:106]: Gamma Correct: 2.80
[16:23:00][C][template.switch:068]: Template Switch 'timer_ringing'
[16:23:00][C][template.switch:091]: Restore Mode: always OFF
[16:23:00][C][template.switch:057]: Optimistic: YES
[16:23:00][C][psram:020]: PSRAM:
[16:23:00][C][psram:021]: Available: YES
[16:23:00][C][psram:024]: Size: 8191 KB
[16:23:00][C][safe_mode.button:024]: Safe Mode Button 'Safe Mode Boot'
[16:23:00][C][safe_mode.button:024]: Icon: 'mdi:restart-alert'
[16:23:00][C][factory_reset.button:011]: Factory Reset Button 'Factory reset'
[16:23:00][C][factory_reset.button:011]: Icon: 'mdi:restart-alert'
[16:23:00][C][restart.button:017]: Restart Button 'Restart'
[16:23:00][C][restart.button:017]: Icon: 'mdi:restart'
[16:23:01][C][captive_portal:089]: Captive Portal:
[16:23:01][C][mdns:116]: mDNS:
[16:23:01][C][mdns:117]: Hostname: respeakertesting
[16:23:01][C][esphome.ota:073]: Over-The-Air updates:
[16:23:01][C][esphome.ota:074]: Address: respeakertesting.local:3232
[16:23:01][C][esphome.ota:075]: Version: 2
[16:23:01][C][esphome.ota:078]: Password configured
[16:23:01][C][safe_mode:018]: Safe Mode:
[16:23:01][C][safe_mode:020]: Boot considered successful after 60 seconds
[16:23:01][C][safe_mode:021]: Invoke after 10 boot attempts
[16:23:01][C][safe_mode:023]: Remain in safe mode for 300 seconds
[16:23:01][C][api:139]: API Server:
[16:23:01][C][api:140]: Address: respeakertesting.local:6053
[16:23:01][C][api:142]: Using noise encryption: YES
[16:23:01][C][micro_wake_word:072]: microWakeWord:
[16:23:01][C][micro_wake_word:073]: models:
[16:23:01][C][micro_wake_word:015]: - Wake Word: okay nabu
[16:23:01][C][micro_wake_word:016]: Probability cutoff: 0.97
[16:23:01][C][micro_wake_word:017]: Sliding window size: 5
[16:23:01][C][micro_wake_word:021]: - VAD Model
[16:23:01][C][micro_wake_word:022]: Probability cutoff: 0.50
[16:23:01][C][micro_wake_word:023]: Sliding window size: 5
[16:23:26][D][esp32.preferences:114]: Saving 1 preferences to flash...
[16:23:26][D][esp32.preferences:143]: Saving 1 preferences to flash: 1 cached, 0 written, 0 failed
[16:23:53][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[16:23:53][D][esp32.preferences:114]: Saving 1 preferences to flash...
[16:23:53][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[16:23:54][D][micro_wake_word:347]: Detected 'okay nabu' with sliding average probability is 0.98 and max probability is 0.98
[16:23:54][D][voice_assistant:512]: State changed from IDLE to START_MICROPHONE
[16:23:54][D][voice_assistant:518]: Desired state set to START_PIPELINE
[16:23:54][D][voice_assistant:223]: Starting Microphone
[16:23:54][D][ring_buffer:024]: Created ring buffer with size 16384
[16:23:54][D][voice_assistant:512]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[16:23:54][D][voice_assistant:512]: State changed from STARTING_MICROPHONE to START_PIPELINE
[16:23:54][D][voice_assistant:278]: Requesting start...
[16:23:54][D][voice_assistant:512]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:23:54][D][voice_assistant:533]: Client started, streaming microphone
[16:23:54][D][voice_assistant:512]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:23:54][D][voice_assistant:518]: Desired state set to STREAMING_MICROPHONE
[16:23:54][D][voice_assistant:635]: Event Type: 1
[16:23:54][D][voice_assistant:638]: Assist Pipeline running
[16:23:54][D][voice_assistant:635]: Event Type: 3
[16:23:54][D][voice_assistant:649]: STT started
[16:23:54][D][light:036]: 'RespeakerTesting' Setting:
[16:23:54][D][light:047]: State: ON
[16:23:54][D][light:051]: Brightness: 60%
[16:23:54][D][light:059]: Red: 100%, Green: 20%, Blue: 100%
[16:23:54][D][light:109]: Effect: 'Slow Pulse'
[16:23:55][D][voice_assistant:635]: Event Type: 11
[16:23:55][D][voice_assistant:792]: Starting STT by VAD
[16:23:58][D][voice_assistant:635]: Event Type: 12
[16:23:58][D][voice_assistant:796]: STT by VAD end
[16:23:58][D][voice_assistant:512]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[16:23:58][D][voice_assistant:518]: Desired state set to AWAITING_RESPONSE
[16:23:58][D][voice_assistant:512]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[16:23:58][D][light:036]: 'RespeakerTesting' Setting:
[16:23:58][D][light:051]: Brightness: 60%
[16:23:58][D][light:059]: Red: 100%, Green: 20%, Blue: 100%
[16:23:58][D][light:109]: Effect: 'Fast Pulse'
[16:23:58][D][voice_assistant:512]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[16:23:58][D][voice_assistant:512]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[16:23:59][D][voice_assistant:635]: Event Type: 4
[16:23:59][D][voice_assistant:663]: Speech recognised as: "What's the outside temp?"
[16:23:59][D][voice_assistant:635]: Event Type: 5
[16:23:59][D][voice_assistant:668]: Intent started
[16:23:59][D][voice_assistant:635]: Event Type: 6
[16:23:59][D][voice_assistant:635]: Event Type: 7
[16:23:59][D][voice_assistant:691]: Response: "The outside temperature is 24.0°C."
[16:23:59][D][light:036]: 'RespeakerTesting' Setting:
[16:23:59][D][light:051]: Brightness: 60%
[16:23:59][D][light:059]: Red: 20%, Green: 100%, Blue: 100%
[16:23:59][D][light:109]: Effect: 'Slow Pulse'
[16:23:59][D][voice_assistant:635]: Event Type: 8
[16:23:59][D][voice_assistant:711]: Response URL: "https://*redacted*/api/tts_proxy/f313bfbdf5e9410607ab8b9381bc3b2fbd2e9fa7_en-au_35fb9c01c1_tts.home_assistant_cloud.flac"
[16:23:59][D][voice_assistant:512]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[16:23:59][D][voice_assistant:518]: Desired state set to STREAMING_RESPONSE
[16:23:59][D][media_player:080]: 'Media Player' - Setting
[16:23:59][D][media_player:087]: Media URL: https://*redacted*/api/tts_proxy/f313bfbdf5e9410607ab8b9381bc3b2fbd2e9fa7_en-au_35fb9c01c1_tts.home_assistant_cloud.flac
[16:23:59][D][media_player:093]: Announcement: yes
[16:23:59][D][voice_assistant:635]: Event Type: 2
[16:23:59][D][voice_assistant:725]: Assist Pipeline ended
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 48000
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 48000
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 16384
[16:23:59][D][esp-idf:000][speaker_task]: I (67740) I2S: DMA Malloc info, datalen=blocksize=4088, dma_buf_count=4
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 65536
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 65536
[16:23:59][D][nabu_media_player:455]: Starting Media Player Speaker
[16:23:59][D][nabu_media_player:458]: Started Media Player Speaker
Now, the audio file plays fine in my browser, so the entire pipeline is working, but audio is not playing on the device. And just to confirm, TTS playback works fine with the Seeed Studio suggested YAML config on this device, so the speaker and the cabling are all fine.
And my current config:
substitutions:
  voice_assist_idle_phase_id: "1"
  voice_assist_listening_phase_id: "2"
  voice_assist_thinking_phase_id: "3"
  voice_assist_replying_phase_id: "4"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_error_phase_id: "11"
  voice_assist_muted_phase_id: "12"

esphome:
  name: respeakertesting
  friendly_name: RespeakerTesting
  min_version: 2024.9.0
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    priority: 600
    then:
      - script.execute: adjust_led
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
            - script.execute: adjust_led
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  flash_size: 8MB
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB: "y"
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_SPIRAM_ALLOW_STACK_EXTERNAL_MEMORY: "y"
      CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP: "y"
      # Settings based on https://github.com/espressif/esp-adf/issues/297#issuecomment-783811702
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_ESP32_WIFI_STATIC_TX_BUFFER: "y"
      CONFIG_ESP32_WIFI_TX_BUFFER_TYPE: "0"
      CONFIG_ESP32_WIFI_STATIC_TX_BUFFER_NUM: "8"
      CONFIG_ESP32_WIFI_CACHE_TX_BUFFER_NUM: "32"
      CONFIG_ESP32_WIFI_AMPDU_TX_ENABLED: "y"
      CONFIG_ESP32_WIFI_TX_BA_WIN: "16"
      CONFIG_ESP32_WIFI_AMPDU_RX_ENABLED: "y"
      CONFIG_ESP32_WIFI_RX_BA_WIN: "32"
      CONFIG_LWIP_MAX_ACTIVE_TCP: "16"
      CONFIG_LWIP_MAX_LISTENING_TCP: "16"
      CONFIG_TCP_MAXRTX: "12"
      CONFIG_TCP_SYNMAXRTX: "6"
      CONFIG_TCP_MSS: "1436"
      CONFIG_TCP_MSL: "60000"
      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "65535" # Adjusted from linked settings to avoid compilation error
      CONFIG_TCP_RECVMBOX_SIZE: "512"
      CONFIG_TCP_QUEUE_OOSEQ: "y"
      CONFIG_TCP_OVERSIZE_MSS: "y"
      CONFIG_LWIP_WND_SCALE: "y"
      CONFIG_TCP_RCV_SCALE: "3"
      CONFIG_LWIP_TCPIP_RECVMBOX_SIZE: "512"
      CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST: "y"
      CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY: "y"

psram:
  mode: octal # quad for N8R2 and octal for N16R8
  speed: 80MHz
external_components:
  - source:
      type: git
      url: https://github.com/esphome/voice-kit
      ref: dev
    components:
      - aic3204
      - audio_dac
      - media_player
      - micro_wake_word
      - microphone
      - nabu
      - nabu_microphone
      - voice_assistant
      - voice_kit
    refresh: 0s

api:
  encryption:
    key: "*redacted*"

ota:
  - platform: esphome
    password: "*redacted*"

logger:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

captive_portal:
switch:
  - platform: template
    id: timer_ringing
    optimistic: true
    internal: true
    restore_mode: ALWAYS_OFF
    on_turn_on:
      # Duck audio
      - nabu.set_ducking:
          decibel_reduction: 20
          duration: 0.0s
      # Ring timer
      - script.execute: ring_timer
      # Refresh LED
      - script.execute: adjust_led
      # If 15 minutes have passed and the timer is still ringing, stop it.
      - delay: 15min
      - switch.turn_off: timer_ringing
    on_turn_off:
      # Stop any current announcement (ie: stop the timer ring mid playback)
      - if:
          condition:
            lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
          then:
            lambda: |-
              id(nabu_media_player)
                ->make_call()
                .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
                .set_announcement(true)
                .perform();
      # Set back ducking ratio to zero
      - nabu.set_ducking:
          decibel_reduction: 0
          duration: 1.0s
      # Refresh the LED ring
      - script.execute: adjust_led
button:
  - platform: safe_mode
    id: button_safe_mode
    name: Safe Mode Boot
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset
  - platform: restart
    name: Restart
    id: but_rest

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO4 # D3
      inverted: true
    id: mute
    name: "Mute"
  - platform: gpio
    pin:
      number: GPIO3 # D2
      inverted: true
    id: user_button
    name: "User button"
    on_multi_click:
      - timing:
          - ON for at most 1s
          - OFF for at least 0.25s
        then:
          - if:
              condition:
                lambda: return !id(init_in_progress);
              then:
                - if:
                    condition:
                      switch.is_on: timer_ringing
                    then:
                      - switch.turn_off: timer_ringing
                    else:
                      - if:
                          condition:
                            lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
                          then:
                            - lambda: |
                                id(nabu_media_player)
                                  ->make_call()
                                  .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
                                  .set_announcement(true)
                                  .perform();
                          else:
                            - if:
                                condition:
                                  voice_assistant.is_running:
                                then:
                                  - voice_assistant.stop:
                                else:
                                  - if:
                                      condition:
                                        media_player.is_playing:
                                      then:
                                        - media_player.pause:
                                      else:
                                        - if:
                                            condition:
                                              and:
                                                # - switch.is_off: master_mute_switch
                                                - not:
                                                    voice_assistant.is_running
                                            then:
                                              - voice_assistant.start:
light:
  - platform: esp32_rmt_led_strip
    id: led_ww
    rgb_order: GRB
    pin: GPIO1
    num_leds: 1
    rmt_channel: 0
    chipset: ws2812
    name: none
    disabled_by_default: true
    entity_category: config
    default_transition_length: 0s
    effects:
      - pulse:
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%

# Audio and Voice Assistant Config
i2s_audio:
  - id: i2s_output
    i2s_lrclk_pin:
      number: GPIO7
      allow_other_uses: true
    i2s_bclk_pin:
      number: GPIO8
      allow_other_uses: true
    i2s_mclk_pin:
      number: GPIO9
      allow_other_uses: true
  - id: i2s_input
    i2s_lrclk_pin:
      number: GPIO7
      allow_other_uses: true
    i2s_bclk_pin:
      number: GPIO8
      allow_other_uses: true
    i2s_mclk_pin:
      number: GPIO9
      allow_other_uses: true

microphone:
  - platform: nabu_microphone
    i2s_din_pin: GPIO44
    adc_type: external
    pdm: false
    sample_rate: 16000
    bits_per_sample: 32bit
    i2s_mode: secondary
    i2s_audio_id: i2s_input
    channel_0:
      id: nabu_mic_mww
    channel_1:
      id: nabu_mic_va
media_player:
  - platform: nabu
    id: nabu_media_player
    name: Media Player
    internal: false
    sample_rate: 16000
    i2s_dout_pin: GPIO43
    bits_per_sample: 32bit
    i2s_mode: secondary
    i2s_audio_id: i2s_output
    volume_increment: 0.05
    volume_min: 0.4
    volume_max: 0.85
    on_announcement:
      - nabu.set_ducking:
          decibel_reduction: 20
          duration: 0.0s
    on_state:
      if:
        condition:
          and:
            - switch.is_off: timer_ringing
            - not:
                voice_assistant.is_running:
            - not:
                lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
        then:
          - nabu.set_ducking:
              decibel_reduction: 0
              duration: 1.0s

micro_wake_word:
  models:
    - model: https://github.com/kahrendt/microWakeWord/releases/download/okay_nabu/okay_nabu.json
  vad:
  microphone: nabu_mic_mww
  on_wake_word_detected:
    # If a timer is ringing: Stop it, do not start the voice assistant (We can stop timer from voice!)
    - if:
        condition:
          switch.is_on: timer_ringing
        then:
          - switch.turn_off: timer_ringing
        # Start voice assistant, stop current announcement.
        else:
          - if:
              condition:
                lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
              then:
                lambda: |-
                  id(nabu_media_player)
                    ->make_call()
                    .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
                    .set_announcement(true)
                    .perform();
          - voice_assistant.start:
              wake_word: !lambda return wake_word;
voice_assistant:
  id: va
  microphone: nabu_mic_va
  media_player: nabu_media_player
  noise_suppression_level: 0
  auto_gain: 0dBFS
  volume_multiplier: 1
  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - micro_wake_word.start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: adjust_led
  on_client_disconnected:
    - voice_assistant.stop:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: adjust_led
  on_error:
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: adjust_led
          - delay: 1s
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
          - script.execute: adjust_led
  on_start:
    - nabu.set_ducking:
        decibel_reduction: 20 # Number of dB quieter; higher implies more quiet, 0 implies full volume
        duration: 0.0s # The duration of the transition (default is 0)
  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: adjust_led
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: adjust_led
  on_tts_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: adjust_led
  on_end:
    - wait_until:
        not:
          voice_assistant.is_running:
    - nabu.set_ducking:
        decibel_reduction: 0 # 0 dB means no reduction
        duration: 1.0s
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: adjust_led
  on_timer_finished:
    - switch.turn_on: timer_ringing
script:
  - id: ring_timer
    then:
      - while:
          condition:
            switch.is_on: timer_ringing
          then:
            - media_player.play_media: https://*redacted*/local/timer_finished.flac
            - delay: 1s
            - wait_until:
                not:
                  media_player.is_playing:
  - id: adjust_led
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                  switch.is_on: timer_ringing
                then:
                  - light.turn_on:
                      id: led_ww
                      red: 0%
                      green: 100%
                      blue: 0%
                      brightness: 60%
                      effect: fast pulse
                else:
                  - if:
                      condition:
                        wifi.connected:
                      then:
                        - if:
                            condition:
                              api.connected:
                            then:
                              - lambda: |
                                  switch(id(voice_assistant_phase)) {
                                    case ${voice_assist_listening_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.6)
                                        .set_rgb(1.0, 0.2, 1.0)
                                        .set_effect("Slow Pulse")
                                        .perform();
                                      break;
                                    case ${voice_assist_thinking_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.6)
                                        .set_rgb(1.0, 0.2, 1.0)
                                        .set_effect("Fast Pulse")
                                        .perform();
                                      break;
                                    case ${voice_assist_replying_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.6)
                                        .set_rgb(0.2, 1.0, 1.0)
                                        .set_effect("Slow Pulse")
                                        .perform();
                                      break;
                                    case ${voice_assist_error_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.6)
                                        .set_rgb(1.0, 1.0, 0.2)
                                        .set_effect("Fast Pulse")
                                        .perform();
                                      break;
                                    case ${voice_assist_muted_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.3)
                                        .set_rgb(1.0, 0.0, 0.0)
                                        .perform();
                                      break;
                                    case ${voice_assist_not_ready_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.3)
                                        .set_rgb(1.0, 1.0, 0.2)
                                        .perform();
                                      break;
                                    default:
                                      id(led_ww).turn_off()
                                        .perform();
                                  }
                            else:
                              - light.turn_on:
                                  id: led_ww
                                  red: 100%
                                  green: 0%
                                  blue: 0%
                                  brightness: 40%
                                  effect: fast pulse
                      else:
                        - light.turn_on:
                            id: led_ww
                            red: 100%
                            green: 0%
                            blue: 0%
                            brightness: 40%
                            effect: slow pulse
          else:
            - light.turn_on:
                id: led_ww
                red: 100%
                green: 100%
                blue: 0%
                brightness: 30%
                effect: slow pulse

globals:
  - id: init_in_progress
    type: bool
    restore_value: false
    initial_value: "true"
  - id: voice_assistant_phase
    type: int
    restore_value: false
    initial_value: ${voice_assist_not_ready_phase_id}
Wow, this thread got very technical real quick.
I just got the kit on the strength of the mostlyChris video. Got it all working as per his guide and the Seeed website HA bit. But the RGB LED does not work.
- I did not do any flashing of firmware. I just uploaded the ESPHome bin file.
- I had to add an API key to the ESPHome file to get through the Configure Integration stage.
- I tried to test the LED by loading the RGB test sketch from the Seeed website. Using the Arduino IDE, it seemed to compile and upload OK, but I don’t think it ran properly: I got no output on the console (the sketch had e.g. Serial.println("Red color test") in it), and the LED did not come on.
- I also tried the short Usr Button Usage sketch, which again appeared to compile and upload OK, but again nothing appeared on the console.
- Having played with the test sketches, I reinstalled the ESPHome bin file using the web installer and it all worked OK again apart from the LED, so no damage done.
Can anyone throw any light on the sketches not working?
The thread above is very long and I have difficulty picking out clear advice. Is there anything I should do/try to get the LED working?
Ta.
OK, so I seem to have managed to flash the board with respeaker_lite_i2s_xiao_1.0.7.bin. I connected the cable to the ReSpeaker board and not to the Seeed Studio XIAO ESP32S3. Was that right?
I also saw instructions to flash the following, and I’m trying to understand what it is meant for:
dfu-util -e -a 1 -D respeaker_lite_usb_xmos_v2.0.5.bin
Is this for the Seeed Studio XIAO ESP32S3 itself and should I flash it?
Instructions also say “For Windows users, after flashing the USB firmware, need to uninstall the device, then you can use it as a sound device.”
Sorry, it seems I am confused, and I’m hoping you could explain it a bit…
The USB firmware is also for the XMOS board; it lets you use it as a USB audio card with a PC/Mac. Don’t flash it. Use the i2s one.
The ESP32 should be flashed with the ESPHome firmware.
You need to flash the ReSpeaker board with i2s firmware v1.0.7 to fix the RGB LED (and other issues).
Follow the Seeed wiki for the dfu-util bit, and plug the USB-C data cable into the ReSpeaker board (not the XIAO ESP32S3).
I think there is some sort of buffer (non-FIFO?) issue with the current ESPHome assistant stuff. See my post right before yours; I have audio issues as well.