"ReSpeaker Lite" - new Seeed Studio Voice Assistant Development Kit hardware combine ESP32 with XMOS XU316 DSP chip for advanced audio processing as a ESPHome-based Home Assistant Assist Satellite voice devkit

  1. You can use a better model for speech recognition. It will require a better CPU or even a GPU, though. I run my Whisper on an i7.
  2. As an option, you can pay for a Nabu Casa subscription (it’s the company behind Home Assistant, so you’re supporting the devs) and use their STT/TTS; they’re great.
  3. If you use my YAML, you have a Media Player entity in your ReSpeaker device, and it has a volume slider. Also, my YAML uses the on-device wake word, so you can get rid of the OpenWakeWord add-on (however, good recognition with on-device microWakeWord is only available for the “Okay, Nabu” wake word).

It’s really pleasing to see the progress with the ReSpeaker. I’ve installed Andrii’s yaml and it’s working well.

Unfortunately, speech recognition still has a way to go. It’s very hit and miss. Frustratingly, there are some things I have to ask 5-6 times, and even when I enunciate slowly and carefully it may still not pick them up correctly.

Yeah, I use Nabu STT; it’s much better than Whisper (at least with the basic model).


Thanks @formatBCE

The volume control in your YAML works perfectly. Is there a way of setting it via voice, something like “Ok Nabu, set volume to 10%”?

I will check out that Nabu STT as well…

I don’t think there’s a built-in intent for this, but one could write a custom intent :)
It’s not super easy, but you can try, if you’re not afraid of YAML :)

I will try to help.
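
If anyone wants to try it, a custom intent for this could look roughly like the sketch below. It is untested on the ReSpeaker and assumes a few things that are not from this thread: the intent name SetRespeakerVolume, the file name custom_sentences/en/respeaker_volume.yaml, and the entity ID media_player.respeaker_media_player are all placeholders, so swap in your own. The first part goes in its own custom sentences file, the second part in configuration.yaml.

```yaml
# File 1 (hypothetical name): config/custom_sentences/en/respeaker_volume.yaml
language: "en"
intents:
  SetRespeakerVolume:            # hypothetical intent name
    data:
      - sentences:
          - "(set|change) [the] volume to {volume} [percent]"
lists:
  volume:
    range:                       # accepts any whole number from 0 to 100
      from: 0
      to: 100

---
# File 2: add to configuration.yaml
intent_script:
  SetRespeakerVolume:            # must match the intent name above
    action:
      - service: media_player.volume_set
        target:
          entity_id: media_player.respeaker_media_player   # assumed entity ID
        data:
          volume_level: "{{ volume / 100.0 }}"   # service expects 0.0 to 1.0
    speech:
      text: "Volume set to {{ volume }} percent"
```

Restart Home Assistant after adding both parts so the new sentences are picked up. The range list keeps the spoken number between 0 and 100, and the template converts it to the 0.0 to 1.0 scale that media_player.volume_set takes.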

I have to agree: fully local, even on a three-year-old Intel NUC-like mini PC, is still not great beyond the wake word. It’s decent in a quiet room when running fully local, but that isn’t always the typical scenario. Thanks for posting the firmware; my listen light works now. For some reason I get an “Access Denied” when trying the new Seeed link in their wiki.

I think some of the local stuff has to do with everything being CPU-based to some extent; that, or it just needs more work, though it’s been out for a while, so I’m not sure that’s it either. The Whisper models are confusing to me: some that are smaller (or sound smaller) than ones you can use can’t be selected at all in Whisper, where tiny-int8 is the default. Honestly, playing with the models made zero improvement for me personally.

I actually trust Nabu with my data and mostly pay because it’s a small way to give back. While it’s far from perfect, it’s slowly getting better… Using their cloud service is noticeably better, though. At least to me it is.

I would still really, really like to see how this works locally on a Jetson, which to my knowledge was moved to GPU-based processing around four to five months ago. The obvious issue is that Nvidia Jetson units aren’t cheap, I don’t really care for LLMs, and Nvidia prices never go down, only up, apparently. So it is impossible for me to tell whether it would make any difference, but Nvidia/HA and someone from Seeed actually started that thread in the Nvidia developer forums… It’s a full voice Docker package and runs HA Core on the Jetson. From what I could tell, they got frustrated with 2 machines running multiple containers each, so they moved it all to the Jetson. I believe the cheapest model that works has gone up in price since that dev post, even though it’s maybe 3 years old… Nvidia, sigh…


@ginandbacon

https://files.seeedstudio.com/wiki/SenseCAP/respeaker/respeaker_lite_i2s_xiao_1.0.7.bin

Thanks formatBCE, I will have a look at custom intents…

Mine is working but is this a new firmware?
What am I flashing it with?

Check this section: ReSpeaker Lite Voice Assistant Kit | Seeed Studio Wiki


Just wanted to post this for anyone else it may help. I just fixed my TV background noise issue; the only downside is that you have to enable the Assist in progress binary sensor for it to work. This will give you a constant repair warning, but according to what it said when they added it, it’s not being removed until 2025. I think the release was 2025.4 or something close. For some reason the new Assist sensor doesn’t have the same options for the automation.

All I did was create an automation that turns off my stereo when the binary sensor turns on and turns the stereo back on when the binary sensor turns off. This room uses IR, but if you can set the sound source volume directly, you can just lower it to a specific level and set it back to what it was after it’s done… It’s a temporary solution, but it works perfectly. The new assist_satellite, or whatever the new option is, only has the below. That’s not useful, since it changes a lot. I hope they add a simple turned on/off option like the binary sensor’s at some point.

One voice command with new option
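
For anyone who wants to copy the binary sensor approach, an automation along these lines should be close. It is only a sketch: both entity IDs (binary_sensor.respeaker_assist_in_progress and media_player.living_room_stereo) are made up for the example, and it mutes the player instead of switching it off over IR, which assumes your sound source shows up as a media player that supports muting.

```yaml
# Goes under automation: in configuration.yaml (or recreate it in the UI editor)
automation:
  - alias: "Mute stereo while the satellite is in use"
    mode: restart
    trigger:
      - platform: state
        entity_id: binary_sensor.respeaker_assist_in_progress   # assumed entity ID
        to: "on"
        id: assist_started
      - platform: state
        entity_id: binary_sensor.respeaker_assist_in_progress   # assumed entity ID
        to: "off"
        id: assist_finished
    action:
      - choose:
          # Wake word heard: silence the background audio
          - conditions:
              - condition: trigger
                id: assist_started
            sequence:
              - service: media_player.volume_mute
                target:
                  entity_id: media_player.living_room_stereo    # assumed entity ID
                data:
                  is_volume_muted: true
          # Pipeline finished: bring the audio back
          - conditions:
              - condition: trigger
                id: assist_finished
            sequence:
              - service: media_player.volume_mute
                target:
                  entity_id: media_player.living_room_stereo    # assumed entity ID
                data:
                  is_volume_muted: false
```

Muting has the advantage that the previous volume comes back on its own, so there is no need to store the old level anywhere. With an IR-only setup, the two sequences would call whatever remote or script sends the power command instead.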


After updating to i2s firmware 1.0.7, all my issues went away. My LED works, and the mic works.

I’ve had it up and running with an AUX cord for 2 days and it hasn’t crashed. It works fairly well in my basement without much background noise. It hears the wake word fairly well and understands what I am saying, so the Whisper STT is working decently using a GPU.

The only issue I am having so far is that the response, like “turned light off”, seems to be cut off at the start: I can only hear the last little bit of it. The same thing happens when playing audio files using the media player. I have a short MP3 of a doorbell ringing; on the first play I catch only a little bit of the end, and on the second play it works as it should.

Makes me think that there is some sort of timing issue with receiving the data and piping it out, possibly needing to init something to pipe audio out? IDK.

Anyone else experiencing this issue?

Hi all,

Have been following this thread with very keen interest. I have 3 ReSpeaker Lite kits. 2 are set up using their YAML code, with the exception that I have micro wake word working on the device.

I have tried some of the code from here on my 3rd device (which does work with my other code) but the returned audio just generates 2 short static bursts. This is the same if I send it audio via media player.

I have the latest 1.0.7 i2s firmware. Any ideas what I am missing?


Update:

It is crashing when trying to play media. The static audio being heard is the same audio everyone hears when the device initialises.

Now to figure out why playing media files is crashing the device.

Here is the console output:

[16:23:00][I][app:100]: ESPHome version 2024.9.2 compiled on Oct 16 2024, 16:22:08
[16:23:00][C][wifi:600]: WiFi:
[16:23:00][C][wifi:428]:   Local MAC: 64:E8:33:7E:02:80
[16:23:00][C][wifi:433]:   SSID: [redacted]
[16:23:00][C][wifi:436]:   IP Address: 192.168.0.43
[16:23:00][C][wifi:440]:   BSSID: [redacted]
[16:23:00][C][wifi:441]:   Hostname: 'respeakertesting'
[16:23:00][C][wifi:443]:   Signal strength: -63 dB ▂▄▆█
[16:23:00][C][wifi:447]:   Channel: 1
[16:23:00][C][wifi:448]:   Subnet: 255.255.255.0
[16:23:00][C][wifi:449]:   Gateway: 192.168.0.1
[16:23:00][C][wifi:450]:   DNS1: 8.8.8.8
[16:23:00][C][wifi:451]:   DNS2: 1.1.1.1
[16:23:00][C][logger:185]: Logger:
[16:23:00][C][logger:186]:   Level: DEBUG
[16:23:00][C][logger:188]:   Log Baud Rate: 115200
[16:23:00][C][logger:189]:   Hardware UART: USB_SERIAL_JTAG
[16:23:00][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Mute'
[16:23:00][C][gpio.binary_sensor:016]:   Pin: GPIO4
[16:23:00][C][esp32_rmt_led_strip:187]: ESP32 RMT LED Strip:
[16:23:00][C][esp32_rmt_led_strip:188]:   Pin: 1
[16:23:00][C][esp32_rmt_led_strip:189]:   Channel: 0
[16:23:00][C][esp32_rmt_led_strip:214]:   RGB Order: GRB
[16:23:00][C][esp32_rmt_led_strip:215]:   Max refresh rate: 0
[16:23:00][C][esp32_rmt_led_strip:216]:   Number of LEDs: 1
[16:23:00][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'User button'
[16:23:00][C][gpio.binary_sensor:016]:   Pin: GPIO3
[16:23:00][C][light:103]: Light 'RespeakerTesting'
[16:23:00][C][light:105]:   Default Transition Length: 0.0s
[16:23:00][C][light:106]:   Gamma Correct: 2.80
[16:23:00][C][template.switch:068]: Template Switch 'timer_ringing'
[16:23:00][C][template.switch:091]:   Restore Mode: always OFF
[16:23:00][C][template.switch:057]:   Optimistic: YES
[16:23:00][C][psram:020]: PSRAM:
[16:23:00][C][psram:021]:   Available: YES
[16:23:00][C][psram:024]:   Size: 8191 KB
[16:23:00][C][safe_mode.button:024]: Safe Mode Button 'Safe Mode Boot'
[16:23:00][C][safe_mode.button:024]:   Icon: 'mdi:restart-alert'
[16:23:00][C][factory_reset.button:011]: Factory Reset Button 'Factory reset'
[16:23:00][C][factory_reset.button:011]:   Icon: 'mdi:restart-alert'
[16:23:00][C][restart.button:017]: Restart Button 'Restart'
[16:23:00][C][restart.button:017]:   Icon: 'mdi:restart'
[16:23:01][C][captive_portal:089]: Captive Portal:
[16:23:01][C][mdns:116]: mDNS:
[16:23:01][C][mdns:117]:   Hostname: respeakertesting
[16:23:01][C][esphome.ota:073]: Over-The-Air updates:
[16:23:01][C][esphome.ota:074]:   Address: respeakertesting.local:3232
[16:23:01][C][esphome.ota:075]:   Version: 2
[16:23:01][C][esphome.ota:078]:   Password configured
[16:23:01][C][safe_mode:018]: Safe Mode:
[16:23:01][C][safe_mode:020]:   Boot considered successful after 60 seconds
[16:23:01][C][safe_mode:021]:   Invoke after 10 boot attempts
[16:23:01][C][safe_mode:023]:   Remain in safe mode for 300 seconds
[16:23:01][C][api:139]: API Server:
[16:23:01][C][api:140]:   Address: respeakertesting.local:6053
[16:23:01][C][api:142]:   Using noise encryption: YES
[16:23:01][C][micro_wake_word:072]: microWakeWord:
[16:23:01][C][micro_wake_word:073]:   models:
[16:23:01][C][micro_wake_word:015]:     - Wake Word: okay nabu
[16:23:01][C][micro_wake_word:016]:       Probability cutoff: 0.97
[16:23:01][C][micro_wake_word:017]:       Sliding window size: 5
[16:23:01][C][micro_wake_word:021]:     - VAD Model
[16:23:01][C][micro_wake_word:022]:       Probability cutoff: 0.50
[16:23:01][C][micro_wake_word:023]:       Sliding window size: 5
[16:23:26][D][esp32.preferences:114]: Saving 1 preferences to flash...
[16:23:26][D][esp32.preferences:143]: Saving 1 preferences to flash: 1 cached, 0 written, 0 failed
[16:23:53][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[16:23:53][D][esp32.preferences:114]: Saving 1 preferences to flash...
[16:23:53][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[16:23:54][D][micro_wake_word:347]: Detected 'okay nabu' with sliding average probability is 0.98 and max probability is 0.98
[16:23:54][D][voice_assistant:512]: State changed from IDLE to START_MICROPHONE
[16:23:54][D][voice_assistant:518]: Desired state set to START_PIPELINE
[16:23:54][D][voice_assistant:223]: Starting Microphone
[16:23:54][D][ring_buffer:024]: Created ring buffer with size 16384
[16:23:54][D][voice_assistant:512]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[16:23:54][D][voice_assistant:512]: State changed from STARTING_MICROPHONE to START_PIPELINE
[16:23:54][D][voice_assistant:278]: Requesting start...
[16:23:54][D][voice_assistant:512]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:23:54][D][voice_assistant:533]: Client started, streaming microphone
[16:23:54][D][voice_assistant:512]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:23:54][D][voice_assistant:518]: Desired state set to STREAMING_MICROPHONE
[16:23:54][D][voice_assistant:635]: Event Type: 1
[16:23:54][D][voice_assistant:638]: Assist Pipeline running
[16:23:54][D][voice_assistant:635]: Event Type: 3
[16:23:54][D][voice_assistant:649]: STT started
[16:23:54][D][light:036]: 'RespeakerTesting' Setting:
[16:23:54][D][light:047]:   State: ON
[16:23:54][D][light:051]:   Brightness: 60%
[16:23:54][D][light:059]:   Red: 100%, Green: 20%, Blue: 100%
[16:23:54][D][light:109]:   Effect: 'Slow Pulse'
[16:23:55][D][voice_assistant:635]: Event Type: 11
[16:23:55][D][voice_assistant:792]: Starting STT by VAD
[16:23:58][D][voice_assistant:635]: Event Type: 12
[16:23:58][D][voice_assistant:796]: STT by VAD end
[16:23:58][D][voice_assistant:512]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[16:23:58][D][voice_assistant:518]: Desired state set to AWAITING_RESPONSE
[16:23:58][D][voice_assistant:512]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[16:23:58][D][light:036]: 'RespeakerTesting' Setting:
[16:23:58][D][light:051]:   Brightness: 60%
[16:23:58][D][light:059]:   Red: 100%, Green: 20%, Blue: 100%
[16:23:58][D][light:109]:   Effect: 'Fast Pulse'
[16:23:58][D][voice_assistant:512]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[16:23:58][D][voice_assistant:512]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[16:23:59][D][voice_assistant:635]: Event Type: 4
[16:23:59][D][voice_assistant:663]: Speech recognised as: "What's the outside temp?"
[16:23:59][D][voice_assistant:635]: Event Type: 5
[16:23:59][D][voice_assistant:668]: Intent started
[16:23:59][D][voice_assistant:635]: Event Type: 6
[16:23:59][D][voice_assistant:635]: Event Type: 7
[16:23:59][D][voice_assistant:691]: Response: "The outside temperature is 24.0°C."
[16:23:59][D][light:036]: 'RespeakerTesting' Setting:
[16:23:59][D][light:051]:   Brightness: 60%
[16:23:59][D][light:059]:   Red: 20%, Green: 100%, Blue: 100%
[16:23:59][D][light:109]:   Effect: 'Slow Pulse'
[16:23:59][D][voice_assistant:635]: Event Type: 8
[16:23:59][D][voice_assistant:711]: Response URL: "https://*redacted*/api/tts_proxy/f313bfbdf5e9410607ab8b9381bc3b2fbd2e9fa7_en-au_35fb9c01c1_tts.home_assistant_cloud.flac"
[16:23:59][D][voice_assistant:512]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[16:23:59][D][voice_assistant:518]: Desired state set to STREAMING_RESPONSE
[16:23:59][D][media_player:080]: 'Media Player' - Setting
[16:23:59][D][media_player:087]:   Media URL: https://*redacted*/api/tts_proxy/f313bfbdf5e9410607ab8b9381bc3b2fbd2e9fa7_en-au_35fb9c01c1_tts.home_assistant_cloud.flac
[16:23:59][D][media_player:093]:  Announcement: yes
[16:23:59][D][voice_assistant:635]: Event Type: 2
[16:23:59][D][voice_assistant:725]: Assist Pipeline ended
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 48000
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 48000
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 16384
[16:23:59][D][esp-idf:000][speaker_task]: I (67740) I2S: DMA Malloc info, datalen=blocksize=4088, dma_buf_count=4

[16:23:59][D][ring_buffer:024]: Created ring buffer with size 65536
[16:23:59][D][ring_buffer:024]: Created ring buffer with size 65536
[16:23:59][D][nabu_media_player:455]: Starting Media Player Speaker
[16:23:59][D][nabu_media_player:458]: Started Media Player Speaker

Now, the audio file plays fine in my browser, so the entire pipeline is working, but audio is not playing on the device. And just to confirm, TTS playback works fine with the Seeed Studio suggested YAML config on this device, so the speaker and the cabling are all fine.

And my current config:

substitutions:
  voice_assist_idle_phase_id: "1"
  voice_assist_listening_phase_id: "2"
  voice_assist_thinking_phase_id: "3"
  voice_assist_replying_phase_id: "4"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_error_phase_id: "11"
  voice_assist_muted_phase_id: "12"
esphome:
  name: respeakertesting
  friendly_name: RespeakerTesting
  min_version: 2024.9.0
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    priority: 600
    then:
      - script.execute: adjust_led
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
            - script.execute: adjust_led
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  flash_size: 8MB
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB: "y"
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_SPIRAM_ALLOW_STACK_EXTERNAL_MEMORY: "y"
      
      CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP: "y"

      # Settings based on https://github.com/espressif/esp-adf/issues/297#issuecomment-783811702
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_ESP32_WIFI_STATIC_TX_BUFFER: "y"
      CONFIG_ESP32_WIFI_TX_BUFFER_TYPE: "0"
      CONFIG_ESP32_WIFI_STATIC_TX_BUFFER_NUM: "8"
      CONFIG_ESP32_WIFI_CACHE_TX_BUFFER_NUM: "32"
      CONFIG_ESP32_WIFI_AMPDU_TX_ENABLED: "y"
      CONFIG_ESP32_WIFI_TX_BA_WIN: "16"
      CONFIG_ESP32_WIFI_AMPDU_RX_ENABLED: "y"
      CONFIG_ESP32_WIFI_RX_BA_WIN: "32"
      CONFIG_LWIP_MAX_ACTIVE_TCP: "16"
      CONFIG_LWIP_MAX_LISTENING_TCP: "16"
      CONFIG_TCP_MAXRTX: "12"
      CONFIG_TCP_SYNMAXRTX: "6"
      CONFIG_TCP_MSS: "1436"
      CONFIG_TCP_MSL: "60000"
      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "65535"  # Adjusted from linked settings to avoid compilation error
      CONFIG_TCP_RECVMBOX_SIZE: "512"
      CONFIG_TCP_QUEUE_OOSEQ: "y"
      CONFIG_TCP_OVERSIZE_MSS: "y"
      CONFIG_LWIP_WND_SCALE: "y"
      CONFIG_TCP_RCV_SCALE: "3"
      CONFIG_LWIP_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST: "y"
      CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY: "y"

psram:
  mode: octal  # quad for N8R2 and octal for N16R8
  speed: 80MHz

external_components:
  - source:
      type: git
      url: https://github.com/esphome/voice-kit
      ref: dev
    components:
      - aic3204
      - audio_dac
      - media_player
      - micro_wake_word
      - microphone
      - nabu
      - nabu_microphone
      - voice_assistant
      - voice_kit
    refresh: 0s

api:
  encryption:
    key: "*redacted*"

ota:
  - platform: esphome
    password: "*redacted*"

logger:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

captive_portal:

switch:
  - platform: template
    id: timer_ringing
    optimistic: true
    internal: true
    restore_mode: ALWAYS_OFF
    on_turn_on:
      # Duck audio
      - nabu.set_ducking:
          decibel_reduction: 20
          duration: 0.0s
      # Ring timer
      - script.execute: ring_timer
      # Refresh LED
      - script.execute: adjust_led
      # If 15 minutes have passed and the timer is still ringing, stop it.
      - delay: 15min
      - switch.turn_off: timer_ringing
    on_turn_off:
      # Stop any current announcement (i.e. stop the timer ring mid-playback)
      - if:
          condition:
            lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
          then:
            lambda: |-
              id(nabu_media_player)
                ->make_call()
                .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
                .set_announcement(true)
                .perform();
      # Set back ducking ratio to zero
      - nabu.set_ducking:
          decibel_reduction: 0
          duration: 1.0s
      # Refresh the LED ring
      - script.execute: adjust_led

button:
  - platform: safe_mode
    id: button_safe_mode
    name: Safe Mode Boot
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset
  - platform: restart
    name: Restart
    id: but_rest
  

binary_sensor:
  - platform: gpio
    pin: 
      number: GPIO4 # D3
      inverted: true
    id: mute
    name: "Mute"
  - platform: gpio
    pin: 
      number: GPIO3 # D2
      inverted: true
    id: user_button
    name: "User button"
    on_multi_click:
      - timing:
          - ON for at most 1s
          - OFF for at least 0.25s
        then:
          - if:
              condition:
                lambda: return !id(init_in_progress);
              then:
                - if:
                    condition:
                      switch.is_on: timer_ringing
                    then:
                      - switch.turn_off: timer_ringing
                    else:
                      - if:
                          condition:
                            lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
                          then:
                            - lambda: |
                                id(nabu_media_player)
                                  ->make_call()
                                  .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
                                  .set_announcement(true)
                                  .perform();
                          else:
                            - if:
                                condition:
                                  voice_assistant.is_running:
                                then:
                                  - voice_assistant.stop:
                                else:
                                  - if:
                                      condition:
                                        media_player.is_playing:
                                      then:
                                        - media_player.pause:
                                      else:
                                        - if:
                                            condition:
                                              and:
                                                # - switch.is_off: master_mute_switch
                                                - not:
                                                    voice_assistant.is_running
                                            then:
                                              - voice_assistant.start:

light:
  - platform: esp32_rmt_led_strip
    id: led_ww
    rgb_order: GRB
    pin: GPIO1
    num_leds: 1
    rmt_channel: 0
    chipset: ws2812
    name: none
    disabled_by_default: true
    entity_category: config
    default_transition_length: 0s

    effects:
      - pulse:
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%


 # Audio and Voice Assistant Config  

i2s_audio:
  - id: i2s_output
    i2s_lrclk_pin: 
      number: GPIO7
      allow_other_uses: true
    i2s_bclk_pin:  
      number: GPIO8
      allow_other_uses: true
    i2s_mclk_pin:  
      number: GPIO9
      allow_other_uses: true

  - id: i2s_input
    i2s_lrclk_pin:  
      number: GPIO7
      allow_other_uses: true
    i2s_bclk_pin:  
      number: GPIO8
      allow_other_uses: true
    i2s_mclk_pin:  
      number: GPIO9
      allow_other_uses: true

microphone:
  - platform: nabu_microphone
    i2s_din_pin: GPIO44
    adc_type: external
    pdm: false
    sample_rate: 16000
    bits_per_sample: 32bit
    i2s_mode: secondary
    i2s_audio_id: i2s_input
    channel_0:
      id: nabu_mic_mww
    channel_1:
      id: nabu_mic_va
      

media_player:
  - platform: nabu
    id: nabu_media_player
    name: Media Player
    internal: false
    sample_rate: 16000
    i2s_dout_pin: GPIO43
    bits_per_sample: 32bit
    i2s_mode: secondary
    i2s_audio_id: i2s_output
    volume_increment: 0.05
    volume_min: 0.4
    volume_max: 0.85
    on_announcement:
      - nabu.set_ducking:
          decibel_reduction: 20
          duration: 0.0s
    on_state:
      if:
        condition:
          and:
            - switch.is_off: timer_ringing
            - not:
                voice_assistant.is_running:
            - not:
                lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
        then:
          - nabu.set_ducking:
              decibel_reduction: 0
              duration: 1.0s

micro_wake_word:
  models:
    - model: https://github.com/kahrendt/microWakeWord/releases/download/okay_nabu/okay_nabu.json
  vad:
  microphone: nabu_mic_mww
  on_wake_word_detected:
    # If a timer is ringing: Stop it, do not start the voice assistant (We can stop timer from voice!)
    - if:
        condition:
          switch.is_on: timer_ringing
        then:
          - switch.turn_off: timer_ringing
        # Start voice assistant, stop current announcement.
        else:
          - if:
              condition:
                lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
              then:
                lambda: |-
                  id(nabu_media_player)
                    ->make_call()
                    .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
                    .set_announcement(true)
                    .perform();
          - voice_assistant.start:
              wake_word: !lambda return wake_word;

voice_assistant:
  id: va
  microphone: nabu_mic_va
  media_player: nabu_media_player
  noise_suppression_level: 0
  auto_gain: 0dBFS
  volume_multiplier: 1
  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - micro_wake_word.start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: adjust_led
  on_client_disconnected:
    - voice_assistant.stop:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: adjust_led
  on_error:
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: adjust_led
          - delay: 1s
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
          - script.execute: adjust_led
  on_start:
    - nabu.set_ducking:
        decibel_reduction: 20   # Number of dB quieter; higher implies more quiet, 0 implies full volume
        duration: 0.0s          # The duration of the transition (default is 0)
  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: adjust_led
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: adjust_led
  on_tts_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: adjust_led
  on_end:
    - wait_until:
        not:
          voice_assistant.is_running:
    - nabu.set_ducking:
        decibel_reduction: 0   # 0 dB means no reduction
        duration: 1.0s
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: adjust_led
  on_timer_finished:
    - switch.turn_on: timer_ringing

script:
  - id: ring_timer
    then:
      - while:
          condition:
            switch.is_on: timer_ringing
          then:
            - media_player.play_media: https://*redacted*/local/timer_finished.flac
            - delay: 1s
            - wait_until:
                not:
                  media_player.is_playing:
  - id: adjust_led
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if: 
                condition:
                  switch.is_on: timer_ringing
                then:
                  - light.turn_on:
                      id: led_ww           
                      red: 0%
                      green: 100%
                      blue: 0%
                      brightness: 60%
                      effect: fast pulse 
                else:
                  - if:
                      condition:
                        wifi.connected:
                      then:
                        - if:
                            condition:
                              api.connected:
                            then:
                              - lambda: |
                                  switch(id(voice_assistant_phase)) {
                                    case ${voice_assist_listening_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.6)
                                        .set_rgb(1.0, 0.2, 1.0)
                                        .set_effect("Slow Pulse")
                                        .perform();
                                      break;
                                    case ${voice_assist_thinking_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.6)
                                        .set_rgb(1.0, 0.2, 1.0)
                                        .set_effect("Fast Pulse")
                                        .perform();
                                      break;
                                    case ${voice_assist_replying_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.6)
                                        .set_rgb(0.2, 1.0, 1.0)
                                        .set_effect("Slow Pulse")
                                        .perform();
                                      break;
                                    case ${voice_assist_error_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.6)
                                        .set_rgb(1.0, 1.0, 0.2)
                                        .set_effect("Fast Pulse")
                                        .perform();
                                      break;
                                    case ${voice_assist_muted_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.3)
                                        .set_rgb(1.0, 0.0, 0.0)
                                        .perform();
                                      break;
                                    case ${voice_assist_not_ready_phase_id}:
                                      id(led_ww).turn_on()
                                        .set_brightness(0.3)
                                        .set_rgb(1.0, 1.0, 0.2)
                                        .perform();
                                      break;
                                    default:
                                      id(led_ww).turn_off()
                                        .perform();
                                  }
                            else:
                              - light.turn_on:
                                  id: led_ww           
                                  red: 100%
                                  green: 0%
                                  blue: 0%
                                  brightness: 40%
                                  effect: fast pulse 
                      else:
                        - light.turn_on:
                            id: led_ww           
                            red: 100%
                            green: 0%
                            blue: 0%
                            brightness: 40%
                            effect: slow pulse
          else:
            - light.turn_on:
                id: led_ww           
                red: 100%
                green: 100%
                blue: 0%
                brightness: 30%
                effect: slow pulse

globals:
  - id: init_in_progress
    type: bool
    restore_value: false
    initial_value: "true"
  - id: voice_assistant_phase
    type: int
    restore_value: false
    initial_value: ${voice_assist_not_ready_phase_id}

Wow, this thread got very technical real quick.

I just got the kit on the strength of the mostlyChris video. Got it all working as per his guide and the Seeed website HA bit. But the RGB LED does not work.

  1. I did not do any flashing of firmware. I just uploaded the ESPHome bin file.
  2. I had to add an API key to the ESPHome file to get through the Configure Integration stage.
  3. I tried to test the LED by loading the RGB test sketch from the Seeed website. Using the Arduino IDE, it seemed to compile and load OK, but I don’t think it ran properly, as I got no output on the console (the sketch had e.g. Serial.println(“Red color test”) in it), nor did the LED come on.
  4. I also tried the short Usr Button Usage sketch, which again appeared to compile and load OK, but again nothing appeared on the console.
  5. Having played with the test sketches, I reinstalled the ESPHome bin file using the web installer and it all worked OK again apart from the LED, so no damage done.

Can anyone throw any light on the sketches not working?

The thread above is very long and I have difficulty picking out clear advice. Is there anything I should do/try to get the LED working?

Ta.

OK, so I seem to have managed to flash respeaker_lite_i2s_xiao_1.0.7.bin onto the board. I connected to the ReSpeaker board and not the Seeed Studio XIAO ESP32S3. Was that right?

I also saw instructions to flash the following, and I am trying to understand what it is meant for:

dfu-util -e -a 1 -D respeaker_lite_usb_xmos_v2.0.5.bin

Is this for the Seeed Studio XIAO ESP32S3 itself and should I flash it?

Instructions also say “For Windows users, after flashing the USB firmware, need to uninstall the device, then you can use it as a sound device.”

Sorry, it seems I am confused; I’m hoping you could explain it a bit…

The USB firmware is also for the XMOS board; it lets you use it as a USB audio card with a PC/Mac. Don’t flash it. Use the i2s one.
The ESP32 should be flashed with the ESPHome firmware.


You need to flash the ReSpeaker board with i2s firmware v1.0.7 to fix the RGB LED (and other issues).

Follow the dfu-util part of the Seeed wiki, and plug the USB-C data cable into the ReSpeaker board (not the XIAO ESP32S3).


I think there is some sort of buffer (non-FIFO?) issue with the current ESPHome assistant stuff. See my post right before yours; I have audio issues as well.