Local Voice Commands with ESP32s3 Dev Module

The Espressif ESP32-S3-Korvo-1 developer module comes with all the hardware features expected for a smart speaker but in a DIY format: mic array, speaker output, addressable leds, push buttons, sd card, battery charger. Espressif provides example software that implements the necessary audio processing, wakeword detection, and voice command identification. I merged the voice command example with the source files resulting from an ESPHome configuration for the dev module to create a usable voice command interface for HA. The example software provides up to 200 preprogrammed voice command phrases. Automations related to each phrase must be configured in HA.

Wakeword detection seems to be on par with existing commercial smart speakers. The voice command recognition is usable but way more sensitive to ambient noise and distance to speaker. Steps to setup build environment and configure the software are posted in this github repository:

5 Likes

Very interesting! Would this be the simplest way to achieve local voice command on HA?

I’m not really familiar with any of the other methods for a good comparison. I had started out to add support for this dev board to the ESP32 Rhasspy Satellite project but got sidetracked a bit. My current implementation is not the most user friendly but should not be much trouble for users with linux / programing experience to get working. It has benefit that all the audio processing is done right on the device so good for users with HA running on a system with less resources.

1 Like

Hello, nice work!

now we have voice-assist with 2023.5.0 core of HA, there is a much simple way to use a Atom Echo for Speech-to-text and let your HA hardware make the work of recognition.

But there is no more Atom Echo available now.

I search a way to use a standard ESP32 with inmp441 to do the job, is someone has succeed in please?

ESPHome 2023.4 added microphone support, I was looking into using onboard wakeword detection that that then feeds the processed audio to the HA once activated.

2 Likes

which hardware do you use please? PDM, ADC or I2S microphone? and which one ESP32?

thanks

@Olivier974 as example this one:
https://www.aliexpress.com/item/1005002803964499.html

I got mine today and it is the original Espressif:

1 Like

Thanks!

nice board, very complete.

I am very interested in your project.
“Mixing” ESPHome with sample code couldn’t have ben easy! Great work.
I’d like to make something like a wearable Star Trek badge, with a Push-Button or Touch Sensor as I don’t really trust wake words, plus it must use a lot of battery.
One mic pointing “up” towards the mouth might be enough as I think the array would point “away”.
If I were to do that I would also have to “mess” with the mic array code and I’m afraid :crazy_face:
I appreciate the building instructions on your Github page but for now I don’t trust myself to make the mods.
Anyway, I’ll follow your project with much interest; thank you

Have you had any luck getting yours to work with HA / ESPHome? I’ve been able to get the default Chinese language firmware working, but everything audio related I try to do with ESPHome has not worked. I’ve got the buttons and the LEDs working, but no audio in or out.

Could you share your pin configuration? I’m not sure I haev mine correct.

It works but needs some attention to detail like the S3 Box variants

substitutions:
#  name: korvo
  friendly_name: korvo 

esphome:
  name: korvo
  friendly_name: ${friendly_name}
  name_add_mac_suffix: true
  platformio_options:
    board_build.flash_mode: dio
    upload_speed: 460800
  project:
    name: esphome.voice-assistant
    version: "1.0"
  min_version: 2023.11.1
  on_boot:
    - priority: 600
      then:
        - light.turn_on:
            id: led_ring
            brightness: 70%
            effect: connecting

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
      CONFIG_ESP32_S3_KORVO1_BOARD: "y"
    components:
      - name: esp32_s3_korvo1_board
        source: github://abmantis/esphome_custom_audio_boards@main
        refresh: 0s

psram:
  mode: octal
  speed: 80MHz

external_components:
  - source: github://pr#5230
    components: esp_adf
    refresh: 0s

ota:
logger:
api:
  on_client_connected:
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - delay: 1s
            - voice_assistant.start_continuous:
            - delay: 1s
            - voice_assistant.stop:
            - delay: 2s
            - voice_assistant.start_continuous:
            - script.execute: reset_led
  on_client_disconnected:
    then:
      - light.turn_on:
          id: led_ring
          blue: 0%
          red: 100%
          green: 100%
          brightness: 50%
          effect: connecting

dashboard_import:
  package_import_url: github://esphome/firmware/voice-assistant/esp32-s3-korvo1.yaml@main

wifi:
  use_address: 192.168.0.xx
  ap:
  on_connect:
    then:
      - delay: 5s # Gives time for improv results to be transmitted
      - ble.disable:
  on_disconnect:
    then:
      - ble.enable:

improv_serial:

esp32_improv:
  authorizer: none

button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset

esp_adf:
  board: esp32s3korvo1

microphone:
  - platform: esp_adf
    id: korvo_mic

speaker:
  - platform: esp_adf
    id: korvo_speaker

voice_assistant:
  id: voice_asst
  microphone: korvo_mic
  speaker: korvo_speaker
  noise_suppression_level: 4
  auto_gain: 10dBFS
  volume_multiplier: 1
  use_wake_word: false
  on_listening:
    - light.turn_on:
        id: led_ring
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: wakeword
  on_tts_start:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 50%
        effect: pulse
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 30%
                effect: none
          else:
            - light.turn_off: led_ring

switch:
  - platform: gpio
    id: pa_ctrl
    pin: GPIO38
    name: "${friendly_name} Speaker Mute"
    restore_mode: ALWAYS_ON

  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(voice_asst).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - script.execute: reset_led

light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "${friendly_name} Light"
    pin: GPIO19
    num_leds: 12
    rmt_channel: 0
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 12
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 12
          add_led_interval: 100ms
          reverse: true

binary_sensor:
  - platform: template
    name: "${friendly_name} Volume Up"
    id: btn_volume_up
  - platform: template
    name: "${friendly_name} Volume Down"
    id: btn_volume_down
  - platform: template
    name: "${friendly_name} Set"
    id: btn_set
  - platform: template
    name: "${friendly_name} Play"
    id: btn_play
  - platform: template
    name: "${friendly_name} Mode"
    id: btn_mode
    on_multi_click:
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn
  - platform: template
    name: "${friendly_name} Record"
    id: btn_record
    on_press:
      - voice_assistant.start:
      - light.turn_on:
          id: led_ring
          brightness: 100%
          effect: "Wakeword"
    on_release:
      - voice_assistant.stop:
      - light.turn_off:
          id: led_ring

sensor:
  - id: button_adc
    platform: adc
    internal: true
    pin: 8
    attenuation: 11db
    update_interval: 15ms
    filters:
      - median:
          window_size: 5
          send_every: 5
          send_first_at: 1
      - delta: 0.1
    on_value_range:
      - below: 0.55
        then:
          - binary_sensor.template.publish:
              id: btn_volume_up
              state: ON
      - above: 0.65
        below: 0.92
        then:
          - binary_sensor.template.publish:
              id: btn_volume_down
              state: ON
      - above: 1.02
        below: 1.33
        then:
          - binary_sensor.template.publish:
              id: btn_set
              state: ON
      - above: 1.43
        below: 1.77
        then:
          - binary_sensor.template.publish:
              id: btn_play
              state: ON
      - above: 1.87
        below: 2.15
        then:
          - binary_sensor.template.publish:
              id: btn_mode
              state: ON
      - above: 2.25
        below: 2.56
        then:
          - binary_sensor.template.publish:
              id: btn_record
              state: ON
      - above: 2.8
        then:
          - binary_sensor.template.publish:
              id: btn_volume_up
              state: OFF
          - binary_sensor.template.publish:
              id: btn_volume_down
              state: OFF
          - binary_sensor.template.publish:
              id: btn_set
              state: OFF
          - binary_sensor.template.publish:
              id: btn_play
              state: OFF
          - binary_sensor.template.publish:
              id: btn_mode
              state: OFF
          - binary_sensor.template.publish:
              id: btn_record
              state: OFF
1 Like

This take advantage of all microphones? I’ve tried manny setup versions, but none of them are closer to Alexa or Google Nest at detecting wake word (with or without microwakeword). TTS aparently works well. But detecting wakeword works only if i am next to korvo or speak verry loud. Can we use audio drivers and wake word engine from skainet? I’m not good with pyton programing but i can work with yaml