On device wake word on ESP32-S3 is here - Voice: Chapter 6

I’ve spent a lot of time struggling to get this board to work. My issue is that it isn’t recognising its own onboard PSRAM… I’ve contacted the seller but…China.

Once I get it to see the PSRAM (which I think is simply an ESPHome code/config thing) it should work.

In the meantime I’ve ordered some of these which apparently do work. I’ll be able to confirm once they arrive.

That does look pretty cool. Who is going to risk the cost to test it out…?

Over on the ESPHome discord in the #machine-learning channel, a user got it working with the esp32-s3-devkitc-1-wroom-1-N16R8 (link to message).

I have it working on a waveshare S3 mini

with Microphone only version of the firmware from here.

I have tried the firmware with the speaker, and it does not appear to recognise the wake word. I have not had a lot of time to play with the speaker yet. I have also ordered some N16R8 S3s, as the memory size is apparently important. This is not my field of expertise but I am making progress.

That is very similar to the one I’m trying to get working (ESP32-S3FH4R2) but with no luck. BigBobbas has been helping me but the ESPHome log shows the device not seeing that it has PSRAM…

UPDATE: I re-tried using the same GPIO as this example and it works now. There was obviously some strange conflict with the GPIO I had selected. So now I can safely say that the boards I linked earlier do work.
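
For anyone else whose log shows no PSRAM, here is a minimal sketch of the relevant ESPHome section. It assumes the `psram:` component as used in the Korvo config posted in this thread (`mode: octal`, `speed: 80MHz`). The `quad` setting for 2MB parts such as the ESP32-S3FH4R2 is my assumption and not something confirmed here, so check it against your module's datasheet. As far as I know, the PSRAM and flash lines also occupy some GPIOs internally (which ones depends on the module), which may be the kind of pin conflict described above.

```yaml
# Sketch only: the psram mode has to match the module that is fitted.
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

psram:
  mode: octal     # 8MB parts such as the N16R8, as in the Korvo config in this thread
  speed: 80MHz
  # mode: quad    # assumption: 2MB parts such as the ESP32-S3FH4R2 use quad SPI PSRAM
```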

Thanks! The developer who mentioned that it didn’t work on any S3 with PSRAM did mention memory differences between models being one of the issues. I thought it might be that different versions had memory from different manufacturers or something (not my area of expertise either), but it sounds like he meant that it requires a specific amount of PSRAM: that board only has 2MB, with 8MB being ideal and 4MB possibly enough. I’ll stick to 8MB, as the price difference is maybe 2 dollars, if there is even an option to choose the same model with different amounts.

Also thanks to the other posters and links, it’s good to know the devkit and wroom-1 appear to work as long as they have enough PSRAM. It really sounds like that’s the deciding factor but obviously more boards need to be tested. They did mention to post any boards/models users get working. I’m guessing Discord is the place to post that information if you do get it to work on a board that hasn’t already been confirmed. Thanks again!

I’ve no idea if this is only a my-device thing, but the wake word stops responding over time. This isn’t a new issue; it was happening on the old build without local wake word.

It appears to happen over time and I have to restart the device. I’ve not seen it reported on the issue trackers, so I’m leaning towards it being an issue with my s3box3.

There’s also an audible “pop” every now and then. I assume this is the microphone becoming active and is normal, but it may or may not be related.
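
Until the cause is found, a restart can at least be triggered from Home Assistant instead of pulling the power. This is only a workaround sketch using ESPHome's `restart` button platform. The entity name is made up for illustration, and it does nothing about whatever makes the wake word stop responding.

```yaml
# Workaround sketch, not a fix for the underlying problem.
button:
  - platform: restart
    name: "Restart voice satellite"   # hypothetical entity name
```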

box link to aliexpress: sold out :sob:

Well, I just ordered one. I’ll let everyone know how it works out. On paper it should work, but we all know that doesn’t always work out. I just happened to search Amazon and they have them in the US store for the same price. My main concern was ordering from AliExpress and having to deal with a return if it didn’t work, but Amazon will take anything back, so if it doesn’t work I’ll just send it back for a refund. Only 7 left in stock. Not sure about the UK store.

ESP32-S3-Korvo-1 Development Board
https://a.co/d/1xahvv4

If you live in the UK then you can buy an ESP32-S3-BOX-3 from my store.

I only have a limited number and no idea how popular they are going to be. I hope people will understand it’s the best price I can do it for, given all of the effort that’s gone into the site etc.

https://letsautomate.shop

I attempted to compile the code for the s3-box-3 and got the following compile termination, any ideas?

...
Compiling .pioenvs/esp32-voice-node-5a9788/src/esphome/components/micro_wake_word/micro_wake_word.o
Compiling .pioenvs/esp32-voice-node-5a9788/src/esphome/components/network/util.o
In file included from src/esphome/components/micro_wake_word/micro_wake_word.cpp:1:
src/esphome/components/micro_wake_word/micro_wake_word.h:19:10: fatal error: tensorflow/lite/core/c/common.h: No such file or directory
 #include <tensorflow/lite/core/c/common.h>
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
*** [.pioenvs/esp32-voice-node-5a9788/src/esphome/components/micro_wake_word/micro_wake_word.o] Error 1
========================= [FAILED] Took 288.57 seconds =========================

Removing all the YAML and trying again fixed it, and it compiled. After adding my ESP32-S3 to HA, should I expect to be able to add it to my Assist pipeline at the bottom under wake word? It just says I don’t have a wake word engine set up yet.
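
On the pipeline question: as I understand it, with on-device detection the wake word is recognised on the ESP32-S3 itself, so the wake word setting in the Assist pipeline should not be needed for that device. A minimal sketch of the device-side section, assuming the `micro_wake_word` component named in the compile log above and the `okay_nabu` model; the option names have changed between ESPHome releases, so treat this as an outline and check the documentation for your version.

```yaml
# Sketch only: on-device wake word that hands over to the voice assistant.
micro_wake_word:
  model: okay_nabu            # assumption: option name as of the first release with this component
  on_wake_word_detected:
    - voice_assistant.start:  # start streaming audio to Home Assistant once the wake word is heard
```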

This is mostly not my work, and it could use some attention to detail around the included .h files that Espressif provides for Arduino, specifically for the Korvo-1. It works for the S3 Korvo-1 though.

substitutions:
  friendly_name: korvo 

esphome:
  name: korvo
  friendly_name: ${friendly_name}
  name_add_mac_suffix: true
  platformio_options:
    board_build.flash_mode: dio
    upload_speed: 460800
  project:
    name: esphome.voice-assistant
    version: "1.0"
  min_version: 2023.11.1
  on_boot:
    - priority: 600
      then:
        - light.turn_on:
            id: led_ring
            brightness: 70%
            effect: connecting

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
      CONFIG_ESP32_S3_KORVO1_BOARD: "y"
    components:
      - name: esp32_s3_korvo1_board
        source: github://abmantis/esphome_custom_audio_boards@main
        refresh: 0s

psram:
  mode: octal
  speed: 80MHz

external_components:
  - source: github://pr#5230
    components: esp_adf
    refresh: 0s

ota:
logger:
api:
  on_client_connected:
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - delay: 1s
            - voice_assistant.start_continuous:
            - delay: 1s
            - voice_assistant.stop:
            - delay: 2s
            - voice_assistant.start_continuous:
            - script.execute: reset_led
  on_client_disconnected:
    then:
      - light.turn_on:
          id: led_ring
          blue: 0%
          red: 100%
          green: 100%
          brightness: 50%
          effect: connecting

dashboard_import:
  package_import_url: github://esphome/firmware/voice-assistant/esp32-s3-korvo1.yaml@main

wifi:
  use_address: 192.168.0.xx
  ap:
  on_connect:
    then:
      - delay: 5s # Gives time for improv results to be transmitted
      - ble.disable:
  on_disconnect:
    then:
      - ble.enable:

improv_serial:

esp32_improv:
  authorizer: none

button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset

esp_adf:
  board: esp32s3korvo1

microphone:
  - platform: esp_adf
    id: korvo_mic

speaker:
  - platform: esp_adf
    id: korvo_speaker

voice_assistant:
  id: voice_asst
  microphone: korvo_mic
  speaker: korvo_speaker
  noise_suppression_level: 4
  auto_gain: 10dBFS
  volume_multiplier: 1
  use_wake_word: false
  on_listening:
    - light.turn_on:
        id: led_ring
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: wakeword
  on_tts_start:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 50%
        effect: pulse
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 30%
                effect: none
          else:
            - light.turn_off: led_ring

switch:
  - platform: gpio
    id: pa_ctrl
    pin: GPIO38
    name: "${friendly_name} Speaker Mute"
    restore_mode: ALWAYS_ON

  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(voice_asst).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - script.execute: reset_led

light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "${friendly_name} Light"
    pin: GPIO19
    num_leds: 12
    rmt_channel: 0
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 12
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 12
          add_led_interval: 100ms
          reverse: true

binary_sensor:
  - platform: template
    name: "${friendly_name} Volume Up"
    id: btn_volume_up
  - platform: template
    name: "${friendly_name} Volume Down"
    id: btn_volume_down
  - platform: template
    name: "${friendly_name} Set"
    id: btn_set
  - platform: template
    name: "${friendly_name} Play"
    id: btn_play
  - platform: template
    name: "${friendly_name} Mode"
    id: btn_mode
    on_multi_click:
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn
  - platform: template
    name: "${friendly_name} Record"
    id: btn_record
    on_press:
      - voice_assistant.start:
      - light.turn_on:
          id: led_ring
          brightness: 100%
          effect: "Wakeword"
    on_release:
      - voice_assistant.stop:
      - light.turn_off:
          id: led_ring

sensor:
  - id: button_adc
    platform: adc
    internal: true
    pin: 8
    attenuation: 11db
    update_interval: 15ms
    filters:
      - median:
          window_size: 5
          send_every: 5
          send_first_at: 1
      - delta: 0.1
    on_value_range:
      - below: 0.55
        then:
          - binary_sensor.template.publish:
              id: btn_volume_up
              state: ON
      - above: 0.65
        below: 0.92
        then:
          - binary_sensor.template.publish:
              id: btn_volume_down
              state: ON
      - above: 1.02
        below: 1.33
        then:
          - binary_sensor.template.publish:
              id: btn_set
              state: ON
      - above: 1.43
        below: 1.77
        then:
          - binary_sensor.template.publish:
              id: btn_play
              state: ON
      - above: 1.87
        below: 2.15
        then:
          - binary_sensor.template.publish:
              id: btn_mode
              state: ON
      - above: 2.25
        below: 2.56
        then:
          - binary_sensor.template.publish:
              id: btn_record
              state: ON
      - above: 2.8
        then:
          - binary_sensor.template.publish:
              id: btn_volume_up
              state: OFF
          - binary_sensor.template.publish:
              id: btn_volume_down
              state: OFF
          - binary_sensor.template.publish:
              id: btn_set
              state: OFF
          - binary_sensor.template.publish:
              id: btn_play
              state: OFF
          - binary_sensor.template.publish:
              id: btn_mode
              state: OFF
          - binary_sensor.template.publish:
              id: btn_record
              state: OFF

I just got my Box3, and when it’s working I like it, though it seems to cut off or misunderstand words. For instance, when I say “turn on the den” I get an error saying something like “can’t find the device Din”.

My main concern, however, is that the wake word “Ok Nabu” only works on the first attempt around 20% of the time, and I usually have to say it 3-5 times to get it to wake.

Anyone have any suggestions?

Thank you
Tommy

Still surprised about all the satellite talks when basically everyone has a mobile phone and most have tablets. So why spend cash on some satellites if all we really need is wake word support in the companion app.
Would even work on Android TVs, Android cars etc. etc.

No cost, lots of processing power and readily available everywhere.

I guess you do not have kids or other family members who do not always carry their phone everywhere (if they even have one, which I am sure most small kids do not). Even I, who carry my phone almost all the time, still use our existing Google Nest / Google smart speakers a lot for hands-free voice control.

I believe the most common use cases are when and where hands-free operation makes sense, for example in the kitchen while your hands are busy: controlling lights (brightness or ON/OFF), setting and operating timers, reminders or alarms, adding items to shopping lists or to-do lists, and music controls.

Regardless, there are several use cases that appeal to mainstream users, and that is why Google and Amazon have each sold more than 500,000 Google Nest / Google Home and Amazon Echo / Alexa smart speakers so far.

Check out the results of this wish list poll once you have completed it yourself:

Not sure what you are trying to show me with that poll. It is about features, not hardware. And wake word support is one of the top priorities there.

I do not have kids but I do not need kids to know that I have multiple Android and iOS devices in my household. And I do not need to be the owner of the iOS devices to be able to use them to use voice control.

So the point is, that if the companion app supported wake words, anybody in my home could enter any room that has any Android or iOS device lying around and could give voice commands.
The device just needs to be in hearing distance.

And you are quoting sales for Echo, Alexa etc. of 500 k. 500 k units is not that much. And very few people here want to share all their data with big tech.
So a lot of people are buying more or less expensive satellite hardware. They are fun to play with but they have little future. They will lie around somewhere in a year or two because they are too bulky or too slow. Or because people realize that they need to buy one per room because they are not as mobile as all our Android and iOS phones and tablets.

So, sure, you can buy lots of dedicated hardware for a task that really is just a mic and speaker. Or you could use what everybody owns and most people even have old devices lying around (I still have my Samsung S2 and S6edge). So my wife and I would currently own >6 perfectly fine “satellites” in the form of phones and tablets. Just waiting to be used as local voice controls.
Cost for 6 ESP S3 boxes? Couple of hundred euros. For what? A big, bulky satellite with inferior screen and sound compared to a mobile phone or tablet :wink:

Alex, you do what works best for you. It’s great if you prefer to use the Android app rather than setup satellite devices. That is why there are several options.

Personally I find that getting my phone out, logging in to the phone, and starting the app before I can turn a light on or off … is more painful than getting out of my chair and walking to the light switch. But speaking a command seems so easy … all I need is for it to work reliably :wink:

The idea is to use wake words on the locked phone.
Or have devices like old phones or tablets remain unlocked.
Or speaking to my TV while I am watching.

You can already control all your devices with the locked phone by using the tiles. Now it is “just” necessary to add wake words :slight_smile:

And the computing power of a 50 € tablet is much higher than that of a 50 € esp device. Mic and speaker are also better. Imagine just hanging a bunch of firehd tablets on your walls and speaking to them. Would look much nicer than esp devices and offer nice big screens and great touch control :slight_smile:

I think it would not hurt to use some old devices like that. Does not mean one can’t buy as many devices as one likes as well.