ESP32 Voice Assistant [ES8388 LyraT dev board] not working properly

Hi all,

I have an ESP32-LyraT development board (https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/dev-boards/get-started-esp32-lyrat.html) and can’t seem to get it to function as a reliable voice assistant. I have managed to get it to recognise the wake word a few times, but it has never gone on to carry out the full action. The board looks close in hardware to the LUXE configuration. I have also followed @thatguy_za’s blog post (https://tristam.ie/2024/1026/).

The audio output seems a bit garbled, but I do hear the ‘I didn’t recognise that’ message. What I can’t find is the secret to making the audio input reliable.

Is there a way to listen, on the HA side, to the audio the ESP is transmitting?
Any hints would be greatly appreciated.

Richard


esphome:
  name: voice-speaker
  on_boot:
     - priority: -100
       then:
         - wait_until: api.connected
         - output.turn_on: green_light
         - delay: 2s
         - output.turn_off: green_light
         
esp32:
  board: esp-wrover-kit
  framework:
    type: esp-idf
  
logger:
  level: DEBUG
api:

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Speaker Fallback Hotspot"
    password: "notthemama"

captive_portal:

i2c:
  sda: GPIO18
  scl: GPIO23

external_components:
  - source: github://pr#3552  # DAC support https://github.com/esphome/esphome/pull/3552
    components: [es8388]
    refresh: 0s
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s

es8388:

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO35
    pdm: false
    channel: right

i2s_audio:
  i2s_lrclk_pin: GPIO25
  i2s_bclk_pin: GPIO5
#  i2s_mclk_pin: GPIO0

speaker:
  - platform: i2s_audio
    id: speaker_out
    dac_type: external
    i2s_dout_pin: GPIO26
    mode: stereo  

switch:
  - platform: gpio
    pin: GPIO21
    name: "AMP Switch"
    id: amp_switch
    restore_mode: ALWAYS_ON

  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    on_turn_on:
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop

voice_assistant:
  microphone: mic
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  speaker: speaker_out
  id: assist
#  on_wake_word_detected: 
#    - output.turn_on:  green_light
  on_listening: 
   - output.turn_on: green_light
  on_end:
    - output.turn_off:  green_light
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous



output:
  - id: green_light
    platform: gpio
    pin: GPIO22


esp32_touch:

binary_sensor:
  - platform: esp32_touch
    pin: GPIO33
    threshold: 1000
    name: "Play"

  - platform: esp32_touch
    pin: GPIO32
    threshold: 1000
    name: "Set"
    on_press:
      then:
        - switch.toggle: use_wake_word

  - platform: esp32_touch
    pin: GPIO27
    threshold: 1000
    name: "Vol Up"

  - platform: esp32_touch
    pin: GPIO13
    threshold: 600
    name: "Vol Down"

EDIT: I got it to recognise the wake word; the transcript is below. I said ‘Turn on the Lounge Lights’ and HA transcribed it as ‘I just laid them down straight.’


[14:37:14][D][voice_assistant:414]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[14:37:18][D][esp32.preferences:114]: Saving 1 preferences to flash...
[14:37:18][D][esp32.preferences:143]: Saving 1 preferences to flash: 1 cached, 0 written, 0 failed
[14:37:34][D][voice_assistant:521]: Event Type: 4
[14:37:34][D][voice_assistant:549]: Speech recognised as: " I just laid them down straight."
[14:37:34][D][voice_assistant:521]: Event Type: 5
[14:37:34][D][voice_assistant:554]: Intent started
[14:37:34][D][voice_assistant:521]: Event Type: 6
[14:37:34][D][voice_assistant:521]: Event Type: 7
[14:37:34][D][voice_assistant:577]: Response: "Sorry, I couldn't understand that"
[14:37:34][D][voice_assistant:521]: Event Type: 8

I have managed to track down the cause: the wrong sample rate is being presented. The pipeline wants 16 kHz, but the data is arriving at 11 kHz, so the audio is distorted.
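
For anyone wanting to hear this for themselves, Home Assistant can save the audio that each Assist pipeline run receives. A minimal sketch for `configuration.yaml`, assuming the `/share/assist_pipeline` directory is writable on your install (the path is only an example, and this is not necessarily how the mismatch above was tracked down):

```yaml
# Home Assistant configuration.yaml (not the ESPHome device config)
assist_pipeline:
  # Example path; recordings from each pipeline run are written under here
  debug_recording_dir: /share/assist_pipeline
```

After restarting Home Assistant and triggering the assistant, the saved WAV files can be played back on a PC. Audio captured at the wrong rate usually plays at the wrong speed and pitch. Remove the option when finished, since recordings keep accumulating.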

Quite like talking to myself… I managed to get the voice assistant working, BUT

ESP32_Touch breaks it! Don’t use touch, as the ESPHome code is doing something really bad.

Congratulations

Can you share your config?

Also, how were you using ESP32_Touch?

I opted not to use ESP32_Touch… I plan to modify the board to use buttons, but may eventually look at the touch component.
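
As a rough idea of what the button version could look like, here is a sketch that swaps one touch pad for a plain `gpio` binary sensor. It assumes a push button has been wired between GPIO32 and ground, which is a hypothetical modification and not something that exists on the stock board:

```yaml
binary_sensor:
  - platform: gpio
    pin:
      number: GPIO32        # hypothetical wiring: button between GPIO32 and GND
      mode: INPUT_PULLUP    # internal pull-up, so the pin reads low when pressed
      inverted: true        # report "pressed" as ON
    name: "Set"
    on_press:
      then:
        - switch.toggle: use_wake_word
```

The `on_press` action is the same one the touch version used, so the rest of the config would not need to change.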

Touch config (removed from my final config):

esp32_touch:

binary_sensor:
  - platform: esp32_touch
    pin: GPIO33
    threshold: 1000
    name: "Play"

  - platform: esp32_touch
    pin: GPIO32
    threshold: 1000
    name: "Set"
    on_press:
      then:
        - switch.toggle: use_wake_word

  - platform: esp32_touch
    pin: GPIO27
    threshold: 1000
    name: "Vol Up"

  - platform: esp32_touch
    pin: GPIO13
    threshold: 600
    name: "Vol Down"

Full config - working

substitutions:
  name: "smart-speaker"
  friendly_name: "smart-speaker"
  wifi_ap_password: ""

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2023.10.1
  on_boot:
    then:
      - output.turn_on: dac_mute

esp32:
  board: esp-wrover-kit
  framework:
    type: arduino

logger:
api:
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop
ota:

i2c:
  sda: GPIO18
  scl: GPIO23

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Speaker Fallback Hotspot"
    password: "_password_"

captive_portal:

#esp32_touch:  

improv_serial:

external_components:
  - source: github://pr#3552 # DAC support https://github.com/esphome/esphome/pull/3552
    components: [es8388]
    refresh: 0s

es8388:

globals:
  - id: wifi_connected
    type: bool
    initial_value: "false"
    restore_value: false

interval:
  - interval: 1s
    then:
      - if:
          condition:
            and:
              - lambda: "return !id(wifi_connected);"
              - wifi.connected:
          then:
            - globals.set:
                id: wifi_connected
                value: "true"

output:
  - platform: gpio
    id: dac_mute
    pin: GPIO21
    inverted: true
  
  - platform: gpio
    id: green_light
    pin: GPIO22

i2s_audio:
  - i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO5

media_player:
  - platform: i2s_audio
    name: None
    id: luxe_out
    dac_type: external
    i2s_dout_pin: GPIO26
    mode: stereo
    on_state:
      if:
        condition:
          media_player.is_playing:
        then:
          output.turn_off: dac_mute
        else:
          output.turn_on: dac_mute

microphone:
  - platform: i2s_audio
    id: luxe_microphone
    i2s_din_pin: GPIO35
    adc_type: external
    pdm: false

voice_assistant:
  id: va
  microphone: luxe_microphone
  media_player: luxe_out
  use_wake_word: true
  on_listening:
    - output.turn_on: green_light
  on_tts_end:
    - media_player.play_media: !lambda return x;
    - output.turn_off: green_light
    # This is useful when you want to stream the response on another media_player
    # - homeassistant.service:
    #     service: media_player.play_media
    #     data:
    #       entity_id: media_player.some_speaker
    #       media_content_id: !lambda 'return x;'
    #       media_content_type: music
    #       announce: "true"
  on_client_connected:
    - if:
        condition:
          - switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
  on_client_disconnected:
    - if:
        condition:
          - switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
 # on_end:
 #   - delay: 100ms
 #   - wait_until:
 #       not:
 #         media_player.is_playing: luxe_out
 #   - script.execute: reset_led
  on_error:
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }





switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);

Hey @RichardPar

Which version of the LyraT board do you have? V4.3 with touch buttons?

Touch is working perfectly for me so it’s strange that you’re having such a bad time with it.

Yep… a 4.3. From the schematics, there seem to be two different types of ESP32. No idea if they actually fitted different ones.

RichardPar, I seem to be having the same issue with the sample rate. How did you debug it, and how does one go about fixing it? I did not see any configuration for this on either the ESPHome side or the Voice Assistant side.
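
One thing that may be worth trying, as an untested sketch: newer ESPHome releases expose a `sample_rate` option on the `i2s_audio` microphone, which the configs in this thread do not set. Whether setting it cures the 11 kHz problem on the LyraT is an assumption, not something confirmed above:

```yaml
microphone:
  - platform: i2s_audio
    id: luxe_microphone
    i2s_din_pin: GPIO35
    adc_type: external
    pdm: false
    # Assumption: only available in newer ESPHome releases; check the
    # i2s_audio microphone docs for your version before relying on it.
    sample_rate: 16000
```

The first config in this thread also has `i2s_mclk_pin: GPIO0` commented out. Whether the codec needs that master clock to run at the right rate is another thing that could be checked.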