Voice assistant + wake word + media_player

Is there currently any possibility to use all three components on an esp32/esp32-s3?

The problem is that as far as I understand, for wake word detection to work, the platform needs to be set to esp-idf, which does not support any media_player component though. Is there any workaround or future where this is going to work?

Not sure where you got that from.

The only issue that I know of (and that is likely to be fixed in ESPHome 2023.11) is that, in some conditions, you don’t receive the “reply” from HA if you use the media_player component rather than the speaker one.
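For comparison, here is a rough sketch of the speaker-based variant, assuming the same pins as in my Muse config (GPIO25/GPIO5 for the clocks, GPIO26 for data out, GPIO35 for the mic). The id `va_speaker` is only illustrative, and option names may differ between ESPHome releases, so check the docs for your version:

```yaml
i2s_audio:
  - i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO5

microphone:
  - platform: i2s_audio
    id: mic_i2s
    i2s_din_pin: GPIO35
    adc_type: external
    pdm: false

speaker:
  - platform: i2s_audio
    id: va_speaker            # illustrative id, not from my actual config
    dac_type: external
    i2s_dout_pin: GPIO26
    mode: mono

voice_assistant:
  microphone: mic_i2s
  speaker: va_speaker         # the reply goes to the speaker instead of a media_player
```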

Now, tbh, using French with local STT (haven’t tried NC) doesn’t really work in practice (slow, poor recognition), so I’m not actually using it beyond basic testing.

I only tested German and it was, well, horrible too. But when I switched to Home Assistant cloud for the audio to text processing it was 1000x better, so for now I am using that.

As far as the platform goes, I saw that in a YouTube video on how to set up voice assistant (and it was explicitly mentioned - although I could not find it anywhere else).
Esp-idf is also the platform that is preselected in the example code.

Right, but I’m using Arduino with no (apparent) issues, besides the one above.
However, seeing that PR has me confused, now: Voice Assistant: Voice Activity Detection for the Arduino framework by h3ndrik · Pull Request #5613 · esphome/esphome · GitHub

My config for the Raspiaudio Muse

substitutions:
  device_name: museassist
  friendly_name: "Muse Assistant"

###########################################

esphome:
  name: "${device_name}"
  friendly_name: ${friendly_name}
  min_version: 2023.10.1
  on_boot:
    then:
      - output.turn_on: dac_mute

esp32:
  board: esp-wrover-kit
  framework:
    type: arduino

packages:
  base: !include common/base_nomq.yaml

logger:
#  level: VERBOSE

i2c:
  sda: GPIO18
  scl: GPIO23

improv_serial:

external_components:
  - source:
      type: local
      path: custom_components

i2s_audio:
  - i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO5

output:
  - platform: gpio
    id: dac_mute
    pin: GPIO21
    inverted: true

media_player:
  - platform: i2s_audio
    id: luxe_out
    name: ${friendly_name}
    dac_type: external
    i2s_dout_pin: GPIO26
    mode: stereo
    on_state:
      if:
        condition:
          media_player.is_playing:
        then:
          output.turn_off: dac_mute
        else:
          output.turn_on: dac_mute

microphone:
  - platform: i2s_audio
    id: mic_i2s
    i2s_din_pin: GPIO35
    adc_type: external
    pdm: false

#####################################

globals:
  - id: wifi_connected
    type: bool
    initial_value: "false"
    restore_value: false

interval:
  - interval: 1s
    then:
      - if:
          condition:
            and:
              - lambda: "return !id(wifi_connected);"
              - wifi.connected:
          then:
            - globals.set:
                id: wifi_connected
                value: "true"
            - light.turn_on:
                id: led
                effect: pulse
                red: 0%
                green: 100%
                blue: 0%
            - delay: 1s
            - light.turn_off: led

voice_assistant:
  id: va
  microphone: mic_i2s
  media_player: luxe_out
  use_wake_word: true

  on_listening:
    then:
    - light.turn_on:
        id: led
        blue: 1.0
        red: 0.0
        green: 0.0
        state: true
        effect: pulse
  on_tts_start:
    then:
    - light.turn_on:
        id: led
        blue: 0.0
        red: 0.0
        green: 1.0
        state: true
        effect: none
  on_tts_end:
    then:
    - media_player.play_media: !lambda return x;
    - light.turn_on:
        id: led
        blue: 0.0
        red: 0.0
        green: 1.0
        state: true
        effect: pulse

  on_end:
    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing: luxe_out
    - script.execute: reset_led
    
  on_error:
    then:
    - light.turn_on:
        id: led
        blue: 0.0
        red: 1.0
        green: 0.0
        state: true
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

es8388:

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO19
      inverted: true
      mode:
        input: true
        pullup: true
    name: ${friendly_name} Volume Up
    on_click:
      - media_player.volume_up: luxe_out
  - platform: gpio
    pin:
      number: GPIO32
      inverted: true
      mode:
        input: true
        pullup: true
    name: ${friendly_name} Volume Down
    on_click:
      - media_player.volume_down: luxe_out
  - platform: gpio
    pin:
      number: GPIO12
      inverted: true
      mode:
        input: true
        pullup: true
    name: ${friendly_name} Play Button
    on_click:
      - if:
          condition:
            switch.is_off: use_wake_word
          then:
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                  - script.execute: reset_led
                else:
                  - voice_assistant.start:
          else:
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:
    # on_press:
    #   - voice_assistant.start:
    # on_release:
    #   - voice_assistant.stop:

light:
  - platform: esp32_rmt_led_strip
    id: led
    name: ${friendly_name}
    pin: GPIO22
    chipset: SK6812
    num_leds: 1
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    restore_mode: ALWAYS_OFF
    gamma_correct: 2.8
    entity_category: config
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s

sensor:
  - platform: adc
    pin: GPIO33
    id: batt_voltage
    name: Battery Voltage
    icon: "mdi:battery-outline"
    device_class: voltage
    state_class: measurement
    unit_of_measurement: V
    update_interval: 15s
    accuracy_decimals: 3
    attenuation: 11db
    raw: true
    filters:
      - multiply: 0.00173913 # 2300 -> 4, for attenuation 11db, based on Olivier's code
      - exponential_moving_average:
          alpha: 0.2
          send_every: 2
      - delta: 0.002

#Convert the Voltage to a battery  level (%)
  - platform: copy
    source_id: batt_voltage
    id: batt_level
    name: Battery Level
    device_class: battery
    unit_of_measurement: '%'
    accuracy_decimals: 0
    filters:
      - calibrate_linear:
      # Map from voltage to Battery level
          - 2.75 -> 0
          - 3.95 -> 100
      #Handle/cap boundaries
      - lambda: |
          if (x < 0) return 0; 
          else if (x > 100) return 100;
          else return (x);
      - delta: 0.5 #Only send values to HA if they change 
      - throttle: 30s #Limit values sent to Ha
  - platform: template
    id: esp_memory
    icon: mdi:memory
    name: Free Memory
    lambda: return heap_caps_get_free_size(MALLOC_CAP_INTERNAL) / 1024;
    unit_of_measurement: 'kB'
    state_class: measurement
    entity_category: "diagnostic"
  - platform: template
    id: sys_esp_temperature
    name: Internal Temperature
    lambda: return temperatureRead();
    unit_of_measurement: °C
    device_class: TEMPERATURE
    entity_category: "diagnostic"
  - platform: uptime
    name: Uptime
    id: sys_uptime
  - platform: wifi_signal 
    name: RSSI
    id: wifi_signal_db
    entity_category: "diagnostic"

script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 100%
                effect: none
          else:
            - light.turn_off: led

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led

I wonder what is their secret ingredient :slight_smile:

Thanks for your code, going to try this. As long as it works I’m happy, even if it’s not supposed to work lol

Add media player to echo voice assistant · Issue #77 · esphome/firmware · GitHub indicates that a media player first needs to be implemented for esp-idf.

+1 on the need for media_player with esp-idf!

In the meantime, I did something similar to koying (workaround n 'round…), but I bypass the voice_assistant>media_player config entirely. I use an HA script to process the text-to-be-spoken (called in on_tts_start) which outputs it back to the ESPHome media_player via tts.speak; very convenient to add a bit of “flair” to every response anyway (and “simple” TTS queueing support)!
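A minimal sketch of that workaround, in case it helps. The script name `va_speak_response` and the entity ids `tts.piper` and `media_player.muse_assistant` are placeholders I made up for illustration, so substitute your own. The device also has to be allowed to make Home Assistant service calls (an option on the ESPHome integration entry in HA), and newer ESPHome/HA releases may name these actions differently.

ESPHome side, with no media_player or speaker set under voice_assistant:

```yaml
voice_assistant:
  id: va
  microphone: mic_i2s
  use_wake_word: true
  on_tts_start:
    # x holds the response text that is about to be spoken
    - homeassistant.service:
        service: script.va_speak_response   # placeholder script name
        data:
          message: !lambda return x;
```

Home Assistant side (scripts.yaml or the script editor):

```yaml
script:
  va_speak_response:
    fields:
      message:
        description: Text handed over by the ESPHome device
    sequence:
      - service: tts.speak
        target:
          entity_id: tts.piper                                  # placeholder TTS entity
        data:
          media_player_entity_id: media_player.muse_assistant   # placeholder media player
          message: "{{ message }}"
```

Any extra “flair” or queueing logic would go into the HA script before the tts.speak step.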

The voice assistant pipeline is still wonky with custom device builds (periodic errors, state freezes… don’t know if it’s better with the “featured devices”…), but other than that, this workaround works very well :sweat_smile:

See github link.

I finally got the voice assistant with media_player running on esp-idf. I needed to implement a custom component for this, you can find it here: github. The component still needs some improvements, but I would appreciate it if someone is willing to test it on different hardware than mine. Right now it is running on an ESP32-S3-DevKitC-1. Any feedback is welcome.

Same issue here…
I built my voice satellite using the template provided in a how-to video.
Luckily, this template explains why ESP-IDF is really needed:

“This is important. ESPHome supports two frameworks: Arduino and ESP-IDF. ESP-IDF is needed to include an audio library called ESP_ADF used in our voice assistant”

media_player: is only available with the Arduino framework, which makes it impossible for our voice satellites to be like the smart speakers from Google Home, Amazon Alexa, or Apple HomeKit.
Therefore I’m still unable to replace the Google satellites around my home.
To meet the needs of, I guess, every household, the satellites have to:

  • Be a voice satellite (this is already accomplished)
  • Be able to play audio (therefore media_player: to ESP-IDF is needed)
  • Be able to cast audio to these satellites (without a hassle)

Looking forward to what the future will bring, but as soon as this is all possible, HA/ESPHome will be a valuable competitor to those three and will probably be the only really secure smart speaker… oh gosh, I’m looking forward to it :slight_smile:

What could be next: HA Smartphones without spy-stuff on it which really integrate with the HA itself? :slight_smile:

Just tried to use this and got an error about using GPIO23 even though I’m sure I’ve changed all references to that pin to my relevant pins :confused:

You probably need to add CONFIG_AUDIO_BOARD_CUSTOM: "y" to sdkconfig_options
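Roughly like this, assuming an ESP32-S3 dev board on the esp-idf framework (the board name below is just an example, use whatever matches your hardware):

```yaml
esp32:
  board: esp32-s3-devkitc-1        # example board, adjust to yours
  framework:
    type: esp-idf
    sdkconfig_options:
      # use a custom board definition instead of a predefined audio board
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
```

If I understand it correctly, without this option the audio library falls back to a predefined board with its own fixed pin assignments, which would explain the GPIO23 error even though that pin no longer appears in your YAML.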

Check this, it was working for me: Respeaker-Lite-ESPHome-integration/old_configs/s3-inmp-max-gnumpi-satellite-example.yaml at 3d2f28fda7a3df7e8499300b08851beab2bd8733 · formatBCE/Respeaker-Lite-ESPHome-integration · GitHub

Note that Misha’s project is currently a bit obsolete, as new Nabu components are in development.