Play sound when Wakeword detected

Jones97 · December 11, 2023, 11:58am

Hey Everyone,

I want to make my Atom-Echo play a short sound when the wake word is detected. I tried using “on_wake_word_detected” together with “speaker.play”, but the sound always plays after the whole pipeline has finished, not right after the wake word is detected, which is what I want.
Here is my configuration:

substitutions:
  name: m5stack-atom-echo-0f9924
  friendly_name: Atom Echo

esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
  project:
    name: m5stack.atom-echo-voice-assistant
    version: "1.0"
  min_version: 2023.11.1
api:
  encryption:
    key: 

esp32:
  board: m5stack-atom
  framework:
    type: esp-idf

logger:
  level: DEBUG
ota:

dashboard_import:
  package_import_url: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  on_connect:
    - delay: 5s # Gives time for improv results to be transmitted
    - ble.disable:
  on_disconnect:
    - ble.enable:
  ap:

improv_serial:

esp32_improv:
  authorizer: none

button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset

i2s_audio:
  i2s_lrclk_pin: GPIO33
  i2s_bclk_pin: GPIO19

microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: GPIO23
    adc_type: external
    pdm: true

speaker:
  - platform: i2s_audio
    id: echo_speaker
    i2s_dout_pin: GPIO22
    dac_type: external
    mode: mono

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_wake_word_detected:
    - script.execute: play_bing
  on_listening:        
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"
  on_stt_vad_end:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Fast Pulse"
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: none
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        red: 100%
        green: 0%
        blue: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
          - script.execute: reset_led
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
          - light.turn_off: led

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO39
      inverted: true
    name: Button
    disabled_by_default: true
    entity_category: diagnostic
    id: echo_button
    on_multi_click:
      - timing:
          - ON for at least 250ms
          - OFF for at least 50ms
        then:
          - if:
              condition:
                switch.is_off: use_wake_word
              then:
                - if:
                    condition: voice_assistant.is_running
                    then:
                      - voice_assistant.stop:
                      - script.execute: reset_led
                    else:
                      - voice_assistant.start:
              else:
                - voice_assistant.stop
                - delay: 1s
                - script.execute: reset_led
                - script.wait: reset_led
                - voice_assistant.start_continuous:
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn

light:
  - platform: esp32_rmt_led_strip
    id: led
    name: None
    disabled_by_default: true
    entity_category: config
    pin: GPIO27
    default_transition_length: 0s
    chipset: SK6812
    num_leds: 1
    rgb_order: grb
    rmt_channel: 0
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

script:
  - id: reset_led
    then:
      - if:
          condition:
            - switch.is_on: use_wake_word
            - switch.is_on: use_listen_light
          then:
            - light.turn_on:
                id: led
                red: 100%
                green: 89%
                blue: 71%
                brightness: 60%
                effect: none
          else:
            - light.turn_off: led
  - id: play_bing
    then:
      - if:
          condition:
            # Only play when the speaker is free (the original check was inverted)
            not:
              speaker.is_playing: echo_speaker
          then:
            - speaker.play:
                data: [sound_data]
          else:
            - logger.log: "Speaker busy"

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led
  - platform: template
    name: Use Listen Light
    id: use_listen_light
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - script.execute: reset_led
    on_turn_off:
      - script.execute: reset_led

external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s

esp_adf:

danleongjy · December 23, 2023, 2:03am

I saw the same behaviour when trying to play a ping after the wake word is detected. A workaround is to move “- script.execute: play_bing” to just after “on_listening:” instead of using “on_wake_word_detected:”.
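
A minimal sketch of that change, assuming the play_bing script and the led light from the first post are left as they are (only the affected trigger is shown, and this is untested on my side beyond the behaviour described above):

```yaml
voice_assistant:
  # on_wake_word_detected is no longer used to trigger the sound
  on_listening:
    - script.execute: play_bing   # moved here from on_wake_word_detected
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"
```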

bazinga · December 27, 2023, 10:20pm

How I do it: in Home Assistant, open the ESPHome integration’s configuration for the M5 device and enable “Allow the device to make Home Assistant service calls”. Then, in the ESPHome config, I use “on_listening” with a Home Assistant service call. In my case I use MQTT, as all my voice commands and notifications run via an MQTT server, but you could use a TTS command from Home Assistant here. It then simply works like this: “hey jarvis”… “yeeeees?” “do this and that”

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - homeassistant.service:
        service: mqtt.publish
        data:
          topic: notifications
        data_template:
          payload_template: |
                { 
                  "heading": "An event has started",
                  "details":
                    { 
                      "msg": "yes?",
                      "player": "${location}",
                      "importance": "med",
                      "received": "{{ now().strftime("%H:%M:%S") }}" 
                    }
                }
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"

ha_frw · January 1, 2024, 2:42pm

Excellent idea, but I’m unable to find any information on what to put in the raw data. Can you share what to use, or the raw data you used?

Thank you

ha_frw · January 1, 2024, 2:59pm

That does not work for me either; I can’t get a click/ping to work on my Echo.

SKAL · January 1, 2024, 7:44pm

I’m a rookie, but why not use a “parallel” function?

                         | do the sound
wake - parallel function |
                         | do the action you want...

Could it work?

ha_frw · January 3, 2024, 10:35am

You have to stop the voice assistant, as it occupies the speaker. I guess you cannot play a click without changing the ESPHome voice_assistant code: https://github.com/esphome/esphome/blob/a2e152ad1252444a9d12e3a129270f62076d4c12/esphome/components/voice_assistant/voice_assistant.h

smoldersonline · February 3, 2024, 11:45am

Thanks a lot for sharing! I’m new to HA’s Voice Assist functionality. I’m working with an onju-voice (Nest Mini) device, and that works just fine. For wake word detection and notification I’m trying this in the ESPHome configuration of the onju-voice device:

  on_listening:
    - homeassistant.service:
        service: tts.speak
        target:
          entity_id: tts.google_en_com
        data: 
          media_player_entity_id: onju_out
          message: "Yes?"         
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"

Unfortunately this doesn’t work.

I would be very grateful for your kind support!

zolaktt · February 9, 2024, 11:49am

Has anyone managed to get this working, i.e. to play a sound when the wake word is detected? Any sound, TTS or WAV?

In the MQTT solution posted here, I don’t really get how you play a sound on the Echo. Fine, you send an MQTT message… but then what? The Echo isn’t exposed as a media player.

I wanted to give it a try with speaker.play, but I don’t get what to put into the data field. How do I convert a WAV file to whatever format is needed here? I can’t find any docs.

cnose · February 9, 2024, 7:38pm

Hey,

I’m getting close to figuring this out. Instead of using ‘on_listening’ use ‘on_wake_word_detected’. I’ll just post what I have here (I’m using piper though):

on_wake_word_detected:
    - switch.turn_off: use_wake_word
    - delay: !lambda "if (id(use_wake_word).state) return 200; else return 0;"
    - homeassistant.service:
        service: tts.speak
        data:
          cache: "false"
          media_player_entity_id: media_player.onju_1_onju_voice_1
          message: What do you want?
          entity_id: tts.piper
    - wait_until:
        not:
          media_player.is_playing: onju_out
    - delay: 100ms
    - voice_assistant.start_continuous

I don’t like how the LEDs behave during this part, kind of glitchy.

on_end:
  - delay: 100ms
  - wait_until:
      not:
        media_player.is_playing: onju_out
  - if:
      condition:
        switch.is_on: use_wake_word
      then:
        - script.execute: reset_led
      else:
        - delay: 100ms
        - switch.turn_on: use_wake_word

Test it out for yourself and see if you can improve it!

Edit: I should have said that there is something wrong with the part I modified for ‘on_end’. It does restart wake word detection, but it doesn’t actually detect the wake word until I toggle the switch in HA. So it needs work.

Edit2: see updated code above. Might have cracked it! Not sure if this would work on other ESP32 devices running voice assistant, but please give it a try.

smoldersonline · February 10, 2024, 12:02pm

Wow - this works (onju voice here)!

Thanks a lot.

smoldersonline · February 12, 2024, 10:44am

Again many thanks for sharing this!

I am already very happy with how it works right now. There may be one small improvement for me. After the wakeword confirmation (“What do you want?”), there is a slight delay until the device will listen. I have tried removing (one-by-one) the delays as included in your “on_wake_word_detected” configuration, but that does not seem to work.

Again, not at all a big thing.

Elf · February 15, 2024, 9:46pm

In my case it almost works with an ESP32 speaker, but with a bug.
First it says “not recognized” and then the phrase “room light on” (and it does switch the light).
It looks like I’ve messed up the configuration. I’d appreciate it if you could point out what’s wrong with it.

voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: false
  media_player: media_player_speaker
  
  on_wake_word_detected: 
    - light.turn_on:
        id: led_light
        effect: "1-Green"
        brightness: 100%
    - switch.turn_off: use_wake_word
    - delay: !lambda "if (id(use_wake_word).state) return 200; else return 0;"
    - homeassistant.service:
        service: tts.speak
        data:
          # cache: "false"
          media_player_entity_id: media_player.speaker_22_esp_speaker_1
          message: Say!
          entity_id: tts.piper
    - wait_until:
        not:
          media_player.is_playing: media_player_speaker
    - delay: 10ms
    - voice_assistant.start_continuous

  on_listening: 
    - light.turn_on:
        id: led_light
        effect: "Scan Effect"
        red: 63%
        green: 13%
        blue: 93%
        brightness: 50%

  on_stt_vad_start:
    - light.turn_on:
        id: led_light
        effect: "Scan Effect"
        red: 0%
        green: 100%
        blue: 50%
        brightness: 50%

  on_stt_end:
    - light.turn_on:
        id: led_light
        effect: "None"
        red: 0%
        green: 100%
        blue: 0%
        brightness: 50%

  on_error: 
    - light.turn_on:
        id: led_light
        effect: "1-red"
        brightness: 50%
    - if:
        condition:
          switch.is_on: use_wake_word
        then:

          - switch.turn_off: use_wake_word
          - delay: 1sec 
          - switch.turn_on: use_wake_word      
  


  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:

  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
 
  on_end:

    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing: media_player_speaker
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          light.turn_on:
            id: led_light
            effect: "1-Blue"
            brightness: 100%
        else:
          - delay: 100ms
          - switch.turn_on: use_wake_word


binary_sensor:
  - platform: status
    name: API Connection
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:
  - platform: status
    name: "Status"

switch:
  - platform: template
    name: "Use wake word"
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config

    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
    
  - platform: restart
    name: "Speaker 2 Restart"

It’s like there’s a loop and it’s listening for something unnecessary.

p.s. I removed the caching false line, as this is a standard phrase that will be repeated every time and it works faster. This can’t be the cause of the failure, can it?

Demusman · March 28, 2024, 10:58pm

Did anyone get this working with an Atom Echo?

brosef · May 8, 2024, 1:52am

Did anyone get this working? I feel like an audio cue when your wake word is detected is a really important option to have.

umglurf · June 22, 2024, 3:41pm

I have been able to get a pretty good solution based on the solution in the post from cnose, using a small sound clip instead of tts.

First you will need an audio clip; I found a collection. Download the sound clip and convert it by running:

ffmpeg -i INPUT.mp3 -ac 1 -acodec pcm_u8 -ar 16000 wakeword-detected.wav
sox wakeword-detected.wav --bits 8 --encoding signed-integer --endian little wakeword-detected.raw

The config was updated with the following (only the updated/added sections are shown):

external_components:
  - source: github://jesserockz/esphome-components
    components: [file]

file:
  - id: ok_sound
    file: wakeword-detected.raw

voice_assistant:
  on_wake_word_detected:
    - switch.turn_on: play_wakeword_sound
    - wait_until:
        not:
          - voice_assistant.is_running
    - lambda: id(echo_speaker).play(id(ok_sound), sizeof(id(ok_sound)));
    - wait_until:
        not:
          speaker.is_playing:
    - voice_assistant.start_continuous
  on_stt_end:
    - if:
        condition:
          switch.is_on: play_wakeword_sound
        then:
          - switch.turn_off: play_wakeword_sound

switch:
  - platform: template
    name: Play Wakeword sound
    id: play_wakeword_sound
    optimistic: true
    internal: true
    restore_mode: RESTORE_DEFAULT_OFF
    on_turn_on:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
    on_turn_off:
      - delay: 5s
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous

Edit 1: It seems to work a bit better after adding a delay in play_wakeword_sound.on_turn_off.
Edit 2: The file module was updated and the argument changed from path to file.

r6zifkgl · June 28, 2024, 8:43pm

Maybe a stupid question, but I assume that

path: wakeword-detected.raw

should be on the Atom Echo device.
How do I get it on there?

umglurf · June 29, 2024, 4:50am

It is a path on the computer where you run the esphome compile. If you have the file in the same place as the YAML config file, it should work as shown. The ESPHome compilation reads in the file and embeds it in the firmware that is then uploaded to the device.
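
As a rough sketch, assuming the device config is called atom-echo.yaml (a made-up name, use whatever yours is called) and the converted clip sits in the same folder, it would look like this. The external_components entry from my earlier post is still needed for the file component:

```yaml
# Folder on the computer that runs the compile (file names are illustrative):
#   atom-echo.yaml            <- hypothetical name for the device config
#   wakeword-detected.raw     <- converted clip, embedded into the firmware at compile time

file:
  - id: ok_sound
    # Older versions of the component used "path:" here (see Edit 2 in my earlier post)
    file: wakeword-detected.raw   # sits in the same folder as the YAML file
```

Nothing has to be copied to the Atom Echo by hand; the clip travels with the firmware upload.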

spectralMachina · July 1, 2024, 6:36pm

I was able to get a sound to play from the Atom Echo using your code and instructions, so thank you so much for providing those!

However, after the assist pipeline has finished and tries to send back a voice reply, the Echo’s log fills up with “speaker buffer is full” (or something to that effect) and the voice never plays. The wake sound I’m using is about a second long and about 15 kB in size when converted to .raw. Any idea how to resolve this?

umglurf · July 1, 2024, 6:57pm

Hi, it could be the file size. My sound is 8k, so you could try reducing yours and see if that helps. You could also try adding a short delay between the speaker.play and the wait_until, and another before voice_assistant.start_continuous, and see if that makes a difference.
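
A sketch of where those delays would go, based on the on_wake_word_detected block from my earlier post. The 100ms values are guesses that I have not tuned, so treat them as a starting point:

```yaml
voice_assistant:
  on_wake_word_detected:
    - switch.turn_on: play_wakeword_sound
    - wait_until:
        not:
          - voice_assistant.is_running
    - lambda: id(echo_speaker).play(id(ok_sound), sizeof(id(ok_sound)));
    - delay: 100ms   # assumed value: give playback time to start before polling
    - wait_until:
        not:
          speaker.is_playing:
    - delay: 100ms   # assumed value: short pause before listening resumes
    - voice_assistant.start_continuous
```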