ESPHome Voice Assistant speech output to Home Assistant Media Player

Using the “wrong config” only works temporarily; after a short while the I2S buffer runs out and breaks the pipeline until the device restarts. I’m sort of lucky that I also have a fried Echo. Or I thought it was fried, but only the speaker is dead, so that fixed the problem for me. We still need an option to define the output device, though; not everyone has a half-fried Echo…

I also thought of setting the speaker volume to 0, but that’s not an option for the speaker component. Sadly, I think we just have to wait and hope they start thinking OUT of the assistant box and realize that input and output don’t have to be the same device…

Has anyone still got this working?

If so, can you please post your config, as I’m having no luck at all.
My HomePod pauses the currently playing media, but nothing plays.

Yes - I just got it working using the on_tts_start example in this thread, although I’m using an Amazon Echo rather than a HomePod. For an Echo, one must set the public URL for accessing HA in the Alexa MediaPlayer integration. Does HomePod have something similar?

Here is my ESPHome YAML for a NodeMCU-ESP32S board. Note that this is NOT original; it is the code for the Atom M5 box plus the code in this thread. It still needs a few tweaks: it still uses the attached speaker. If I exclude the speaker from the voice_assistant declaration, the board reboots before sending the text to the Echo. I also think the esp-adf PR5230 is invalid now, but somehow the latest esp-adf code was downloaded to my system and it builds OK. I don’t really know how that happened… At the end of the day, this whole pipeline needs some more formal treatment by people who know better.

esphome:
  name: nodemcu-esp-32s
  friendly_name: NodeMCU ESP-32S

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

logger:
#  level: VERBOSE

# Enable Home Assistant API
api:
  encryption:
    key: <key>

ota:
  password: <key>

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Nodemcu-Esp-32S Fallback Hotspot"
    password: "T18wj5zTcPbM"

captive_portal:

web_server:
  port: 80

button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset    

light:
  - platform: status_led
    id: gpio2_light
    name: "Status led"
    pin: GPIO2

  - platform: esp32_rmt_led_strip
    id: led
    name: None
    disabled_by_default: true
    entity_category: config
    pin: GPIO22
    default_transition_length: 0s
    chipset: WS2812
    num_leds: 1
    rgb_order: grb
    rmt_channel: 0
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO27
  - id: i2s_in
    i2s_lrclk_pin: GPIO19
    i2s_bclk_pin: GPIO18

speaker:
  - platform: i2s_audio
    id: echo_speaker
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO14
    mode: mono

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: echo_microphone
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO23

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  # volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"
  on_stt_vad_end:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Fast Pulse"
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 50%
        red: 0%
        green: 50%
        brightness: 100%
        effect: none
    - homeassistant.service:
        service: tts.speak
        data:
          media_player_entity_id: media_player.office_black    #replace this with your media player entity id
          message: !lambda 'return x;'
          entity_id: tts.piper                 #replace this with your piper tts entity id.

  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  # on_tts_end:
  #   - homeassistant.service:
  #       service: media_player.play_media
  #       data:
  #         entity_id: media_player.office_black
  #         media_content_id: !lambda 'return x;'
  #         media_content_type: music
  #         announce: "true"
  #   - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        red: 100%
        green: 0%
        blue: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
          - script.execute: reset_led
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
          - light.turn_off: led

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO0
      inverted: true
    name: Button
    disabled_by_default: true
    entity_category: diagnostic
    id: echo_button
    on_multi_click:
      - timing:
          - ON for at least 250ms
          - OFF for at least 50ms
        then:
          - if:
              condition:
                switch.is_off: use_wake_word
              then:
                - if:
                    condition: voice_assistant.is_running
                    then:
                      - voice_assistant.stop:
                      - script.execute: reset_led
                    else:
                      - voice_assistant.start:
              else:
                - voice_assistant.stop
                - delay: 1s
                - script.execute: reset_led
                - script.wait: reset_led
                - voice_assistant.start_continuous:
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn

switch:
  - platform: restart
    name: "Restart"
  
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led
  - platform: template
    name: Use listen light
    id: use_listen_light
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - script.execute: reset_led
    on_turn_off:
      - script.execute: reset_led        

script:
  - id: reset_led
    then:
      - if:
          condition:
            - switch.is_on: use_wake_word
            - switch.is_on: use_listen_light
          then:
            - light.turn_on:
                id: led
                red: 100%
                green: 89%
                blue: 71%
                brightness: 60%
                effect: none
          else:
            - light.turn_off: led        

external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s

esp_adf:

I’m new to Home Assistant and ESPHome, and I’m not sure where the yaml goes in my config, or if I need to add other keys.

Does the on_tts_start key need to be nested within another key? If so, how would I know what key that is, and what other keys are required to be nested within that key?

Thanks in advance for any help.


Not sure if you got this working, but I just added the on_tts_end part; I don’t have on_tts_start in my config, and it works perfectly with my Korvo-1, see below. Don’t forget to configure the device to allow it to make action (service) calls, as stated a few posts up…


  on_tts_stream_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
  on_tts_end:
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player..output_speaker
          media_content_id: !lambda 'return x;'
          media_content_type: music
          announce: "true"
  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
  on_end:

Hi all! First off, thanks to everyone in this thread, as this was what enabled me to get output to my GHome speakers when setting up my new Atom Echos.

That said, some things have evolved in recent times, specifically with regard to being able to easily disable the onboard speakers of the Atom Echo and S3 Box so the Assist responses only come out of your chosen speaker: the !remove statement. So, in the end, this is my current fully working version of the config edit for the voice_assistant: block, which also includes some tweaks for the Atom Echo’s microphone to allow Assist an easier time understanding you in more situations beyond total silence. With the below noise and vol tweaks, my Atoms can still hear me clearly from an entire room away most of the time:

voice_assistant:
  noise_suppression_level: 4     # increase noise suppression to 3 -or- 4 from default 2 for better sound floor suppression
  volume_multiplier: 5.0     # increase multiplier from 2.0 to 5.0 to give the mic a little boost...going above 5.0 with Atom Echo resulted in distorted audio for me
  speaker: !remove     # remove the default 'echo_speaker' entry so VA doesn't use internal speaker at all. NOTE: THIS ALSO DISABLES SOUND FOR TIMERS but LED will still flash on finish
  on_tts_start:     # this gets the TTS pipeline started earlier than 'on_tts_end' and reduces response delay for the user, but might not work for Amazon Echos
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.my_speaker_2
          message: !lambda 'return x;'
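
A note on why `speaker: !remove` works in a fragment like the one above: as far as I understand it, `!remove` is part of ESPHome's packages mechanism, so it only has something to remove when the device YAML pulls in a base config through `packages:`. A minimal sketch of that shape follows; the package URL is a made-up placeholder (keep whatever `packages:` line your adopted device YAML already contains), and `media_player.my_speaker_2` is just the entity from the post above.

```yaml
packages:
  # Hypothetical placeholder: keep the package line that is already in your device YAML.
  base_firmware: github://example-owner/example-repo/atom-echo.yaml@main

# Everything below is merged on top of the package's own voice_assistant block.
voice_assistant:
  speaker: !remove            # drops the 'speaker:' entry inherited from the package
  on_tts_start:
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.my_speaker_2
          message: !lambda 'return x;'
```

If the `packages:` section has been deleted from the file, there is no inherited `speaker:` entry to remove, which may be why validation rejects the tag in some setups.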

Instead of rewriting the entire on_timer_finished: block to push the timer audio to the media_player, I’ll probably just expose the timer_ringing switch to HA and automate off of that, as I would like to be able to cancel the timers from an HA notification on my phone, as well, instead of just the button on the Atom. This isn’t a priority for me right now, as I don’t use timers very often. I’m just glad to be able to hear my Atom’s responses now without the faint crackly echo of the, well, Echo. haha


Hey guys!! I’m new to the forum. While trying to get the sound to come out of only my Google speaker, I found a solution: I commented out a line in the speaker config, and it worked without errors.


I tried nearly all the variations here, but all I get as output on my Amazon Echo device is that it tells me something about an HTTPS URL, not the real response.
Any idea?
My config looks like this:

  on_tts_start:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: none
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.echo_wohnzimmer_2    #replace this with your media player entity id
          message: !lambda 'return x;'

I finally got around to exposing timer_ringing (indirectly via new switch) so I could automate off of it. Now, via automations, I get a TTS announcement and a notification on my phone when a timer is finished.

Here is my current complete set of customizations for anyone wondering:

voice_assistant:
  # Adjust Mic parameters for better understanding depending on room environment
  noise_suppression_level: 2   # 1-4
  volume_multiplier: 5.0   #1.0-6.0 (higher than 6 will result in major distortion)
  # Don't use Atom's speaker at all
  speaker: !remove     
  # Output response as a TTS to a chosen speaker
  on_tts_start:     
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.den_speaker_2
          message: !lambda 'return x;'
  # I want to know when an Atom loses connection to HA, so blink the light fast red
  on_client_disconnected:     
    then:
    - voice_assistant.stop: {}
    - micro_wake_word.stop: {}
    - light.turn_on:
        id: led
        red: 1.0
        green: 0.0
        blue: 0.0
        brightness: 1.0
        effect: Fast Pulse
        state: true

# Expose a restart button to HA so the Atom can be remotely rebooted 
# (can fix a stuck pipeline or unstable wifi connection in multi-AP/mesh environments)  
button:     
- platform: restart
  id: restart_btn
  name: Reboot
  disabled_by_default: false
  icon: mdi:restart-alert
  entity_category: config
  device_class: restart  

# Expose a new switch to HA to indicate timer_ringing AND ability to 
# toggle it back to an off state (acknowledges the timer, same as pressing 
# Atom's front button); automate using this switch
switch:     
- platform: template
  name: Timer Ringing
  optimistic: true
  lambda: |-
    if (id(timer_ringing).state) {
      return true;
    } else {
      return false;
    }
  turn_off_action:
      - switch.turn_off: timer_ringing
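
The Home Assistant side of this is not shown in the post. As a rough sketch only, an automation that reacts to the exposed switch could look like the following. It goes in Home Assistant (not in the ESPHome device YAML), and both `switch.atom_echo_timer_ringing` and `notify.mobile_app_my_phone` are hypothetical names that depend on how your device and phone are registered.

```yaml
# Home Assistant automation (e.g. automations.yaml), NOT ESPHome config.
- alias: "Atom Echo timer finished"
  trigger:
    - platform: state
      entity_id: switch.atom_echo_timer_ringing   # hypothetical entity id of the template switch
      to: "on"
  action:
    # Announce on the same speaker that receives the Assist responses.
    - service: tts.cloud_say
      data:
        entity_id: media_player.den_speaker_2
        message: "Your timer is finished."
    # Phone notification; the notify service name is an assumption.
    - service: notify.mobile_app_my_phone
      data:
        message: "Timer finished on the Atom Echo."
```

Turning the same switch off from Home Assistant (for example with `switch.turn_off` from a notification action) acknowledges the timer, as described in the comments above.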

I believe Alexa devices work differently because the audio has to be sent to Amazon before being sent back down to the Echo (don’t get me started on Echo devices and Amazon’s data hoovering; suffice it to say don’t send anything you don’t want Amazon to aggressively use against you).

Config for Atom Echo with TTS sent to a separate media player.

Uses tts.cloud_say, so it requires Home Assistant Cloud.

Sets the volume to 20% for voice responses and is meant to return to the previous volume after the response.

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          volume_level: 0.2  # Set volume to 20%
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.living_room_speaker
        data_template:
          message: "{{ tts_message }}"
        variables:
          tts_message: return x;
  on_end:
    - light.turn_off: led
    - homeassistant.service:
        service: media_player.volume_set
        data_template:
          entity_id: media_player.living_room_speaker
          volume_level: "{{ state_attr('media_player.living_room_speaker', 'volume_level') }}"  # state_attr() reads the attribute; this is the volume at the time on_end runs
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%

Here’s a config that uses Piper instead of tts.cloud_say for local TTS.
I run piper and faster-whisper via Docker on a separate server. You can simply use the Piper add-on if your hardware is decent, or run it as a container for very fast responses. As of 2024.12 my response times are a second or less.

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          volume_level: 0.2  # Lower volume to 20%
    - homeassistant.service:
        service: tts.piper_say
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          message: !lambda 'return x;'  # pass the response text directly; a "{{ ... }}" string under data: is sent literally
  on_end:
    - light.turn_off: led
    - homeassistant.service:
        service: media_player.volume_set
        data_template:
          entity_id: media_player.living_room_speaker
          volume_level: "{{ state_attr('media_player.living_room_speaker', 'volume_level') }}"  # state_attr() reads the attribute; this is the volume at the time on_end runs
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%

For me, using this config the wake word works, but then it just gets stuck in “Assist satellite → Processing”, no errors in the logs. Before adding this to the config, the ATOM Echo was working fine, it only had the bad speaker sound.

I got: “homeassistant.exceptions.ServiceNotFound: Action tts.piper_say not found”
And: ‘M5Stack Atom Echo 23a170’ - No such effect ‘pulse’

I tried using:

     - homeassistant.action:
        service: tts.speak
        data:
          entity_id: tts.piper
          media_player_entity_id: media_player.vlc_telnet
          message: "{{ tts_message }}"
        variables:
          tts_message: return x;

The speaker works, but it just reads “tts_message” out loud literally, not the answer from the LLM. Any idea how to fix it?

Edit:
Seems like using it like this works perfectly for me:

    - homeassistant.action:
        service: tts.speak
        data:
          entity_id: tts.piper
          media_player_entity_id: media_player.vlc_telnet
          message: !lambda 'return x;'
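
For anyone wondering why the first attempt read the placeholder aloud: as I understand the ESPHome API docs, values under `data:` are sent to Home Assistant as-is, while `{{ ... }}` templates are only rendered when they sit under `data_template:`, with `variables:` supplying the values from lambdas. An alternative that keeps the template form would then look roughly like this (untested here, same entities as above):

```yaml
    - homeassistant.action:
        service: tts.speak
        data:
          # Static values stay under data:
          entity_id: tts.piper
          media_player_entity_id: media_player.vlc_telnet
        data_template:
          # Rendered by Home Assistant using the variables below
          message: "{{ tts_message }}"
        variables:
          # Each variable is a lambda; x is the response text passed to on_tts_start
          tts_message: return x;
```

The `!lambda 'return x;'` version is shorter and avoids the template step entirely, so it is probably the simpler choice.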

Thanks for posting. Where does this go, though? In configuration.yaml or somewhere else? And the id, microphone, and speaker names: where are these defined?
Thanks a lot in advance!

In the ESP device config file (ESPHome).
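
To make that a bit more concrete: `voice_assistant:` is a top-level key in the device's own YAML file in the ESPHome dashboard, not in Home Assistant's configuration.yaml, and triggers such as `on_tts_start` are nested directly under it. The names after `microphone:` and `speaker:` are the `id:` values of components declared elsewhere in the same file (or in a package it includes). A trimmed sketch, reusing the NodeMCU pin numbers posted earlier in this thread (your board's pins will differ) and a made-up `media_player.my_speaker` entity:

```yaml
# Device YAML edited in the ESPHome dashboard (one file per device).
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO19
    i2s_bclk_pin: GPIO18

microphone:
  - platform: i2s_audio
    id: echo_microphone          # the name voice_assistant refers to
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO23
    adc_type: external
    pdm: false

voice_assistant:
  id: va                         # only needed if lambdas or actions reference it
  microphone: echo_microphone    # must match the microphone id above
  on_tts_start:                  # nested directly under voice_assistant
    - homeassistant.service:
        service: tts.speak
        data:
          entity_id: tts.piper                             # your TTS entity
          media_player_entity_id: media_player.my_speaker  # hypothetical entity id
          message: !lambda 'return x;'
```

A full config also needs the usual `esphome:`, `esp32:`, `api:` and `wifi:` sections, as in the complete example near the top of the thread.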

Thanks. I did the following:

  1. Installed ESPHome on Docker (my HA instance also runs in Docker, so no “add-ons” available)
  2. Configured the Atom Echo with HA firmware
  3. Tried tinkering with the config file in ESPHome as shown above

substitutions:
  name: m5stack-atom-echo-0ab278
  friendly_name: M5Stack Atom Echo 0ab278
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: xxx


voice_assistant:
  # Adjust Mic parameters for better understanding depending on room environment
  noise_suppression_level: 2   # 1-4
  volume_multiplier: 5.0   #1.0-6.0 (higher than 6 will result in major distortion)
  # Don't use Atom's speaker at all
  speaker: !remove     
  # Output response as a TTS to a chosen speaker
  on_tts_start:     
    - homeassistant.service:
        service: lms_tts_notify #this is what I otherwise use, works fine via HA
        data:
          entity_id: media_player.black_2
          message: !lambda 'return x;'
  # I want to know when an Atom loses connection to HA, so blink the light fast red
  on_client_disconnected:     
    then:
    - voice_assistant.stop: {}
    - micro_wake_word.stop: {}
    - light.turn_on:
        id: led
        red: 1.0
        green: 0.0
        blue: 0.0
        brightness: 1.0
        effect: Fast Pulse
        state: true

# Expose a restart button to HA so the Atom can be remotely rebooted 
# (can fix a stuck pipeline or unstable wifi connection in multi-AP/mesh environments)  
button:     
- platform: restart
  id: restart_btn
  name: Reboot
  disabled_by_default: false
  icon: mdi:restart-alert
  entity_category: config
  device_class: restart  

# Expose a new switch to HA to indicate timer_ringing AND ability to 
# toggle it back to an off state (acknowledges the timer, same as pressing 
# Atom's front button); automate using this switch
switch:     
- platform: template
  name: Timer Ringing
  optimistic: true
  lambda: |-
    if (id(timer_ringing).state) {
      return true;
    } else {
      return false;
    }
  turn_off_action:
      - switch.turn_off: timer_ringing


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

The problem: I see the following in the ESPHome logs:

INFO ESPHome 2024.12.4
INFO Reading configuration /config/m5stack-atom-echo-0ab278.yaml...
Failed config

esphome: [source /config/m5stack-atom-echo-0ab278.yaml:5]
  
  Platform missing. You must include one of the available platform keys: bk72xx, esp32, esp8266, host, libretiny, rp2040, rtl87xx.
  name: m5stack-atom-echo-0ab278
  name_add_mac_suffix: False
  friendly_name: M5Stack Atom Echo 0ab278

So, am I on the correct path? Or what does this error actually mean?

Means you need something like this in your code:

esp32:
  board: m5stack-atom
  framework:
    type: arduino

Thanks for the inputs. I got it working… almost.
Only the “!remove” is not accepted by the validate/compile function. I also have an Atom. Do you know what the issue might be?