ESPHome Voice Assistant speech output to Home Assistant Media Player

I have a solution: call the Home Assistant tts.cloud_say service from ESPHome in the on_tts_start: trigger. Here is the ESPHome YAML code:

on_tts_start:
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.your_media_player_id
        data_template:
          message: "{{ my_stt }}"
        variables:
          my_stt: return x;

This sends the response text to the cloud TTS service a second time, but directs the resulting MP3 file to a different media player.

Unfortunately this does not work for me. When I try it in the developer tools I have to add a target, but I can’t add one to the call in ESPHome.
Also, “x” contains the URL of the raw audio file / stream, not the response text. So even if it worked, I would expect to hear something like “http…”?
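
For reference, here is a minimal sketch of the two triggers being discussed. It assumes the behaviour described in the ESPHome voice_assistant documentation: in on_tts_start the variable x holds the response text, while in on_tts_end it holds the URL of the generated audio. The entity ID is a placeholder, and you would normally use only one of the two triggers, otherwise the response plays twice.

```yaml
voice_assistant:
  microphone: echo_microphone   # id of a microphone defined elsewhere in the config
  on_tts_start:
    # Here x is the response TEXT, so it can be handed to a TTS service as the message
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.your_media_player_id   # placeholder
          message: !lambda 'return x;'
  on_tts_end:
    # Here x is the URL of the audio response, so it fits media_player.play_media
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player.your_media_player_id   # placeholder
          media_content_id: !lambda 'return x;'
          media_content_type: music
```

If a speaker reads out something starting with “http…”, the URL from on_tts_end is probably being passed to a TTS service as the message.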

It is strange. I am using the same yaml and it works as expected without a single problem.

Here is the voice assistant portion of my ESPHome config. Did I put the Home Assistant media player in the correct spot?

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.announcements
        data_template:
          message: "{{ my_stt }}"
        variables:
          my_stt: return x;
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
          - script.execute: reset_led
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
          - light.turn_off: led  

The output on the Home Assistant media player works for me, but at the same time the speaker of the Atom Echo speaks the same line as well.
Is there any way to stop it from doing this, or at least to mute that small speaker (without desoldering it)?

I simply “broke” the configuration of the speaker in my M5Stack Atom Echo, so it no longer plays audio:

speaker:
  - platform: i2s_audio
    id: echo_speaker
#    i2s_dout_pin: GPIO22
#    dac_type: external
#    mode: mono
    dac_type: internal # wrong config to mute speaker
    mode: left # wrong config to mute speaker

Then adding this to my voice_assistant gives me output only on my Sonos speaker:

  on_tts_end:
    - homeassistant.service:
        service: media_player.play_media
        data:
          media_content_id: !lambda 'return x;'
          media_content_type: audio/mpeg
          entity_id: media_player.wz_sonos

In an ideal world I would be able to use the built-in mic in the Sonos speaker… but hey, you can’t have everything, I guess.

A more comprehensive solution, thanks to Amrit Prabhu of smarthomecircle.com, which shows how to direct the tts output to a local voice assistant:

# For an internal voice assistant, use tts.speak to send to tts.piper
# 
  on_tts_start:                                    # this is required to play the output on a media player
    - homeassistant.service:
        service: tts.speak
        data:
          media_player_entity_id: media_player.my_media_player    #replace this with your media player entity id
          message: !lambda 'return x;'
          entity_id: tts.piper                 #replace this with your piper tts entity id.

#
# For a cloud-based voice assistant, use tts.cloud_say to send to Home Assistant Cloud
#
  on_tts_start:
    # send the tts response on a home assistant media player
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.my_media_player   #replace this with your media player entity id
          message: !lambda 'return x;'

It wasn’t working here either. After days of searching I finally found the fix: I just needed to grant the ESPHome device permission to make Home Assistant service calls. You can do this in the device configuration. Hope this helps.

Using the “wrong config” only works temporarily. After a short while the I2S buffer runs out and breaks the pipeline until the device restarts. I’m sort of lucky that I also have a fried Echo, or so I thought: it turned out only the speaker is dead, which fixed the problem for me. We still need an option to define the output device, though, since not everyone has a half-fried Echo…

I also thought of setting the speaker volume to 0, but that’s not an option for the speaker component. Sadly, I think we just have to wait and hope they start thinking OUT of the assistant box and realize that input and output don’t have to be the same device…

Has anyone still got this working?

If so, can you please post your config, as I’m having no luck at all.
My HomePod pauses the currently playing media but nothing plays.

Yes, I just got it working using the on_tts_start example in this thread, although I’m using an Amazon Echo rather than a HomePod. For an Echo, one must set the public URL for accessing HA in the Alexa MediaPlayer integration. Does HomePod have something similar?

Here is my ESPHome YAML for a NodeMCU-ESP32S board. Note that this is NOT original: it is code for the Atom M5 box plus the code in this thread. It still needs a few tweaks, since it still uses the attached speaker. If I exclude the speaker from the voice_assistant declaration, the board reboots before sending the text to the Echo. I also think the esp-adf PR5230 is invalid now, but somehow the latest esp-adf code was downloaded to my system and it builds OK. I don’t really know how that happened… At the end of the day, this whole pipeline needs some more formal treatment by people who know better.

esphome:
  name: nodemcu-esp-32s
  friendly_name: NodeMCU ESP-32S

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

logger:
#  level: VERBOSE

# Enable Home Assistant API
api:
  encryption:
    key: <key>

ota:
  password: <key>

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Nodemcu-Esp-32S Fallback Hotspot"
    password: "T18wj5zTcPbM"

captive_portal:

web_server:
  port: 80

button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset    

light:
  - platform: status_led
    id: gpio2_light
    name: "Status led"
    pin: GPIO2

  - platform: esp32_rmt_led_strip
    id: led
    name: None
    disabled_by_default: true
    entity_category: config
    pin: GPIO22
    default_transition_length: 0s
    chipset: WS2812
    num_leds: 1
    rgb_order: grb
    rmt_channel: 0
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO27
  - id: i2s_in
    i2s_lrclk_pin: GPIO19
    i2s_bclk_pin: GPIO18

speaker:
  - platform: i2s_audio
    id: echo_speaker
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO14
    mode: mono

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: echo_microphone
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO23

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  # volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Slow Pulse"
  on_stt_vad_end:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        effect: "Fast Pulse"
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 50%
        red: 0%
        green: 50%
        brightness: 100%
        effect: none
    - homeassistant.service:
        service: tts.speak
        data:
          media_player_entity_id: media_player.office_black    #replace this with your media player entity id
          message: !lambda 'return x;'
          entity_id: tts.piper                 #replace this with your piper tts entity id.

  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  # on_tts_end:
  #   - homeassistant.service:
  #       service: media_player.play_media
  #       data:
  #         entity_id: media_player.office_black
  #         media_content_id: !lambda 'return x;'
  #         media_content_type: music
  #         announce: "true"
  #   - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        red: 100%
        green: 0%
        blue: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
          - script.execute: reset_led
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
          - light.turn_off: led

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO0
      inverted: true
    name: Button
    disabled_by_default: true
    entity_category: diagnostic
    id: echo_button
    on_multi_click:
      - timing:
          - ON for at least 250ms
          - OFF for at least 50ms
        then:
          - if:
              condition:
                switch.is_off: use_wake_word
              then:
                - if:
                    condition: voice_assistant.is_running
                    then:
                      - voice_assistant.stop:
                      - script.execute: reset_led
                    else:
                      - voice_assistant.start:
              else:
                - voice_assistant.stop
                - delay: 1s
                - script.execute: reset_led
                - script.wait: reset_led
                - voice_assistant.start_continuous:
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn

switch:
  - platform: restart
    name: "Restart"
  
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led
  - platform: template
    name: Use listen light
    id: use_listen_light
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - script.execute: reset_led
    on_turn_off:
      - script.execute: reset_led        

script:
  - id: reset_led
    then:
      - if:
          condition:
            - switch.is_on: use_wake_word
            - switch.is_on: use_listen_light
          then:
            - light.turn_on:
                id: led
                red: 100%
                green: 89%
                blue: 71%
                brightness: 60%
                effect: none
          else:
            - light.turn_off: led        

external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s

esp_adf:

I’m new to Home Assistant and ESPHome, and I’m not sure where the yaml goes in my config, or if I need to add other keys.

Does the on_tts_start key need to be nested within another key? If so, how would I know what key that is, and what other keys are required to be nested within that key?

Thanks in advance for any help.
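
To illustrate the nesting: in the examples in this thread, on_tts_start is always a key directly under the top-level voice_assistant: block of the ESPHome device config, next to keys such as microphone:. A minimal sketch follows, assuming a microphone with the id echo_microphone is defined elsewhere in the same file. The entity IDs are placeholders to replace with your own.

```yaml
voice_assistant:
  microphone: echo_microphone   # must match the id of a microphone: entry in this file
  on_tts_start:                 # trigger nested directly under voice_assistant:
    - homeassistant.service:
        service: tts.speak
        data:
          entity_id: tts.piper                                   # placeholder TTS entity
          media_player_entity_id: media_player.my_media_player   # placeholder media player
          message: !lambda 'return x;'                           # x is the response text
```

The device also has to be allowed to make Home Assistant service calls, as mentioned earlier in the thread.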

Not sure if you got this working, but I just added the on_tts_end part. I don’t have on_tts_start in my config, and it works perfectly with my Korvo-1 (see below). Don’t forget to configure the device to allow it to make action (service) calls, as stated a few posts up…


  on_tts_stream_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
  on_tts_end:
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player..output_speaker
          media_content_id: !lambda 'return x;'
          media_content_type: music
          announce: "true"
  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
  on_end:

Hi all! First off, thanks to everyone in this thread, as this was what enabled me to get output to my GHome speakers when setting up my new Atom Echos.

That said, some things have evolved in recent times, specifically with regard to being able to easily disable the onboard speakers of the Atom Echo and S3 Box so the Assist responses only come out of your chosen speaker: the !remove statement. So, in the end, this is my current fully working version of the config edit for the voice_assistant: block, which also includes some tweaks for the Atom Echo’s microphone to allow Assist an easier time understanding you in more situations beyond total silence. With the below noise and vol tweaks, my Atoms can still hear me clearly from an entire room away most of the time:

voice_assistant:
  noise_suppression_level: 4     # increase noise suppression to 3 -or- 4 from default 2 for better sound floor suppression
  volume_multiplier: 5.0     # increase multiplier from 2.0 to 5.0 to give the mic a little boost...going above 5.0 with Atom Echo resulted in distorted audio for me
  speaker: !remove     # remove the default 'echo_speaker' entry so VA doesn't use internal speaker at all. NOTE: THIS ALSO DISABLES SOUND FOR TIMERS but LED will still flash on finish
  on_tts_start:     # this gets the TTS pipeline started earlier than 'on_tts_end' and reduces response delay for the user, but might not work for Amazon Echos
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.my_speaker_2
          message: !lambda 'return x;'

Instead of rewriting the entire on_timer_finished: block to push the timer audio to the media_player, I’ll probably just expose the timer_ringing switch to HA and automate off of that, as I would like to be able to cancel the timers from an HA notification on my phone, as well, instead of just the button on the Atom. This isn’t a priority for me right now, as I don’t use timers very often. I’m just glad to be able to hear my Atom’s responses now without the faint crackly echo of the, well, Echo. haha

Hey guys! I’m new to the forum. While trying to get the sound to come out only on my Google speaker, I found a solution: I commented out a line in the speaker config, and it worked without error.

I tried nearly all the variations here, but all I get as output on my Amazon Echo device is that it tells me something about an https URL, not the real response.
Any idea?
My config looks like this:

  on_tts_start:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: none
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.echo_wohnzimmer_2    #replace this with your media player entity id
          message: !lambda 'return x;'

I finally got around to exposing timer_ringing (indirectly via new switch) so I could automate off of it. Now, via automations, I get a TTS announcement and a notification on my phone when a timer is finished.

Here is my current complete set of customizations for anyone wondering:

voice_assistant:
  # Adjust Mic parameters for better understanding depending on room environment
  noise_suppression_level: 2   # 1-4
  volume_multiplier: 5.0   #1.0-6.0 (higher than 6 will result in major distortion)
  # Don't use Atom's speaker at all
  speaker: !remove     
  # Output response as a TTS to a chosen speaker
  on_tts_start:     
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.den_speaker_2
          message: !lambda 'return x;'
  # I want to know when an Atom loses connection to HA, so blink the light fast red
  on_client_disconnected:     
    then:
    - voice_assistant.stop: {}
    - micro_wake_word.stop: {}
    - light.turn_on:
        id: led
        red: 1.0
        green: 0.0
        blue: 0.0
        brightness: 1.0
        effect: Fast Pulse
        state: true

# Expose a restart button to HA so the Atom can be remotely rebooted 
# (can fix a stuck pipeline or unstable wifi connection in multi-AP/mesh environments)  
button:     
- platform: restart
  id: restart_btn
  name: Reboot
  disabled_by_default: false
  icon: mdi:restart-alert
  entity_category: config
  device_class: restart  

# Expose a new switch to HA to indicate timer_ringing AND ability to 
# toggle it back to an off state (acknowledges the timer, same as pressing 
# Atom's front button); automate using this switch
switch:     
- platform: template
  name: Timer Ringing
  optimistic: true
  lambda: |-
    if (id(timer_ringing).state) {
      return true;
    } else {
      return false;
    }
  turn_off_action:
      - switch.turn_off: timer_ringing
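
As a rough illustration of “automate using this switch”, here is a sketch of a Home Assistant automation (it goes in Home Assistant, not in the ESPHome config) that reacts when the exposed switch turns on. The entity IDs switch.atom_echo_timer_ringing and notify.mobile_app_my_phone are made-up placeholders, and the message text is arbitrary.

```yaml
automation:
  - alias: "Announce finished Atom Echo timer"
    trigger:
      - platform: state
        entity_id: switch.atom_echo_timer_ringing   # hypothetical entity created by the template switch
        to: "on"
    action:
      # Spoken announcement on the chosen speaker
      - service: tts.cloud_say
        data:
          entity_id: media_player.den_speaker_2
          message: "Your timer has finished."
      # Phone notification (placeholder notify service)
      - service: notify.mobile_app_my_phone
        data:
          message: "Timer finished"
```

Turning the switch off again from Home Assistant should then acknowledge the timer through the turn_off_action in the switch definition.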

I believe Alexa devices work differently because the audio has to be sent to Amazon before being sent back down to the Echo (don’t get me started on Echo devices and Amazon’s data hoovering; suffice it to say don’t send anything you don’t want Amazon to aggressively use against you).

Config for an Atom Echo with TTS sent to a separate media player.

Uses tts.cloud_say, so it requires Home Assistant Cloud.

Automatically sets the volume to 20% for voice responses and returns to previous volume state after response.

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          volume_level: 0.2  # Set volume to 20%
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.living_room_speaker
        data_template:
          message: "{{ tts_message }}"
        variables:
          tts_message: return x;
  on_end:
    - light.turn_off: led
    - homeassistant.service:
        service: media_player.volume_set
        data_template:
          entity_id: media_player.living_room_speaker
          # state_attr() is the template function for reading an attribute; this reads the level at on_end time
          volume_level: "{{ state_attr('media_player.living_room_speaker', 'volume_level') }}"
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%
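
One caveat with this config: the on_end template reads the volume at that moment, which by then is the 20% set in on_tts_start, so the earlier level is not really restored. A possible way around it, sketched below under assumptions, is to mirror the media player's volume_level attribute into ESPHome and remember it before lowering the volume. The ids speaker_volume and saved_volume are hypothetical, only the volume-related actions are shown, and on_end may fire while the media player is still speaking, so a delay may be needed.

```yaml
sensor:
  - platform: homeassistant
    id: speaker_volume                      # hypothetical id
    entity_id: media_player.living_room_speaker
    attribute: volume_level                 # mirror the attribute, not the state
    internal: true

globals:
  - id: saved_volume                        # hypothetical id
    type: float
    restore_value: no
    initial_value: '0.2'

voice_assistant:
  microphone: echo_microphone
  on_tts_start:
    # Remember the current level, unless the attribute is unavailable (NaN)
    - lambda: |-
        if (!std::isnan(id(speaker_volume).state)) {
          id(saved_volume) = id(speaker_volume).state;
        }
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker
          volume_level: "0.2"               # lower to 20% for the response
  on_end:
    # Send the remembered level back through a template variable
    - homeassistant.service:
        service: media_player.volume_set
        data_template:
          entity_id: media_player.living_room_speaker
          volume_level: "{{ previous_volume }}"
        variables:
          previous_volume: return id(saved_volume);
```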

Here’s a config that uses Piper instead of tts.cloud_say for local TTS.
I run Piper and faster-whisper via Docker on a separate server. You can simply use the Piper add-on if your hardware is decent, or run it as a container for very fast responses. As of 2024.12 my response times are a second or less.

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          volume_level: 0.2  # Lower volume to 20%
    - homeassistant.service:
        service: tts.piper_say
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
        data_template:
          message: "{{ tts_message }}"  # templates go under data_template so Home Assistant renders them
        variables:
          tts_message: return x;
  on_end:
    - light.turn_off: led
    - homeassistant.service:
        service: media_player.volume_set
        data_template:
          entity_id: media_player.living_room_speaker
          # state_attr() is the template function for reading an attribute; this reads the level at on_end time
          volume_level: "{{ state_attr('media_player.living_room_speaker', 'volume_level') }}"
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%
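
For anyone wanting to try the separate-server setup mentioned above, here is a rough Docker Compose sketch. It is not the poster's actual setup. The image names, flags and ports are assumptions based on the Wyoming project's published containers, and the voice, model and volume paths are just examples.

```yaml
services:
  piper:
    image: rhasspy/wyoming-piper            # assumed image name
    command: --voice en_US-lessac-medium    # example voice
    ports:
      - "10200:10200"                       # assumed default Wyoming port for Piper
    volumes:
      - ./piper-data:/data                  # example path for downloaded voices
    restart: unless-stopped
  whisper:
    image: rhasspy/wyoming-whisper          # assumed image name (faster-whisper based)
    command: --model tiny-int8 --language en
    ports:
      - "10300:10300"                       # assumed default Wyoming port for Whisper
    volumes:
      - ./whisper-data:/data                # example path for downloaded models
    restart: unless-stopped
```

Each service would then be added to Home Assistant through the Wyoming Protocol integration using the server's address and the matching port.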