ESPHome Voice Assistant speech output to Home Assistant Media Player

I believe Alexa devices work differently because the audio has to be sent to Amazon before being sent back down to the Echo (don’t get me started on Echo devices and Amazon’s data hoovering; suffice it to say don’t send anything you don’t want Amazon to aggressively use against you).

Config for an M5Stack Atom Echo with TTS sent to a separate media player.

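Uses tts.cloud_say, so it requires a Home Assistant Cloud (Nabu Casa) subscription. The device must also be allowed to perform Home Assistant actions (an option on the device in the ESPHome integration); otherwise the homeassistant.service calls are rejected.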
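Before flashing, it can help to check the TTS call on its own from Developer Tools > Actions in YAML mode. A minimal sketch, reusing the example entity id media_player.living_room_speaker (replace it with yours); the test message is arbitrary:

```yaml
action: tts.cloud_say
data:
  entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
  message: "Testing Home Assistant Cloud text to speech"
```

If this speaks on the player, the same service and data should work when called from the Echo.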

Automatically lowers the media player volume to 20% for voice responses and puts the previous volume back after the response.

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
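    # Remember the media player's current volume so on_end can put it back
    - globals.set:
        id: prev_volume
        value: !lambda |-
          // -1 marks the volume as unknown (e.g. attribute missing), so on_end skips the restore
          float v = id(speaker_volume).state;
          return std::isnan(v) ? -1.0f : v;
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          volume_level: 0.2  # Set volume to 20%
    - homeassistant.service:
        service: tts.cloud_say
        data:
          entity_id: media_player.living_room_speaker
        data_template:
          message: "{{ tts_message }}"
        variables:
          tts_message: return x;
  on_end:
    - light.turn_off: led
    # Wait for the Echo's own speaker to finish before restoring the volume. This is only a
    # rough proxy: the config does not track when the media player itself stops playing.
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    # Restore the volume captured in on_tts_start, then clear it so a run without TTS leaves the volume alone
    - if:
        condition:
          lambda: 'return id(prev_volume) >= 0;'
        then:
          - homeassistant.service:
              service: media_player.volume_set
              data:
                entity_id: media_player.living_room_speaker
              data_template:
                volume_level: "{{ previous_volume }}"
              variables:
                previous_volume: 'return std::to_string(id(prev_volume));'
          - globals.set:
              id: prev_volume
              value: '-1'
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%

# Top-level sections used by the volume restore; merge into existing globals:/sensor: if you have them
globals:
  - id: prev_volume
    type: float
    restore_value: no
    initial_value: '-1'

sensor:
  # Mirrors the media player's volume_level attribute from Home Assistant
  - platform: homeassistant
    id: speaker_volume
    entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
    attribute: volume_level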
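Alternatively, the lower, speak, and restore sequence can live in a Home Assistant script, where it can wait on the media player's own state instead of on the Echo's speaker. This is a sketch under assumptions: the script name speak_at_low_volume is hypothetical, the entity id is the example one from above, and the 5 s / 60 s timeouts are arbitrary:

```yaml
# Home Assistant configuration.yaml (in scripts.yaml, drop the top-level "script:" key)
script:
  speak_at_low_volume:  # hypothetical script name
    fields:
      message:
        description: Text to speak
    sequence:
      # Snapshot the current volume before changing it
      - variables:
          previous_volume: "{{ state_attr('media_player.living_room_speaker', 'volume_level') }}"
      - action: media_player.volume_set
        target:
          entity_id: media_player.living_room_speaker
        data:
          volume_level: 0.2
      - action: tts.cloud_say
        data:
          entity_id: media_player.living_room_speaker
          message: "{{ message }}"
      # Wait for playback to start, then to stop; the timeouts keep the script from hanging
      - wait_template: "{{ is_state('media_player.living_room_speaker', 'playing') }}"
        timeout: "00:00:05"
      - wait_template: "{{ not is_state('media_player.living_room_speaker', 'playing') }}"
        timeout: "00:01:00"
      # Only restore if a volume was actually read (some players drop the attribute when off)
      - condition: template
        value_template: "{{ previous_volume is number }}"
      - action: media_player.volume_set
        target:
          entity_id: media_player.living_room_speaker
        data:
          volume_level: "{{ previous_volume }}"
```

On the ESPHome side, the volume_set and tts.cloud_say calls in on_tts_start would then collapse into a single homeassistant.service call to script.speak_at_low_volume, passing message through data_template and variables the same way the tts.cloud_say call does, and the volume restore in on_end could be dropped.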

Here’s a config that uses Piper instead of tts.cloud_say for local TTS.
I run Piper and faster-whisper in Docker on a separate server. You can simply use the Piper add-on if your hardware is decent, or run it as a container for very fast responses. As of 2024.12 my response times are a second or less.
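For reference, a minimal Docker Compose sketch of that kind of setup. It assumes the rhasspy/wyoming-piper and rhasspy/wyoming-whisper images from the Wyoming project and their default ports (10200 and 10300); the voice, model, language, and data paths are example choices, not the exact configuration described above:

```yaml
services:
  piper:
    image: rhasspy/wyoming-piper
    command: --voice en_US-lessac-medium  # example voice
    volumes:
      - ./piper-data:/data  # downloaded voices are kept here
    ports:
      - "10200:10200"
    restart: unless-stopped
  whisper:
    image: rhasspy/wyoming-whisper  # faster-whisper based speech-to-text
    command: --model tiny-int8 --language en  # example model and language
    volumes:
      - ./whisper-data:/data  # downloaded models are kept here
    ports:
      - "10300:10300"
    restart: unless-stopped
```

Each container is then added to Home Assistant through the Wyoming Protocol integration, using the server's IP address and the matching port.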

voice_assistant:
  id: va
  microphone: echo_microphone
  speaker: echo_speaker
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: pulse
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
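    # Remember the media player's current volume so on_end can put it back
    - globals.set:
        id: prev_volume
        value: !lambda |-
          // -1 marks the volume as unknown (e.g. attribute missing), so on_end skips the restore
          float v = id(speaker_volume).state;
          return std::isnan(v) ? -1.0f : v;
    - homeassistant.service:
        service: media_player.volume_set
        data:
          entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
          volume_level: 0.2  # Lower volume to 20%
    - homeassistant.service:
        service: tts.speak
        data:
          # Piper added through the Wyoming integration is a TTS entity, so it is called via tts.speak.
          # tts.piper is an assumed entity id; check the name of your Piper entity.
          entity_id: tts.piper
          media_player_entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
        # message goes in data_template (not data) so Home Assistant fills in the template
        data_template:
          message: "{{ tts_message }}"
        variables:
          tts_message: return x;
  on_end:
    - light.turn_off: led
    # Wait for the Echo's own speaker to finish before restoring the volume. This is only a
    # rough proxy: the config does not track when the media player itself stops playing.
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    # Restore the volume captured in on_tts_start, then clear it so a run without TTS leaves the volume alone
    - if:
        condition:
          lambda: 'return id(prev_volume) >= 0;'
        then:
          - homeassistant.service:
              service: media_player.volume_set
              data:
                entity_id: media_player.living_room_speaker
              data_template:
                volume_level: "{{ previous_volume }}"
              variables:
                previous_volume: 'return std::to_string(id(prev_volume));'
          - globals.set:
              id: prev_volume
              value: '-1'
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led
        blue: 0%
        red: 100%
        green: 0%

# Top-level sections used by the volume restore; merge into existing globals:/sensor: if you have them
globals:
  - id: prev_volume
    type: float
    restore_value: no
    initial_value: '-1'

sensor:
  # Mirrors the media player's volume_level attribute from Home Assistant
  - platform: homeassistant
    id: speaker_volume
    entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
    attribute: volume_level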
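Piper set up through the Wyoming integration shows up as a TTS entity, so a quick way to confirm the entity name and that audio reaches the player is a tts.speak call from Developer Tools > Actions in YAML mode. This assumes the entity is called tts.piper, which depends on how the integration named it on your system:

```yaml
action: tts.speak
target:
  entity_id: tts.piper  # assumed entity id; check your Piper entity's name under Settings > Entities
data:
  media_player_entity_id: media_player.living_room_speaker  # Replace with your media player entity_id
  message: "Testing Piper text to speech"
```

Whatever entity id works here is the one to use from the Echo's config.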