Voice PE → Play Replies on an External Media Player

Hi all.

Been trying for some time to get the Voice PE to output its TTS to an external media player.

I use a Windows PC (old-school HTPC setup) in my living room with a good sound system, and I wanted the output there instead of on the crappy built-in speaker.

Before this I had zero knowledge of ESPHome, but I have been running a HA instance for 4-5 years.

I took a lot of inspiration from this thread: Redirect Voice PE Replies to Sonos (Community Guides, Home Assistant Community).

ChatGPT helped me with the code the whole way.

First I installed HASS.Agent on my PC (called sofa) and got it working.
Then i installed ESPHome and imported the Voice PE.

Voice PE → Play Replies on an External Media Player (No Double Audio)

Goal

Build a Voice PE satellite that:

  • :microphone: Listens locally (microphone, wake word, LEDs all work normally)
  • :brain: Runs the full Assist pipeline in Home Assistant
  • :loud_sound: Plays TTS replies on an external media player (e.g. media_player.sofa)
  • :zipper_mouth_face: Does not speak locally at the same time
  • :brick: Works reliably, without race conditions or ESPHome YAML errors

In this setup, media_player.sofa is a Windows PC running Hass.Agent, exposed to Home Assistant as a media player.

The Robust Solution (Recommended)

:brain: Key Idea

Let Voice PE keep generating TTS, but:

  • Mute the local Voice PE speaker
  • Capture the generated TTS URL
  • Hand off playback to Home Assistant
  • Let HA play the reply on any media player (here: a Windows PC via Hass.Agent)

This is done using:

  • an input_text helper as a bridge
  • a Home Assistant script for playback
  • a small ESPHome override

ESPHome (Voice PE override)

What this does

  • Ducks the local mixer
  • Mutes the local Voice PE speaker
  • Saves the TTS URL into Home Assistant
  • Triggers the HA playback script
  • Restores everything afterward

ESPHome YAML (override only)

substitutions:
  name: home-assistant-voice-095a6b
  friendly_name: Home Assistant Voice 095a6b

packages:
  Nabu Casa.Home Assistant Voice PE: github://esphome/home-assistant-voice-pe/home-assistant-voice.yaml

esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

api:
  encryption:
    key: ******************

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# ------------------------------------------------------------
# Voice PE reply redirect (NO double audio + no "blip" after idle)
#
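# Home Assistant prerequisites:
# 1) Helper: input_text.voice_pe_tts_url
# 2) Script: script.voice_pe_play_reply_on_sofa
#    (reads input_text.voice_pe_tts_url and plays it on media_player.sofa)
# 3) ESPHome integration option "Allow the device to perform Home Assistant
#    actions" enabled for this device, otherwise the homeassistant.service
#    calls below are not executed
# ------------------------------------------------------------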

voice_assistant:
  # PRE-MUTE EARLY to prevent the "local starts speaking briefly then stops" issue after idle
  on_intent_progress:
    - if:
        condition:
          lambda: 'return !x.empty();'
        then:
          - logger.log:
              level: DEBUG
              format: "Redirect: pre-muting local Voice PE speaker before TTS begins."
          - media_player.volume_set:
              id: external_media_player
              volume: 0.0

  # Duck hard during TTS and ensure local speaker stays muted
  on_tts_start:
    - logger.log:
        level: INFO
        format: "Redirect: ducking local mixer + muting Voice PE speaker."
    - mixer_speaker.apply_ducking:
        id: media_mixing_input
        decibel_reduction: 51
        duration: 0s
    - media_player.volume_set:
        id: external_media_player
        volume: 0.0

  # Save the TTS proxy URL and trigger HA playback on sofa (Windows PC via Hass.Agent)
  on_tts_end:
    - logger.log:
        level: INFO
        format: "Redirect: saving TTS URL to HA helper + starting sofa playback script."
    - homeassistant.service:
        service: input_text.set_value
        data:
          entity_id: input_text.voice_pe_tts_url
          value: !lambda |-
            return x;

    - homeassistant.service:
        service: script.turn_on
        data:
          entity_id: script.voice_pe_play_reply_on_sofa

  # Restore state when pipeline is fully finished
  on_end:
    - wait_until:
        not:
          voice_assistant.is_running:
    - mixer_speaker.apply_ducking:
        id: media_mixing_input
        decibel_reduction: 0
        duration: 0s
    - media_player.volume_set:
        id: external_media_player
        volume: 1.0
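A side note on syntax: as far as I know, newer ESPHome releases renamed `homeassistant.service` to `homeassistant.action` (with `action:` instead of `service:`). The old form should still be accepted as an alias, so this is optional. The sketch below is the same `on_tts_end` hook written the newer way, assuming a recent ESPHome version. It goes under `voice_assistant:` in place of the `on_tts_end` above.

```yaml
  # Same on_tts_end hook as above, using the newer homeassistant.action syntax
  # (assumes an ESPHome version that supports homeassistant.action)
  on_tts_end:
    - logger.log:
        level: INFO
        format: "Redirect: saving TTS URL to HA helper + starting sofa playback script."
    # Store the TTS URL (x) in the HA helper
    - homeassistant.action:
        action: input_text.set_value
        data:
          entity_id: input_text.voice_pe_tts_url
          value: !lambda 'return x;'
    # Start the HA script that plays the stored URL on media_player.sofa
    - homeassistant.action:
        action: script.turn_on
        data:
          entity_id: script.voice_pe_play_reply_on_sofa
```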

I added this to script.yaml:

voice_pe_play_reply_on_sofa:
  alias: Voice PE – Play reply on sofa
  mode: restart
  sequence:
  - variables:
      url: '{{ states(''input_text.voice_pe_tts_url'') }}'
  - condition: template
    value_template: '{{ url.startswith(''http'') }}'
  - target:
      entity_id: media_player.sofa
    data:
      media_content_id: '{{ url }}'
      media_content_type: music
    action: media_player.play_media

Added this to configuration.yaml:

input_text:
  voice_pe_tts_url:
    name: Voice PE last TTS URL
    max: 255
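To check the Home Assistant half of the chain without talking to the Voice PE, you can push a known URL through the same helper and script. The sketch below is a hypothetical test script for scripts.yaml. The name `voice_pe_redirect_test` and the URL are placeholders: `/local/` serves files from the `www` folder in your HA config directory, so drop any short audio file there. Use a hostname your PC can actually reach.

```yaml
# Hypothetical test script: exercises input_text.voice_pe_tts_url and
# script.voice_pe_play_reply_on_sofa exactly as the Voice PE would
voice_pe_redirect_test:
  alias: Voice PE - Test redirect to sofa
  sequence:
  # Write a known-good audio URL into the bridge helper (placeholder URL)
  - action: input_text.set_value
    target:
      entity_id: input_text.voice_pe_tts_url
    data:
      value: http://homeassistant.local:8123/local/test.mp3
  # Run the playback script, which reads the helper and plays on media_player.sofa
  - action: script.voice_pe_play_reply_on_sofa
```

If this plays on the PC but voice replies don't, the problem is on the ESPHome side (for example the device not being allowed to perform Home Assistant actions). If it doesn't play either, look at HASS.Agent or the media player.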

Why This Works (and Why Others Fail)

  • :heavy_check_mark: No unsupported ESPHome YAML
  • :heavy_check_mark: No direct media_player hijacking
  • :heavy_check_mark: No timing race conditions
  • :heavy_check_mark: Works with any HA media player:
    • Sonos
    • Music Assistant
    • Chromecast
    • Windows PC via Hass.Agent
  • :heavy_check_mark: Voice PE remains fully functional as a satellite

Voice PE still believes it is playing locally (it is just muted), while Home Assistant takes over the actual playback.


Result

You end up with a clean, professional Voice Assistant setup:

  • One device listens
  • Another device speaks
  • No echo
  • No hacks
  • No flakiness

This is much like how commercial multi-room assistants split listening and speaking, just implemented with full local control.


If you want, this approach can easily be extended to:

  • restore exact previous volume
  • room-aware replies
  • multi-room announcements
  • Music Assistant ducking
  • LED sync with external playback
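As an example of the first item, here is an untested sketch of restoring the exact previous volume instead of forcing 1.0. It assumes the speaker media player's current volume can be read from a lambda as `id(external_media_player).volume`. The `saved_local_volume` global is a hypothetical name. The guard skips saving when the speaker is already muted, so the second mute (in `on_tts_start`, after the pre-mute) does not overwrite the saved value with 0.0.

```yaml
# Hypothetical global holding the Voice PE volume from before the redirect muted it
globals:
  - id: saved_local_volume
    type: float
    restore_value: no
    initial_value: '1.0'

# Steps to add right BEFORE each "volume: 0.0" volume_set
# (in on_intent_progress and in on_tts_start):
- if:
    condition:
      lambda: 'return id(external_media_player).volume > 0.0f;'
    then:
      - globals.set:
          id: saved_local_volume
          value: !lambda 'return id(external_media_player).volume;'

# In on_end, use this instead of the "volume: 1.0" volume_set:
- media_player.volume_set:
    id: external_media_player
    volume: !lambda 'return id(saved_local_volume);'
```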

But as-is, this is already a production-grade solution.
