Hi all.
I've been trying for some time to get the Voice PE to send its TTS output to an external media player.
I use a Windows PC (old-school HTPC setup) in my living room with a good sound system, and I wanted the replies to play there instead of on the crappy built-in speaker.
Before this I had zero knowledge of ESPHome, but I have been running an HA instance for 4-5 years.
I took a lot of inspiration from the thread "Redirect Voice PE Replies to Sonos" (Community Guides, Home Assistant Community).
ChatGPT helped me all the way with the code.
First I installed HASS.Agent on my PC (called sofa) and got it working.
Then I installed ESPHome and imported the Voice PE.
Voice PE: Play Replies on an External Media Player (No Double Audio)
Goal
Build a Voice PE satellite that:
- Listens locally (microphone, wake word, LEDs all work normally)
- Runs the full Assist pipeline in Home Assistant
- Plays TTS replies on an external media player (e.g. media_player.sofa)
- Does not speak locally at the same time
- Works reliably, without race conditions or ESPHome YAML errors
In this setup, media_player.sofa is a Windows PC running HASS.Agent, which exposes the PC to Home Assistant as a media player.
The Robust Solution (Recommended)
Key Idea
Let Voice PE keep generating TTS, but:
- Mute the local Voice PE speaker
- Capture the generated TTS URL
- Hand off playback to Home Assistant
- Let HA play the reply on any media player (here: a Windows PC via HASS.Agent)
This is done using:
- an input_text helper as a bridge
- a Home Assistant script for playback
- a small ESPHome override
ESPHome (Voice PE override)
What this does
- Ducks the local mixer
- Mutes the local Voice PE speaker
- Saves the TTS URL into Home Assistant
- Triggers the HA playback script
- Restores everything afterward
ESPHome YAML (override only)
substitutions:
  name: home-assistant-voice-095a6b
  friendly_name: Home Assistant Voice 095a6b

packages:
  Nabu Casa.Home Assistant Voice PE: github://esphome/home-assistant-voice-pe/home-assistant-voice.yaml

esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

api:
  encryption:
    key: ******************

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# ------------------------------------------------------------
# Voice PE reply redirect (NO double audio + no "blip" after idle)
#
# Home Assistant prerequisites:
# 1) Helper: input_text.voice_pe_tts_url
# 2) Script: script.voice_pe_play_reply_on_sofa
#    (reads input_text.voice_pe_tts_url and plays it on media_player.sofa)
# 3) In the ESPHome integration, enable "Allow the device to perform
#    Home Assistant actions" for this device, otherwise the
#    homeassistant.service calls below are ignored
# ------------------------------------------------------------
voice_assistant:
  # PRE-MUTE EARLY to prevent the "local starts speaking briefly then stops" issue after idle
  on_intent_progress:
    - if:
        condition:
          # x is the partial response text; act only once there is something to speak
          lambda: 'return !x.empty();'
        then:
          - logger.log:
              level: DEBUG
              format: "Redirect: pre-muting local Voice PE speaker before TTS begins."
          - media_player.volume_set:
              id: external_media_player
              volume: 0.0

  # Duck hard during TTS and ensure local speaker stays muted
  on_tts_start:
    - logger.log:
        level: INFO
        format: "Redirect: ducking local mixer + muting Voice PE speaker."
    - mixer_speaker.apply_ducking:
        id: media_mixing_input
        decibel_reduction: 51
        duration: 0s
    - media_player.volume_set:
        id: external_media_player
        volume: 0.0

  # Save the TTS proxy URL and trigger HA playback on sofa (Windows PC via HASS.Agent)
  on_tts_end:
    - logger.log:
        level: INFO
        format: "Redirect: saving TTS URL to HA helper + starting sofa playback script."
    - homeassistant.service:
        service: input_text.set_value
        data:
          entity_id: input_text.voice_pe_tts_url
          # x is the URL of the generated TTS audio
          value: !lambda |-
            return x;
    - homeassistant.service:
        service: script.turn_on
        data:
          entity_id: script.voice_pe_play_reply_on_sofa

  # Restore state when pipeline is fully finished
  on_end:
    - wait_until:
        not:
          voice_assistant.is_running:
    - mixer_speaker.apply_ducking:
        id: media_mixing_input
        decibel_reduction: 0
        duration: 0s
    # Note: this restores to full volume, not to the previous level
    - media_player.volume_set:
        id: external_media_player
        volume: 1.0
I added this to scripts.yaml:
voice_pe_play_reply_on_sofa:
  alias: Voice PE - Play reply on sofa
  mode: restart
  sequence:
    - variables:
        url: '{{ states(''input_text.voice_pe_tts_url'') }}'
    - condition: template
      value_template: '{{ url.startswith(''http'') }}'
    - target:
        entity_id: media_player.sofa
      data:
        media_content_id: '{{ url }}'
        media_content_type: music
      action: media_player.play_media
Added this to configuration.yaml:
input_text:
  voice_pe_tts_url:
    name: Voice PE last TTS URL
    max: 255
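Before wiring up the ESPHome side, it can help to confirm that the target player accepts a plain audio URL at all. The snippet below is a minimal sketch for Developer Tools > Actions (YAML mode). The test URL is an assumption: it presumes you have dropped a file called test.mp3 into your config/www folder and that Home Assistant is reachable as homeassistant.local:8123.

```yaml
# Manual test: play a known-good audio URL on the external player.
# The URL is hypothetical; replace it with any audio file your PC can reach.
action: media_player.play_media
target:
  entity_id: media_player.sofa
data:
  media_content_id: "http://homeassistant.local:8123/local/test.mp3"  # assumed test file
  media_content_type: music
```

If this plays on the PC, the script above should work as soon as the helper contains a valid TTS URL. If it does not, the problem is on the media player side rather than in the Voice PE override.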
Why This Works (and Why Others Fail)
- No unsupported ESPHome YAML
- No direct media_player hijacking
- No timing race conditions
- Works with any HA media player:
  - Sonos
  - Music Assistant
  - Chromecast
  - Windows PC via HASS.Agent
- Voice PE remains fully functional as a satellite

Voice PE still believes it is playing locally, but it is muted, while Home Assistant takes over actual playback.
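Because playback is just a media_player.play_media call, pointing the reply at a different player (or several) should only require changing the target. The following is a sketch of a script variant, not something I have tested on every player type. The script name and media_player.kitchen_speaker are hypothetical placeholders, and some players may need a different media_content_type.

```yaml
# Hypothetical variant of the playback script with more than one target.
voice_pe_play_reply_multi:
  alias: Voice PE - Play reply on several players
  mode: restart
  sequence:
    - variables:
        url: "{{ states('input_text.voice_pe_tts_url') }}"
    # Skip playback if the helper does not hold a URL yet
    - condition: template
      value_template: "{{ url.startswith('http') }}"
    - action: media_player.play_media
      target:
        entity_id:
          - media_player.sofa
          - media_player.kitchen_speaker  # assumed entity, replace with your own
      data:
        media_content_id: "{{ url }}"
        media_content_type: music
```

To use it, the ESPHome on_tts_end block would call script.voice_pe_play_reply_multi instead of script.voice_pe_play_reply_on_sofa.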
Result
You end up with a clean, professional Voice Assistant setup:
- One device listens
- Another device speaks
- No echo
- No hacks
- No flakiness
This is effectively how commercial multi-room assistants work, just implemented with full local control.
If you want, this approach can easily be extended to:
- restore exact previous volume
- room-aware replies
- multi-room announcements
- Music Assistant ducking
- LED sync with external playback
But as-is, this is already a production-grade solution.
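As an illustration of the first extension (restoring the exact previous volume), here is a rough sketch. It assumes the Voice PE media player object exposes its current level as a readable volume member inside lambdas, and that an extra globals entry merges cleanly with the ones in the Voice PE package. The name saved_volume is hypothetical, and I have not verified this on a device.

```yaml
globals:
  - id: saved_volume        # hypothetical name, holds the pre-mute volume
    type: float
    restore_value: no
    initial_value: '1.0'

voice_assistant:
  on_intent_progress:
    - if:
        condition:
          lambda: 'return !x.empty();'
        then:
          # Remember the volume only while the speaker is still unmuted,
          # so repeated progress events do not overwrite it with 0.
          - lambda: |-
              if (id(external_media_player).volume > 0.0f) {
                id(saved_volume) = id(external_media_player).volume;
              }
          - media_player.volume_set:
              id: external_media_player
              volume: 0.0

  on_end:
    - wait_until:
        not:
          voice_assistant.is_running:
    # Restore the remembered level instead of forcing 1.0
    - media_player.volume_set:
        id: external_media_player
        volume: !lambda 'return id(saved_volume);'
```

Only the changed steps are shown; the ducking, logging, and on_tts_end steps stay as they are. If on_intent_progress never fires in your setup, the same guarded lambda would also need to go at the top of on_tts_start.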