Hi! I have used HA for some years but I'm totally new to ESPHome. I'm trying to get this working to send the TTS to my media player, but I get an error when I try to add the text to the HA PE device and click install. Should I just add the text after the wifi section, then press save and install?
I have given up trying to fight this. The integrated pipeline is designed to do internal processing, or to let your conversation agent do the processing and return results to the initiation point. There were a couple of upgrades in December which generally broke my previous work, and I didn't pick it back up.
I am choosing to wait for a more official solution. Seeing where things have been heading since August, and specifically in December, I get the feeling it won't be long.
Hey @superwizbang, if you don't mind me asking, which upgrades broke your setup? Was it HA itself, ESPHome, or the Voice PE firmware?
For me it still works like a charm. I did make some changes, though. For example, I still mute my target Sonos player on_start, since this makes it much easier to actually understand the voice command. I unmute the target player again on_stt_vad_end, and on_tts_end I let the Music Assistant announcement system take over.
I also mute the V:PE microphone on_tts_start, since I ran into the problem that the V:PE would sometimes start listening again to its own reply from the target speaker: the V:PE finishes processing much faster than the target player because of the delay of the non-streamed reply. I haven't found another way yet. If you have ideas, let me know, since this brings drawbacks, e.g. the "stop" word won't work until on_end.
The modification to the V:PE looks like this:
# Extended voice assistant actions to redirect TTS output to Music Assistant players.
# This configuration mutes the V:PE's internal speaker and routes all TTS responses through
# the paired speakers in each room.
# For better listening quality the MA players get muted while the V:PE is listening.
voice_assistant:
  # Voice assistant has started listening
  on_start:
    # Mute target player for better listening
    - homeassistant.action:
        action: script.voice_assistant_target_mute
        data:
          device_name: ${friendly_name}
    - media_player.volume_set:
        id: external_media_player
        volume: 0.0
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: control_leds
    # Unmute target player again
    - homeassistant.action:
        action: script.voice_assistant_target_unmute
        data:
          device_name: ${friendly_name}  # Identifies which PE is calling
  on_tts_start:
    # Mute microphone so the V:PE will not pick up parts of the target TTS as a new command
    - microphone.mute
    # Handle LED state changes for visual feedback
    - if:
        condition:
          # The intent_progress trigger didn't start the TTS response
          lambda: 'return id(voice_assistant_phase) != ${voice_assist_replying_phase_id};'
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
          - script.execute: control_leds
    # Start a script that will enable the stop word if the response is longer than a second
    - script.execute: activate_stop_word_once
  on_tts_end:
    # Pass the TTS URL and device identity to a Home Assistant script for routing.
    # The script will determine which speaker to use based on the V:PE's name.
    - homeassistant.action:
        action: script.voice_assistant_on_tts_end
        data:
          url: !lambda 'return x;'
          device_name: ${friendly_name}
    - delay:
        milliseconds: 1000
    # Stop any active announcements if needed
    - if:
        condition:
          media_player.is_announcing:
            id: external_media_player
        then:
          - media_player.stop:
              id: external_media_player
              announcement: true
  # When the voice assistant ends ...
  on_end:
    - wait_until:
        not:
          voice_assistant.is_running:
    # Unmute microphone since from here a new command for the V:PE is possible
    - microphone.unmute
    - delay:
        milliseconds: 250
    # Restore PE internal speaker volume for local feedback sounds
    - media_player.volume_set:
        id: external_media_player
        volume: 0.5
    # If the end happened because of an error, leave the error phase on for a second
    - if:
        condition:
          lambda: return id(voice_assistant_phase) == ${voice_assist_error_phase_id};
        then:
          - delay: 1s
    # Reset the voice assistant phase id and reset the LED animations.
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_leds
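For reference, the phase ids, the voice_assistant_phase global, and the control_leds script used above come from the stock Voice PE firmware that this config is layered on top of. If you are adapting it to another device, you would need stand-ins roughly like the following sketch (the numeric values and the script body are illustrative placeholders, not the firmware's actual definitions):

```yaml
# Hypothetical stand-ins for what the stock Voice PE firmware provides.
# The real firmware defines its own phase ids and a control_leds script
# that drives the LED ring animations.
substitutions:
  voice_assist_idle_phase_id: '1'      # placeholder value
  voice_assist_thinking_phase_id: '3'  # placeholder value
  voice_assist_replying_phase_id: '4'  # placeholder value
  voice_assist_error_phase_id: '11'    # placeholder value

globals:
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_idle_phase_id}

script:
  - id: control_leds
    then:
      - logger.log: "update LED animation for the current phase"
```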
On the HA side I used three helper scripts to mute/unmute the target player and to play the TTS:
- Mute: voice_assistant_target_mute.yaml
description: Mutes the target speaker
sequence:
  - variables:
      player_mapping:
        your first assistant: media_player.your_first_assistant
        your second assistant: media_player.your_second_assistant
      calling_device: "{{ device_name | lower }}"
      target_player: >-
        {{ player_mapping[calling_device] |
        default('media_player.your_default_assistant') }}
  - action: media_player.volume_mute
    target:
      entity_id: "{{ target_player }}"
    data:
      is_volume_muted: true
alias: voice_assistant_target_mute
mode: parallel
max: 2
fields:
  device_name:
    description: Name of calling PE device (from friendly_name substitution)
    example: Office Assistant
icon: mdi:volume-mute
- Unmute: voice_assistant_target_unmute.yaml
description: Unmutes the target speaker
sequence:
  - variables:
      player_mapping:
        your first assistant: media_player.your_first_assistant
        your second assistant: media_player.your_second_assistant
      calling_device: "{{ device_name | lower }}"
      target_player: >-
        {{ player_mapping[calling_device] |
        default('media_player.your_default_assistant') }}
  - action: media_player.volume_mute
    target:
      entity_id: "{{ target_player }}"
    data:
      is_volume_muted: false
alias: voice_assistant_target_unmute
mode: parallel
max: 2
fields:
  device_name:
    description: Name of calling PE device (from friendly_name substitution)
    example: Office Assistant
icon: mdi:volume-high
- TTS: voice_assistant_on_tts_end.yaml
description: >-
  Redirects Voice PE device announcements to Sonos speakers in pre-defined
  areas where a Sonos speaker is assigned to the PE. Note: the 100ms delay
  ensures music is started so that MA has something to snapshot after the
  announcement is made.
sequence:
  - variables:
      player_mapping:
        your first assistant: media_player.your_first_assistant
        your second assistant: media_player.your_second_assistant
      calling_device: "{{ device_name | lower }}"
      target_player: >-
        {{ player_mapping[calling_device] |
        default('media_player.your_default_assistant') }}
  - delay:
      milliseconds: 100
  - action: music_assistant.play_announcement
    data:
      url: "{{ url }}"
    target:
      entity_id: "{{ target_player }}"
alias: voice_assistant_on_tts_end
mode: parallel
max: 2
fields:
  url:
    description: Audio URL of the Assist reply
    example: https://…/tts.mp3
  device_name:
    description: Name of calling PE device (from friendly_name substitution)
    example: Office Assistant
icon: mdi:play-pause
Yes, my issues stemmed from the very problems you are working with above.
@bungamungus I have been sidetracked for about two months now working on other projects, mainly a lot of construction here at the house, and haven't gone back to work on this again like I would have enjoyed. I cannot answer your question.
Somewhere along the line, I picked up an issue with repeating voice responses and the PE activating randomly, but I never confirmed why. Obviously I have a call in the code that gets caught in a loop under specific circumstances. Since the code sits on top of the static link to the most recent Git firmware, there is no reason the previous code would not still work. I simply think my scripts have a minor issue to resolve.
Here's to hoping the code owners make an official path to choose the output source soon…
How do we get visibility on this? I'm honestly surprised it isn't part of the solution yet, considering it's been out for over a year. Being able to redirect the replies to a better-suited speaker is a must. The 3.5mm jack is pretty limited, considering these days nearly everyone uses Sonos speakers or the like with Home Assistant.
Please don't forget this is a "Preview Edition". I'm sure things are being worked on in the meantime, but don't expect the Preview Edition to be fully functional in every way and to have all the features you wish for.
That being said, don't forget you can always work on your own version anytime to get the features you want. The magic of open source!
Correct, this is more of a feature than a flaw.
In an oversimplified explanation: the PE is hard-coded to stream voice input to the pipeline as soon as a wake word is detected. From there, the system decodes what was said, decides what to do with it (local or cloud processing), turns the response into text, then returns the response to the originating device.
All of that is integrated. What we are doing here is intercepting the response at the only point where we are given the chance to do so, having the system re-route it to the designated Sonos device, and muting the response on the PE so a person does not hear the same response on two devices at the same time.
The key to this whole process is that the PE firmware immediately begins streaming voice input as soon as the wake word is detected. There is no place to interrupt/intercept and offload the streamed input to another pipeline for interpretation without completely rewriting the firmware.
With all of that explained, the reason this is likely not a feature yet is precisely the one @Greenlander gave. This is a Preview Edition. They need to scale down areas of potential problems to focus on core functionality. Opening up the pipeline to user interpretation introduces a world of new issues like the ones here. They would rather get the system WORKING first than allow areas of potential problems to exist.
This brings me back to hoping we are far enough along now to potentially see some areas of customization written into the firmware. I will admit I don't have the time to write fully custom firmware for this device or a voice-input ESP device. Could I/we? Sure. I just have life in the way.
Hey guys!
I'm using @Greenlander's solution for redirecting announcements to a Sonos speaker via Music Assistant. Everything works well for single-turn commands, but I'm hitting an issue with continuing conversation.
When continuing conversation is enabled and the pipeline ends, instead of going to IDLE the voice assistant transitions RESPONSE_FINISHED → START_MICROPHONE and immediately restarts the pipeline. This happens before the Sonos speaker has finished playing the TTS announcement (a Sonos announcement typically takes several seconds longer than the PE's internal pipeline).
The result is that the microphone is already actively streaming while Sonos is still playing the response out loud in the room, so the microphone picks up the Sonos audio.
For context, I'm using an input_boolean helper as a synchronization bridge between HA and ESPHome to signal when Sonos has finished playing. In the on_end block, I have a wait_until that blocks the microphone unmute until the helper turns off. This works perfectly for the normal (non-continuing) conversation flow. However, with continuing conversation, the pipeline restarts via a different code path that bypasses on_end's wait logic entirely.
Have any of you ever tried to solve this problem?
What I need:
Sonos plays the announcement while the PE waits; when Sonos stops, the PE starts listening (or unmutes) and processes the answer to the previous LLM question.
binary_sensor:
  - platform: homeassistant
    id: sonos_announcing
    entity_id: input_boolean.sonos_announcing
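On the HA side, the helper can be driven by a small automation like the one below. This is only a sketch: the entity ids are placeholders matching the examples above, and it assumes the TTS script turns the helper on just before calling music_assistant.play_announcement. Detecting the end of a Sonos announcement reliably may need tuning for your setup.

```yaml
# Sketch: turn the helper off once the Sonos player leaves 'playing'.
# Assumes voice_assistant_on_tts_end turns input_boolean.sonos_announcing ON
# right before music_assistant.play_announcement is called.
automation:
  - alias: Sonos announcement finished
    trigger:
      - platform: state
        entity_id: media_player.your_first_assistant
        from: playing
    condition:
      - condition: state
        entity_id: input_boolean.sonos_announcing
        state: "on"
    action:
      - action: input_boolean.turn_off
        target:
          entity_id: input_boolean.sonos_announcing
```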
# Extended voice assistant actions to redirect TTS output to Music Assistant players.
# This configuration mutes the V:PE's internal speaker and routes all TTS responses through
# the paired speakers in each room.
# For better listening quality the MA players get muted while the V:PE is listening.
voice_assistant:
  # Voice assistant has started listening
  on_start:
    # Mute target player for better listening
    - homeassistant.action:
        action: script.voice_assistant_target_mute
        data:
          device_name: ${friendly_name}
    - media_player.volume_set:
        id: external_media_player
        volume: 0.0
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: control_leds
    # Unmute target player again
    - homeassistant.action:
        action: script.voice_assistant_target_unmute
        data:
          device_name: ${friendly_name}  # Identifies which PE is calling
  on_tts_start:
    # Mute microphone so the V:PE will not pick up parts of the target TTS as a new command
    - microphone.mute
    # Handle LED state changes for visual feedback
    - if:
        condition:
          # The intent_progress trigger didn't start the TTS response
          lambda: 'return id(voice_assistant_phase) != ${voice_assist_replying_phase_id};'
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
          - script.execute: control_leds
    # Start a script that will enable the stop word if the response is longer than a second
    - script.execute: activate_stop_word_once
  on_tts_end:
    # Pass the TTS URL and device identity to a Home Assistant script for routing.
    # The script will determine which speaker to use based on the V:PE's name.
    - homeassistant.action:
        action: script.voice_assistant_on_tts_end
        data:
          url: !lambda 'return x;'
          device_name: ${friendly_name}
    - delay:
        milliseconds: 1000
    # Stop any active announcements if needed
    - if:
        condition:
          media_player.is_announcing:
            id: external_media_player
        then:
          - media_player.stop:
              id: external_media_player
              announcement: true
  # When the voice assistant ends ...
  on_end:
    - wait_until:
        not:
          voice_assistant.is_running:
    - wait_until:
        condition:
          binary_sensor.is_off:
            id: sonos_announcing
        timeout: 30s
    - delay:
        milliseconds: 500  # give HA time to turn the helper on
    # Unmute microphone since from here a new command for the V:PE is possible
    - microphone.unmute
    - delay:
        milliseconds: 250
    # Restore PE internal speaker volume for local feedback sounds
    - media_player.volume_set:
        id: external_media_player
        volume: 0.5
    # If the end happened because of an error, leave the error phase on for a second
    - if:
        condition:
          lambda: return id(voice_assistant_phase) == ${voice_assist_error_phase_id};
        then:
          - delay: 1s
    # Reset the voice assistant phase id and reset the LED animations.
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_leds