I have painstakingly built an automation to play the weather and then the news, dynamically on the calling Satellite. It works except for one fatal flaw.
All actions fire at once (with or without the state wait).
Seems awfully limiting for these to be async actions. They all fire and play at once. The net result is there’s no weather: the news plays right over the top, even though it’s the last to fire. Is there a better way?
I don’t quite understand your link for the timing/async issue. There appears to be no working solution there for my issue, and the suggestion is to do what I’m already doing - waiting for the entity to become idle, which doesn’t work.
In my original example I’m using a wait template; they use wait_for_trigger. I’m trying that now and it skipped the weather, played the “Here’s the latest news” announcement, then went silent. The trace shows the news step was never reached, probably because the device didn’t become idle.
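A minimal sketch of the wait-for-trigger step with a timeout, so the sequence cannot go silent when the device never reports idle. It assumes the satellite’s media player is `media_player.esp32va01` (the entity mentioned later in this thread; substitute your own), and the 30-second timeout is an arbitrary placeholder. Note that a state trigger only fires on a change, so if the player is already idle when the wait starts, this waits until the timeout. `trigger: state` is the current syntax; older Home Assistant versions use `platform: state`.

```yaml
# Sketch: wait for the satellite's media player to report idle, but give up
# after a timeout so the rest of the sequence (e.g. the news) still runs.
- wait_for_trigger:
    - trigger: state
      entity_id: media_player.esp32va01   # placeholder entity
      to: "idle"
  timeout:
    seconds: 30                           # arbitrary upper bound; tune to your longest TTS
  continue_on_timeout: true
# wait.trigger is none when the timeout expired instead of the trigger firing.
- if:
    - condition: template
      value_template: "{{ wait.trigger is none }}"
  then:
    - action: system_log.write
      data:
        message: "Satellite never reported idle; continuing anyway."
        level: warning
```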
Try using the media player status in the template.
My test script executed a sequence of three announcements (synthesized Piper on the fly) without any problems.
Nope, it doesn’t reliably go from playing to idle after TTS. Back to the drawing board. I actually think that’s a Satellite issue, and it might explain why it’s getting “stuck” a lot in other automations.
I’m also getting TTS being skipped and going straight to the weather.
In the template, I use the satellite’s own media player.
In the script, I explicitly specify the entity ID: {{ is_state('media_player.esp32va01', 'idle') }}, while in the automation I use the expression {{ is_state('media_player.'~device_attr(trigger.device_id, 'name') | lower, 'idle') }}. For this expression to work, the device name (lowercased) must match the media player’s entity ID.
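A sketch of the automation-side wait, building the entity ID once in a variable from the same `device_attr(trigger.device_id, 'name') | lower` expression so the wait template stays readable. It carries the same assumption as above (the lowercased device name equals the media player’s object ID), and the one-minute timeout is a placeholder that keeps the automation from hanging if the player never returns to idle.

```yaml
# Sketch: derive the calling satellite's media player from the trigger,
# then wait for it to report idle, with a timeout so the automation can't hang.
- variables:
    # Assumes the device name, lowercased, equals the entity's object ID.
    satellite_player: "media_player.{{ device_attr(trigger.device_id, 'name') | lower }}"
- wait_template: "{{ is_state(satellite_player, 'idle') }}"
  timeout: "00:01:00"        # placeholder upper bound
  continue_on_timeout: true
```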
If you are routing the audio response to a third-party device, I can’t assist with that, as I don’t use such a configuration.
Thanks, I’ll play around some more; those are some good template formats to experiment with. No luck yet.
It’s more complex than it needs to be, in my opinion: different template formats for very similar use cases that don’t seem to call for it, different Satellites behaving differently (Wyoming vs ESP vs Linux_ESP), and, perhaps most challenging for me, a litany of failure modes:
I have ghost TTS and actions playing, and (seemingly) entire automations. Even though I have deleted the TTS cache, and even have a TTS cache-delete action at the start of the reference automation, I get old prompts playing and new prompts that don’t play.
My VA tries to answer instead of the automation, presumably a Whisper STT misrecognition. In this case you have to cancel the current conversation, as repeating yourself in this state won’t trigger an automation intent.
I have actions firing in seemingly non-sequential order.
The ESP Assistant sits in responding mode for minutes on end, with subsequent automation invocations only partially working, then failing on certain actions. The VA answers general questions in this hung state (with TTS playing on the Satellite), but automations with TTS won’t run until it clears.
Satellite media player state remains “playing” half the time too, long after any TTS has completed.
The Satellite audio device seems to lock up on occasion, a consequence of useless Linux audio.
I’m stripping everything back to the simplest single-Satellite setup and going to try to solve this TTS / wait-for-action-to-complete sequence today, and then I’m going to build it up from there. But there are definitely some bugs in here.
{{states("assist_satellite.pi3_assist_satellite")}} seems to cycle through…
idle
listening
processing
responding
With two exceptions:
When the VA expects a follow-up input, it goes back to listening, e.g. when no automation intent matched and it asks for clarification (expected).
It randomly sits in processing after a TTS action in an automation has played back (inexplicable). This, I think, explains a lot of the inconsistent behavior when waiting for a TTS prompt to “complete” before moving on to the next action.
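Given that state cycle, one option is to wait on the `assist_satellite` entity itself in two stages, placed right after the announce action: first for it to leave `idle` (the announcement has started), then for it to return to `idle`. This is a sketch, not a verified fix. Both timeouts are placeholders, and the second one matters because of the stuck-in-processing case above.

```yaml
# Stage 1: wait briefly for the satellite to start handling the announcement.
- wait_template: "{{ not is_state('assist_satellite.pi3_assist_satellite', 'idle') }}"
  timeout: "00:00:05"
  continue_on_timeout: true
# Stage 2: wait for it to settle back to idle; cap the wait in case it
# sticks in 'processing' after the TTS has already played.
- wait_template: "{{ is_state('assist_satellite.pi3_assist_satellite', 'idle') }}"
  timeout: "00:00:30"
  continue_on_timeout: true
```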
Edit: Further to this, I have observed that the Ollama-based VA sometimes tries to handle the query even when it matches an automation’s intent word for word. Is there a technique to achieve some level of priority/consistency?
I landed on a solution. Calculate the approximate runtime of the TTS. Works for me if I play the audio file AFTER any TTS.
(Gemini3)
If you try to run TTS followed immediately by an audio stream on an Assist Satellite (ESP32), the stream often starts before the TTS finishes because the satellite action is “fire-and-forget.”
Since satellites don’t always reliably report a playing state to HA, the most robust fix is calculating a dynamic delay based on the character count of the message.
Here is a script template that calculates the delay with a “safety multiplier” to ensure the speech finishes before the music starts.
alias: Morning Briefing (Dynamic Delay)
sequence:
  # 1. Define your message and tuning variables
  - variables:
      weather_message: >-
        Good morning! The forecast is {{ states('sensor.weather_home') }}.
        Here is the latest news.
      # --- TUNING SECTION ---
      # Average chars per second (about 13 is a typical rough figure for English)
      chars_per_sec: 13
      # Multiplier: 1.0 is exact. 1.2 adds 20% padding.
      safety_multiplier: 1.2
      # Buffer for network latency (seconds)
      latency_buffer: 1
      # ----------------------
      # Calculate the delay
      tts_delay: >-
        {{ (((weather_message | length) / chars_per_sec) * safety_multiplier)
           | round(0, 'ceil') + latency_buffer }}
  # 2. Send the TTS
  - target:
      entity_id: assist_satellite.kitchen_satellite
    data:
      message: "{{ weather_message }}"
    action: assist_satellite.announce
  # 3. Wait for the calculated duration
  - delay:
      seconds: "{{ tts_delay }}"
  # 4. Play the stream (now that TTS is done)
  - target:
      entity_id: assist_satellite.kitchen_satellite
    data:
      media_id:
        media_content_id: "http://my-stream-url.mp3"
        media_content_type: audio/mpeg
    action: assist_satellite.announce
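The same estimate extends to several messages in a row (for example, the weather and then the news intro) with a `repeat` loop, so each announcement gets its own delay. A sketch, reusing the tuning values, `sensor.weather_home` and `assist_satellite.kitchen_satellite` from the script above as placeholders; the script alias is hypothetical.

```yaml
alias: Announce Messages In Sequence (Dynamic Delay)
sequence:
  - variables:
      chars_per_sec: 13
      safety_multiplier: 1.2
      latency_buffer: 1
      messages:
        - "Good morning! The forecast is {{ states('sensor.weather_home') }}."
        - "Here is the latest news."
  - repeat:
      for_each: "{{ messages }}"
      sequence:
        - action: assist_satellite.announce
          target:
            entity_id: assist_satellite.kitchen_satellite
          data:
            message: "{{ repeat.item }}"
        # Same estimate as above: length / rate * padding, rounded up, plus latency.
        - delay:
            seconds: >-
              {{ ((repeat.item | length) / chars_per_sec * safety_multiplier)
                 | round(0, 'ceil') + latency_buffer }}
```

For example, “Here is the latest news.” is 24 characters: 24 / 13 × 1.2 ≈ 2.2, rounded up to 3, plus the 1-second buffer gives a 4-second delay.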
Why standard approaches didn’t work:
Wait for State (wait_template):
Issue: assist_satellite actions are often “fire-and-forget.” The script advances before the device reports playing, and many satellites don’t reliably report a playing state for short announcements, causing the script to hang or skip immediately.
Script Queuing (mode: queued):
Issue: This queues the execution of the script itself, not the actions inside it. It stops the script from running twice simultaneously but doesn’t force Action B to wait for Action A to finish audio playback.
Multi-Part Responses (Custom Intents):
Issue: These are designed for returning complex responses to a specific voice command (e.g., “What’s the weather?”). They are difficult to “inject” into a proactive automation (push notification) where no voice command initiated the flow.