I have some variable TTS commands that run throughout my day. Right now, I have delays set after each TTS command, but because the message is always different (I use ChatGPT to generate the messages), the speech either gets cut off just before the end of the sentence or there is a long pause before the next step in the automation runs.
Is there a way to check when a TTS message has finished? Other answers suggest adding a wait trigger for the media player to go idle. However, my Sonos speaker never goes idle when the TTS is spoken. This is either due to how Sonos handles TTS or how my TTS integration handles it. I am currently using tts.elevenlabs (GitHub - carleeno/elevenlabs_tts: Custom TTS Integration using ElevenLabs API).
I typically run ambient music throughout my day, so this could also be why it doesn’t go idle when a TTS message has finished speaking. Ultimately, I would like an automation to speak some TTS and then, as soon as it has finished speaking, run the next action, such as changing the lights.
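For reference, the "wait for idle" approach mentioned above usually looks something like the sketch below. The `message` variable and `light.living_room` are hypothetical placeholders, and the timeout is an arbitrary value. As described, this only helps if the speaker's state actually changes once the TTS has finished, which is not the case here.

```yaml
# Hypothetical automation actions: speak, wait for the speaker to go idle, then continue.
- service: tts.speak
  target:
    entity_id: tts.elevenlabs_tts
  data:
    media_player_entity_id: media_player.adam_s_room_one
    message: "{{ message }}"        # assumes a `message` variable is defined earlier
- wait_for_trigger:
    - platform: state
      entity_id: media_player.adam_s_room_one
      to: "idle"
  timeout: "00:01:00"               # stop waiting after one minute (placeholder value)
  continue_on_timeout: true         # carry on with the next action even if it never went idle
- service: light.turn_on
  target:
    entity_id: light.living_room    # hypothetical light entity
```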
If there is a solution, it would make my TTS much more seamless and natural, instead of having inconsistent delays between automation steps.
Thank you!
Have you tried using the announce option with the media_player.play_media service?
If you set it to “true”, whatever the speaker is playing will resume after the announcement, with no need for a delay. I haven’t tried it with TTS, but it works on my Sonos with pre-recorded announcements.
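A minimal sketch of a pre-recorded announcement played this way. The file path `media-source://media_source/local/doorbell.mp3` is a hypothetical example (a file placed in the local media folder), and `music` is simply a common choice for `media_content_type`.

```yaml
# Play a clip as an announcement; whatever was playing resumes afterwards,
# so no snapshot/restore or delay is needed.
- service: media_player.play_media
  target:
    entity_id: media_player.adam_s_room_one
  data:
    media_content_id: media-source://media_source/local/doorbell.mp3   # hypothetical file
    media_content_type: music
    announce: true
```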
The other option is to wait for the media player to start playing, and then for it to become idle, using a wait_template. That’s how I handle it when I need that kind of thing.
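A minimal sketch of that two-step wait, assuming the content is played without the announce option so the player's state actually changes. The entity id and both timeouts are placeholders.

```yaml
# Step 1: wait (briefly) for the player to start playing the new content.
- wait_template: "{{ is_state('media_player.adam_s_room_one', 'playing') }}"
  timeout: "00:00:10"
  continue_on_timeout: true
# Step 2: wait for playback to end; depending on the version, the player may
# report 'paused' rather than 'idle' afterwards, so accept either.
- wait_template: "{{ states('media_player.adam_s_room_one') in ['idle', 'paused'] }}"
  timeout: "00:02:00"
  continue_on_timeout: true
```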
However, about two versions ago, I ran into problems because, after playing the preamble or announcement, the player’s state changed to paused instead of idle. Because it was unclear when it would return to idle and when to paused, I changed the wait_template to accept either:
- wait_template: "{{ states(new_group_master) in ['idle', 'paused'] }}"
NOTE
I also tried using the enqueue option, which lets you add tracks (the preamble and announcement) to the queue and then play the queue. That worked well until I learned that the queue is not included in the snapshot (it’s documented, but I discovered it empirically). The result is a mess.
Here is another way to tackle the delay issue. Granted, it’s not the simplest, BUT IT WORKS 100% of the time for me, and it can be adjusted to meet your exact needs. The TTS service, the voice, and other variables affect how long a message takes to speak, and this code can be modified to account for them.
Pass in the variable {{message}} you want spoken and adjust as necessary. All my math is based on Google Cloud Say with the default voice. Google Translate Say differs slightly and will require some experimenting with the math. I can’t speak for other TTS services. YMMV!
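The script itself is not reproduced here; a minimal sketch of the idea follows, assuming a speaking rate of about 2.5 words per second plus a fixed 2-second allowance for generating the audio. Both numbers and the script name `tts_with_estimated_delay` are hypothetical and would need tuning by experiment for each TTS service and voice.

```yaml
# Hypothetical script (scripts.yaml): speak a message, then wait an estimated
# time based on its word count before the caller continues.
tts_with_estimated_delay:
  fields:
    message:
      description: Text to speak
  sequence:
    - service: tts.speak
      target:
        entity_id: tts.elevenlabs_tts
      data:
        media_player_entity_id: media_player.adam_s_room_one
        message: "{{ message }}"
    # Assumed ~2.5 words/second plus 2 s for audio generation, rounded up.
    - delay:
        seconds: "{{ ((message | wordcount) / 2.5 + 2) | round(0, 'ceil') | int }}"
```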
I’ve tried using wait_template but when my tts is executed on my Sonos speaker, the state of the speaker does not change. It lowers the music playing and says the tts but it doesn’t actually change the state from “playing” to “idle” so having the wait_template doesn’t seem to work for my case. Unless I’m missing something?
I tried copying the template you used, but it doesn’t work in my case either. The state of the Sonos speaker never changes to paused or idle while something is playing.
I tested it by playing some music then executing this script:
alias: tts test 2
sequence:
  - service: sonos.snapshot
    data:
      with_group: true
      entity_id: media_player.adam_s_room_one
  - service: media_player.volume_set
    data:
      volume_level: 0.49
    target:
      entity_id: media_player.adam_s_room_one
  - service: tts.speak
    data:
      cache: true
      message: hello this is a test message
      media_player_entity_id: media_player.adam_s_room_one
    target:
      entity_id: tts.elevenlabs_tts
  - wait_template: >
      {{ is_state('media_player.adam_s_room_one', 'paused') or
         is_state('media_player.adam_s_room_one', 'idle') }}
    # note: continue_on_timeout has no effect unless a timeout is also set
    continue_on_timeout: true
  - service: sonos.restore
    data:
      with_group: true
      entity_id: media_player.adam_s_room_one
mode: single
But the wait is stuck waiting for the media player to pause or go idle after the TTS. The reason I want to know when the TTS has finished is so I can lower the volume back down to an ambient level. When the TTS plays, it is too quiet. So what I want is to raise the volume, speak the TTS, and then go back to the ambient volume, but I just can’t figure out how to tell when the TTS has finished speaking.
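The volume part of that flow can be sketched as below: store the current volume in a variable, raise it for the TTS, then put it back. The 5-second delay is only an arbitrary placeholder for whichever wait ends up working, since the speaker's state does not change during the announcement. This also assumes the player reports a `volume_level` attribute at the time the script runs.

```yaml
alias: tts louder then restore volume
sequence:
  # Remember the current (ambient) volume so it can be restored afterwards.
  - variables:
      ambient_volume: "{{ state_attr('media_player.adam_s_room_one', 'volume_level') }}"
  - service: media_player.volume_set
    target:
      entity_id: media_player.adam_s_room_one
    data:
      volume_level: 0.49
  - service: tts.speak
    target:
      entity_id: tts.elevenlabs_tts
    data:
      media_player_entity_id: media_player.adam_s_room_one
      message: hello this is a test message
  # Placeholder wait (arbitrary value); replace with an estimate or a state-based wait.
  - delay:
      seconds: 5
  - service: media_player.volume_set
    target:
      entity_id: media_player.adam_s_room_one
    data:
      volume_level: "{{ ambient_volume }}"
mode: single
```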
I think I could probably use this, but it wouldn’t work 100% of the time. Because I am using ElevenLabs TTS, there is a short delay while the audio file is generated, and that delay depends on how long the message is. I also think this would work better with google_translate or a similar service than with AI-generated TTS, because those are more consistent; ElevenLabs has stability and clarity + similarity enhancement settings, so if the same sentence is generated twice, it isn’t necessarily spoken the same way.
I think this might solve my problem but it wouldn’t be as consistent for my use case. I will definitely try this solution if there is absolutely no way I can figure out when the tts finishes speaking.
I’m unsure how to play the local TTS files through media_player.play_media. I can see the TTS files are stored in the root/config/tts path, but how do I play them? Also, I think it would be difficult to figure out the TTS filename when playing it this way.
The media_player’s state should change to idle or paused after the media_player has finished playing the content. While it’s playing the content, its state is typically playing.
The exception to this rule is if the content is played using the media_player’s announce feature (as mentioned by Tinkerer). Currently, it doesn’t report a change from playing to idle, or paused, thereby making it difficult to know when the content is/isn’t actually playing.
I learned from community member TheFes that when you use tts.google_translate_say, it actually calls media_player.play_media with the announce option automatically set to true.
I’ll hazard a guess that this also happens to tts.speak.
Perhaps you can try something like this (untested). The tricky part might be getting the right TTS keyword. For Google Translate it’s google_cloud, and I don’t know what it is for ElevenLabs (it’s a custom integration), so I simply used elevenlabs in my example.
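A sketch of calling media_player.play_media directly with a TTS media source, so the announce option can be left off and the player's state changes to playing. The URL format `media-source://tts/<engine>?message=...` is the part to verify: here `<engine>` is assumed to accept the TTS entity id (`tts.elevenlabs_tts`); the integration domain `elevenlabs_tts` is another candidate. The `message` variable is assumed to be defined earlier.

```yaml
- service: media_player.play_media
  target:
    entity_id: media_player.adam_s_room_one
  data:
    # Assumed format: media-source://tts/<engine>?message=<url-encoded text>
    media_content_id: >-
      media-source://tts/tts.elevenlabs_tts?message={{ message | urlencode }}
    media_content_type: music
    announce: false   # leave announce off so the player's state changes to 'playing'
```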
I even tried it with google_cloud, but it also shows the error “provider google_cloud not found”, so I’m not sure if I’m calling the service correctly.
I had a look through the ElevenLabs source code, and it states that the domain is elevenlabs_tts, but I’m not sure if this is the correct TTS keyword. I tried your example and this one, but the same error occurs, so I’m unsure whether it’s the wrong keyword or something wrong with the play_media example.
You’re the second person in the past week who has marked their own post as the Solution despite using a solution that was provided to them. That’s not the community’s custom for the use of the Solution tag. It makes it appear that everyone ultimately solves their own problem (regardless of whether the answer came from someone else).
For anyone else stuck on this, here is the format I use for tts.speak to a Sonos media player so that it changes state to ‘playing’. You can also use this to specify the language and voice for tts.speak in variables defined up front, as I have. Note that TTS has a limit on the length of message it can handle (I’m not sure exactly what that is). To get around this limitation, I split my message into multiple pieces up front, pass them individually into this action, and wait for the media_player to leave state ‘playing’ before starting the next piece.
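The split-and-wait loop described above can be sketched as follows. This is not the poster's exact action: it assumes the `media-source://tts/<engine>?message=...` format accepts the TTS entity id, and it splits naively on ". ", which is only a rough stand-in for whatever splitting the message actually needs.

```yaml
alias: speak long message in pieces
fields:
  message:
    description: Long text to speak
sequence:
  - repeat:
      # Naive sentence split; each piece is spoken in turn.
      for_each: "{{ message.split('. ') }}"
      sequence:
        - service: media_player.play_media
          target:
            entity_id: media_player.adam_s_room_one
          data:
            media_content_id: >-
              media-source://tts/tts.elevenlabs_tts?message={{ repeat.item | urlencode }}
            media_content_type: music
        # Wait for playback to start, then for the player to leave 'playing'.
        - wait_template: "{{ is_state('media_player.adam_s_room_one', 'playing') }}"
          timeout: "00:00:15"
          continue_on_timeout: true
        - wait_template: "{{ not is_state('media_player.adam_s_room_one', 'playing') }}"
          timeout: "00:02:00"
          continue_on_timeout: true
mode: single
```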