I have some variable TTS commands that execute throughout my day. Right now, I have delays set on each TTS command, but because the message length always varies (I use ChatGPT to generate the messages), the delay either cuts the message off just before the end of the sentence or leaves a long pause before the automation executes its next step.
Is there a way to check when a TTS message has finished? Other solutions suggest a wait trigger for the media player to go idle. However, my Sonos speaker never goes idle when the TTS is spoken. Either this has something to do with how Sonos handles TTS or how my TTS software handles it. I am currently using tts.elevenlabs (GitHub - carleeno/elevenlabs_tts: Custom TTS Integration using ElevenLabs API).
I typically run ambient music throughout my day, so this could also be the reason why it doesn't go idle when a TTS message has finished speaking. Ultimately, in an automation I would like some TTS spoken and then, as soon as it has finished speaking, the next command executed, like changing the lights.
If there is a possible solution this would make my TTS much more seamless and natural instead of having inconsistent delays between automation steps.
Thank you!
Have you tried using the announce option with the media_player.play_media service?
If you set it to "true", whatever the speaker is playing will resume after the announcement, no need for a delay. Haven't tried it with TTS, but it works on my Sonos with pre-recorded announcements.
The other option is to wait for the media player to start playing, and then for it to become idle, using a wait_template. That's how I handle it when I need that kind of thing:
However, about two versions ago, I encountered problems because, after playing the preamble or announcement, the player's state changed to paused as opposed to idle. It was unclear to me when, after playing something, it returned to idle or paused, so I changed the wait_template to this:
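For reference, a minimal sketch of such a call. The entity ID is the one used elsewhere in this thread, and the media URL is a hypothetical placeholder; whether playback resumes cleanly depends on the speaker integration.

```yaml
# Sketch only: play a pre-recorded clip as an announcement.
- service: media_player.play_media
  target:
    entity_id: media_player.adam_s_room_one
  data:
    # Hypothetical URL; point this at your own audio file.
    media_content_id: "http://homeassistant.local:8123/local/announcement.mp3"
    media_content_type: music
    # With announce enabled, the speaker resumes what it was playing afterwards.
    announce: true
```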
- wait_template: "{{ states(new_group_master) in ['idle', 'paused'] }}"
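Put together, the two-stage wait might look like the sketch below. The entity ID and both timeouts are assumptions to adjust; the timeouts keep the script from hanging if the player never reports a state change.

```yaml
# Stage 1: wait until the player reports that playback has started.
- wait_template: "{{ is_state('media_player.adam_s_room_one', 'playing') }}"
  timeout: "00:00:10"
  continue_on_timeout: true
# Stage 2: wait until playback has ended (idle or paused).
- wait_template: "{{ states('media_player.adam_s_room_one') in ['idle', 'paused'] }}"
  timeout: "00:01:00"
  continue_on_timeout: true
```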
NOTE
I also tried using the enqueue option, which lets you add tracks (the preamble and announcement) to the queue and then play the queue. That worked well until I learned that the queue is not included in the snapshot (it's documented but I discovered it empirically). The result is a mess.
Here is another way to tackle the delay issue. Granted, it's not the simplest, BUT IT WORKS 100% of the time for me and it can be adjusted to meet your exact needs. The TTS service, voice and other variables can affect message length, and this code can be modified to fit those needs.
Pass in the variable {{message}} you want spoken and adjust as necessary. All my math is based on Google Cloud Say with the default voice. Google Translate Say differs slightly and will require some experimenting with the math. I can't speak for other TTS services. YMMV!
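The general idea of a length-based delay can be sketched as follows. The constants (0.07 seconds per character plus a 1-second buffer) are illustrative assumptions, not the values from the post above, and need tuning for each TTS service and voice. The entity IDs are borrowed from elsewhere in this thread.

```yaml
# Sketch only: speak the message, then pause in proportion to its length.
- service: tts.speak
  target:
    entity_id: tts.elevenlabs_tts
  data:
    media_player_entity_id: media_player.adam_s_room_one
    message: "{{ message }}"
- delay:
    # Assumed rate: 0.07 s per character, plus a 1 s buffer, rounded up.
    seconds: "{{ ((message | length) * 0.07 + 1) | round(0, 'ceil') | int }}"
```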
I've tried using wait_template, but when my TTS is executed on my Sonos speaker, the state of the speaker does not change. It lowers the music playing and says the TTS, but it doesn't actually change the state from "playing" to "idle", so the wait_template doesn't seem to work for my case. Unless I'm missing something?
I tried to copy the template you used, but this also doesn't work for my case. The state of the Sonos speaker never changes to paused or idle while something is playing.
I tested it by playing some music then executing this script:
alias: tts test 2
sequence:
  - service: sonos.snapshot
    data:
      with_group: true
      entity_id: media_player.adam_s_room_one
  - service: media_player.volume_set
    data:
      volume_level: 0.49
    target:
      entity_id: media_player.adam_s_room_one
  - service: tts.speak
    data:
      cache: true
      message: hello this is a test message
      media_player_entity_id: media_player.adam_s_room_one
    target:
      entity_id: tts.elevenlabs_tts
  - wait_template: >
      {{ is_state('media_player.adam_s_room_one', 'paused') or
         is_state('media_player.adam_s_room_one', 'idle') }}
    continue_on_timeout: true
  - service: sonos.restore
    data:
      with_group: true
      entity_id: media_player.adam_s_room_one
mode: single
But the wait_template is stuck waiting for the media player to pause or go idle after the TTS. The reason I want to figure out when the TTS has finished is so I can lower the volume back down to an ambient level. When the TTS is played at that level, it is too quiet. So what I want is to raise the volume, say the TTS, and then go back to ambient volume, but I just can't figure out how to tell when the TTS has finished speaking.
I think I could probably use this, but it wouldn't work 100% of the time. Because I am using ElevenLabs TTS, there is a small buffer that depends on how long the audio file takes to generate. Also, I think this would work better with google_translate or a similar alternative than with AI-generated TTS, because those are more consistent; ElevenLabs has stability and clarity + similarity enhancement settings. If the same sentence is generated twice with ElevenLabs, it isn't necessarily spoken the same way.
I think this might solve my problem, but it wouldn't be as consistent for my use case. I will definitely try this solution if there is absolutely no way to figure out when the TTS finishes speaking.
I'm unsure how to play the local TTS files through media_player.play_media. I can see the TTS files are stored in the root/config/tts path, but how do I play them? Also, I think it would be difficult to figure out what the TTS filename is when trying to play it this way.
The media_player's state should change to idle or paused after the media_player has finished playing the content. While it's playing the content, its state is typically playing.
The exception to this rule is if the content is played using the media_player's announce feature (as mentioned by Tinkerer). Currently, it doesn't report a change from playing to idle or paused, thereby making it difficult to know when the content is/isn't actually playing.
I learned from community member TheFes that when you use tts.google_translate_say, it actually makes a call to media_player.play_media and automatically sets the announce option to true.
I'll hazard a guess that this also happens with tts.speak.
Perhaps you can try something like this (untested). The tricky part might be to get the right TTS key word. For Google Cloud it's google_cloud, and I don't know what it is for ElevenLabs (it's a custom integration), so I simply used elevenlabs in the example below.
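A sketch of what such a call might look like, assuming the media-source://tts/ URI form. The elevenlabs provider key is a guess, as noted above, and announce is set to false on the assumption that a non-announce playback will report state changes.

```yaml
# Untested sketch: play TTS through play_media instead of tts.speak.
- service: media_player.play_media
  target:
    entity_id: media_player.adam_s_room_one
  data:
    # "elevenlabs" is a guessed provider key; replace it with the real one.
    media_content_id: "media-source://tts/elevenlabs?message=hello this is a test message"
    media_content_type: music
    # Assumption: with announce off, the player reports playing, then idle or paused.
    announce: false
```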
I even tried it with google_cloud, but it also shows "provider google_cloud not found", so I'm not sure if I'm calling the service correctly.
I had a look through the ElevenLabs source code and it states that the domain is elevenlabs_tts, but I'm not sure if this is the correct TTS key word. I tried your example and this one, but the same error occurs, so I'm unsure if this is because it's the wrong key word or because there is something wrong with the play_media example.
You're the second person in the past week who has marked their own post as the Solution despite using a solution that was provided to them. That's not the community's custom for the use of the Solution tag. It makes it appear that everyone ultimately solves their own problem (regardless of whether the answer came from someone else).
Would this work for a Google Home device?
I use one for my voice control speaker and I need to "wait" until it stops for the "listening" to start.
For anyone else stuck on this: here is the format I use for tts.speak to a Sonos media player in order for it to change state to "playing". You can also use this to specify the language and voice for tts.speak in variables defined up front, as I have. Note that TTS has a limit to the length of message it can work with (not sure exactly what that is). To get around this limitation, I split my message up into multiple pieces up front, pass them individually into the below action, and wait for the media_player to leave state "playing" before starting the next piece.
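A sketch of that approach, not the poster's exact code. The variable names, language, and voice are hypothetical, the voice option is an assumption that depends on the TTS integration, and the timeouts are arbitrary safety limits.

```yaml
alias: Speak long message in pieces
variables:
  speaker: media_player.adam_s_room_one
  tts_language: en-US        # hypothetical value
  tts_voice: example_voice   # hypothetical; valid values depend on the engine
  pieces:
    - "First part of the message."
    - "Second part of the message."
sequence:
  - repeat:
      for_each: "{{ pieces }}"
      sequence:
        # Speak one piece at a time.
        - service: tts.speak
          target:
            entity_id: tts.elevenlabs_tts
          data:
            media_player_entity_id: "{{ speaker }}"
            message: "{{ repeat.item }}"
            language: "{{ tts_language }}"
            options:
              voice: "{{ tts_voice }}"
        # Wait for playback to start, then for the player to leave "playing".
        - wait_template: "{{ is_state(speaker, 'playing') }}"
          timeout: "00:00:10"
          continue_on_timeout: true
        - wait_template: "{{ not is_state(speaker, 'playing') }}"
          timeout: "00:02:00"
          continue_on_timeout: true
mode: single
```

Waiting for "playing" first avoids the second wait passing immediately, before the speaker has started the piece.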