With SSML you can add markup to your text-to-speech message, for example to emphasize certain parts or to add sound effects.
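As a minimal sketch of what that looks like, the action below emphasizes one word and inserts a short pause. The entity `media_player.living_room` is hypothetical, and the `text_type: ssml` option is an assumption on my part: it should only be needed if the text type isn't already set to SSML in the Google Cloud integration configuration.

```yaml
# Minimal sketch; media_player.living_room is a hypothetical entity.
action: tts.google_cloud_say
data:
  entity_id: media_player.living_room
  language: en-US
  options:
    # Assumption: only needed when text_type is not already ssml in the integration config.
    text_type: ssml
  message: >
    <speak>
      There is <emphasis level="strong">someone</emphasis> at the door.
      <break time="500ms"/>
      Please check who it is.
    </speak>
```

Everything inside `<speak>` is interpreted as SSML, so characters such as `&` and `<` in the spoken text have to be escaped as in any XML document.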
The Google Cloud TTS action supports SSML. Using it, I made the following action, which announces that someone is at the door on a day when someone in the household is celebrating their birthday. Normally the variables in the first line of the message are set elsewhere in the automation, and the message is in Dutch; I translated it for this example:
action: tts.google_cloud_say
data:
  entity_id: media_player.martijn
  language: en-US
  options:
    gender: male
    voice: en-US-Wavenet-F
  message: >
    {% set ha_url, name = 'https://foo.bar.bla', 'Birthday Boy' %}
    <speak>
      <par>
        <media repeatCount="2" soundLevel="+2.28dB"
               fadeInDur="2s" fadeOutDur="0.2s" begin="0s">
          <audio src="{{ ha_url }}/local/misc/roltong2.mp3"/>
        </media>
        <media xml:id="hooray" begin="0.5s">
          <speak><emphasis level="strong">Hooray!</emphasis></speak>
        </media>
        <media begin="hooray.begin+0.5s" soundLevel="-2dB"
               fadeInDur="2s" fadeOutDur="0.2s">
          <audio src="{{ ha_url }}/local/misc/roltong1.mp3"/>
        </media>
        <media xml:id="door" begin="hooray.end+1.5s">
          <speak>There's someone at the door! Maybe it's a guest for {{ name }}!</speak>
        </media>
        <media begin="door.end-0.2s" soundLevel="-6dB">
          <audio src="{{ ha_url }}/local/misc/roltong2.mp3"/>
        </media>
      </par>
    </speak>
This generates the following message:
It would be great if SSML were also supported in the Nabu Casa cloud TTS action. Microsoft documents SSML here, so I would expect it to be supported by Azure TTS, which Nabu Casa uses.