WTH doesn't the Nabu Casa TTS cloud action support SSML

TheFes · December 9, 2024, 11:03am

Using SSML you can add additional markup to your text to speech message. For example you can emphasize certain parts, or add sound effects.

The Google Cloud TTS action supports SSML. Using it, I made the following action, which announces a visitor at the door on a day when someone in the household is celebrating their birthday. (Normally the variables in the first line of the message are generated elsewhere in the automation, and the message is in Dutch; I translated it for this example.)

action: tts.google_cloud_say
data:
  entity_id: media_player.martijn
  language: en-US
  options:
    gender: male
    voice: en-US-Wavenet-F
  message: >
    {% set ha_url, name = 'https://foo.bar.bla', 'Birthday Boy' %}
    <speak>
      <par>
        <media repeatCount="2" soundLevel="+2.28dB"
          fadeInDur="2s" fadeOutDur="0.2s" begin="0s">
          <audio
            src="{{ ha_url }}/local/misc/roltong2.mp3"/>
        </media>
        <media xml:id="hooray" begin="0.5s">
          <speak><emphasis level="strong">Hooray!</emphasis></speak>
        </media>
        <media begin="hooray.begin+0.5s" soundLevel="-2dB"
          fadeInDur="2s" fadeOutDur="0.2s">
          <audio
            src="{{ ha_url }}/local/misc/roltong1.mp3"/>
        </media>
        <media xml:id="door" begin="hooray.end+1.5s">
          <speak>There's someone at the door! Maybe it's a guest for {{ name }}!</speak>
        </media>
        <media begin="door.end-0.2s" soundLevel="-6dB">
          <audio
            src="{{ ha_url }}/local/misc/roltong2.mp3"/>
        </media>
      </par>
    </speak>

This generates the following message: [audio clip]

It would be great if SSML were also supported in the Nabu Casa cloud TTS action. Microsoft documents SSML for its speech service, so I would expect it to be supported by Azure TTS, which Nabu Casa uses.

stboch · December 11, 2024, 2:21am

The underlying Nabu Casa cloud TTS already uses SSML. It seems they would just need to expose an advanced method that accepts the full SSML instead of building it from the plain text.

github.com/NabuCasa/hass-nabucasa/blob/e914e0d0e83a335f5fea388b00b86dc132a5e1a6/hass_nabucasa/voice.py#L1325-L1334

# SSML
xml_body = ET.Element("speak", version="1.0")
xml_body.set("{http://www.w3.org/XML/1998/namespace}lang", language)
voice_el = ET.SubElement(xml_body, "voice")
voice_el.set("{http://www.w3.org/XML/1998/namespace}lang", language)
voice_el.set(
    "name",
    f"Microsoft Server Speech Text to Speech Voice ({language}, {voice})",
)
voice_el.text = text[:2048]
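
To make the idea concrete, here is a minimal sketch of what such a passthrough might look like. It is not the hass-nabucasa implementation: the function `build_ssml`, its `raw_ssml` flag, and the voice name used in the usage note are all hypothetical. Only the default branch is modelled on the excerpt above.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def build_ssml(text: str, language: str, voice: str, raw_ssml: bool = False) -> str:
    """Return an SSML document for a message.

    Hypothetical helper: `raw_ssml` is an assumed option, not an existing one.
    """
    if raw_ssml:
        # Assumed "advanced" path: the caller supplies a complete document.
        # Only check that it parses and that the root element is <speak>.
        root = ET.fromstring(text)
        if root.tag.rsplit("}", 1)[-1] != "speak":
            raise ValueError("raw SSML must have <speak> as its root element")
        return text

    # Default path, modelled on the excerpt: wrap plain text in <speak><voice>.
    xml_body = ET.Element("speak", version="1.0")
    xml_body.set(f"{XML_NS}lang", language)
    voice_el = ET.SubElement(xml_body, "voice")
    voice_el.set(f"{XML_NS}lang", language)
    voice_el.set("name", voice)
    # Assigning to .text means any markup in the message is escaped on output.
    voice_el.text = text[:2048]
    return ET.tostring(xml_body, encoding="unicode")
```

With the default path, a call such as `build_ssml("<emphasis>Hi</emphasis>", "en-US", "ExampleVoice")` produces `&lt;emphasis&gt;` in the output, because ElementTree escapes element text. That is why SSML tags placed in the message cannot reach the speech service through the code in the excerpt, and why a separate passthrough would be needed.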

jackjourneyman · December 16, 2024, 1:59pm

According to the docs Amazon Polly should support SSML, but I haven’t been able to make it happen - I either get stony silence, or it reads out the tags.