Support SSML with Microsoft TTS

dknils · October 30, 2022, 9:19am

SSML will just give TTS that extra edge.

(And unfortunately, Google Cloud TTS is very unstable in my region, while Microsoft's service is not.)

aejay · March 25, 2023, 5:04pm

Just want to chime in to add more details in case it helps convince anyone else.

Microsoft’s TTS voices are very powerful (when compared to Google Cloud TTS, for instance) because many of their voices support several styles (friendly, angry, cheerful, etc.), which can completely change the tone of the spoken message.

When crafting a message as SSML, you can choose one of these styles, or choose to say parts of the message differently than other parts.

The Google Cloud TTS integration supports specifying whether the message is plaintext or SSML; it would be great to bring similar functionality into this integration.

I’m tempted to try my hand at contributing this change, but I’m a complete novice when it comes to Python, so I doubt that I’d be able to deliver an acceptable PR for the change.

aejay · March 26, 2023, 4:26pm

Actually, it appears the current integration DOES support some level of SSML; you just have to be a bit tricky with it. For some reason, it seems you can’t wrap your whole message in a speak element like usual. But I was able to change the style of my voice with this TTS config:

- platform: microsoft
  api_key: !secret microsoft_tts_key
  gender: Male
  type: DavisNeural

and I made it happen by sending this message:

service: tts.microsoft_say
data:
  entity_id: media_player.office_speaker
  message: >-
    <mstts:express-as xmlns:mstts="http://www.w3.org/2001/mstts" style="friendly">Hey, it's working!</mstts:express-as>

Note how I had to declare the namespace as part of the element… normally I would do that in the surrounding speak element, to not have to repeat myself if I need to use other extension elements. But doing it within the element that requires it seems to do the trick.
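
As an untested sketch of that same pattern, a message with two differently styled parts would repeat the namespace declaration on each element. The "cheerful" and "sad" styles and the message text here are assumptions for illustration; whether a given voice supports a style, and whether the integration accepts more than one element per message, would need checking.

```yaml
# Hypothetical example: same service and speaker as above.
# Each express-as element declares xmlns:mstts itself, because there is
# no surrounding speak element to carry the declaration.
# The style names are assumptions; check which styles your voice supports.
service: tts.microsoft_say
data:
  entity_id: media_player.office_speaker
  message: >-
    <mstts:express-as xmlns:mstts="http://www.w3.org/2001/mstts" style="cheerful">Good morning!</mstts:express-as>
    <mstts:express-as xmlns:mstts="http://www.w3.org/2001/mstts" style="sad">It's raining again.</mstts:express-as>
```

The `>-` folded scalar joins the two lines with a single space, so the service receives them as one message string.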