Google supports SSML in Cloud Text-to-Speech requests, which gives more control over the generated audio: you can insert pauses and specify how acronyms, dates, times, and abbreviations are read, or which text should be censored.
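To give an idea of what this markup looks like, here is a minimal sketch of one SSML document that uses each of the features listed above. The helper name `build_ssml` and the sample sentences are hypothetical. The tags themselves (`<break>`, `<say-as>`, `<sub>`) are standard SSML that Google's documentation describes, but the exact pronunciation depends on the selected voice.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(greeting: str, pause_ms: int = 500) -> str:
    """Hypothetical helper: wraps user text and several SSML features in <speak>."""
    ssml = (
        "<speak>"
        # Escape &, < and > so arbitrary user text cannot break the XML.
        f"{escape(greeting)}"
        # Explicit pause between sentences.
        f'<break time="{pause_ms}ms"/>'
        # Spell out an acronym letter by letter.
        '<say-as interpret-as="characters">SSML</say-as> is supported. '
        # Read a date and a time naturally.
        'The meeting is on <say-as interpret-as="date" format="yyyymmdd">2024-03-15</say-as> '
        'at <say-as interpret-as="time" format="hms12">2:30pm</say-as>. '
        # Expand an abbreviation.
        '<sub alias="World Wide Web Consortium">W3C</sub> defines SSML. '
        # Bleep out text that should be censored.
        '<say-as interpret-as="expletive">darn</say-as>'
        "</speak>"
    )
    # Fail fast if the markup is not well-formed XML (raises ParseError otherwise).
    ET.fromstring(ssml)
    return ssml
```

For example, `build_ssml("Tom & Jerry")` produces a document that starts with `<speak>Tom &amp; Jerry<break time="500ms"/>`. Escaping the free-text part is the important design choice here: a stray `&` or `<` in user input would otherwise make the whole request invalid.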
I consider this a must-have for making the speech sound as natural as possible. The start page on Google can be found here, and the documentation can be found here. It looks rather easy to implement, since the request just needs an `ssml` parameter instead of the `text` parameter.
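A minimal sketch of that parameter swap, assuming the v1 REST endpoint `https://texttospeech.googleapis.com/v1/text:synthesize`. The helper `synthesis_request` is hypothetical, and the voice and audio settings are placeholder choices. It only builds the JSON body and does not send anything.

```python
import json


def synthesis_request(content: str, is_ssml: bool, language_code: str = "en-US") -> str:
    """Hypothetical helper: builds the JSON body for a text:synthesize call.

    Plain text and SSML differ only in which key is set inside "input".
    """
    input_field = "ssml" if is_ssml else "text"
    body = {
        # The synthesis input takes exactly one of "text" or "ssml".
        "input": {input_field: content},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return json.dumps(body)
```

With the official Python client library the equivalent change is passing `ssml=...` instead of `text=...` to `texttospeech.SynthesisInput`. Everything else in the request stays the same, which is why the feature should be cheap to add.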