Sonos Cloud API - Better alerts & TTS

For anyone following along here, 2023.5 will have this feature natively without needing a custom component or a cloud-based API! :tada:

The media_player.play_media service will now use the announce parameter to trigger this feature, and will also support the per-call volume override. TTS calls will also use the audio clip overlay automatically.
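
For example, an announcement call with its own volume might look something like this (the entity name and clip path below are just placeholders):

# entity_id and the media path are examples only
service: media_player.play_media
target:
  entity_id: media_player.living_room
data:
  media_content_id: media-source://media_source/local/doorbell.mp3
  media_content_type: music
  announce: true
  extra:
    volume: 40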

If you’re feeling adventurous, the current HA beta has this available today. Please let me know in a core issue if anything appears incomplete compared to the custom component.


Mixed feelings here. I’ve loved this integration, and a massive thank you for creating it! That said, it’s great to see Core now handling Sonos announcements the way we want them.

HA 2023.5 is now released, please test out using the standard media_player entities with either the announce parameter or TTS and let me know how it works!

I’ve switched over all of my automations to the default media_players, so far everything is working as expected.


I’m a little confused :upside_down_face:

(Before I start I’ll just say that I’m still on 2023.4)

How does this actually work in practice with multiple Sonos in a group?
Does this do away with the need for unjoin, join and restore?

For example, if Sonos1, Sonos2 & Sonos3 are in a group playing something and along comes HA with a demand to play something else on only Sonos1 & Sonos2 using announce, I assume we still need to manage the grouping separately?

But won’t that mean any unjoined Sonos will stop playing what was playing before?
Maybe the new announce is only useful for single media players?

Also, what is the TTS option?

I have had TTS working on my system for so long that I have not needed to look at the docs for many years.

But the docs have what I am sure is a new service, speak, and it is very unclear to me whether I should be using this (and if so, how?) or continuing with the legacy say.

I have Google Translate and IBM Watson configured (but at the moment I mainly use Nabu Casa Cloud)

tts:
  - platform: google_translate
  - platform: watson_tts

It will overlay the notification/TTS on top of any playing audio on each of the targeted speakers. Say you’re playing music in rooms A, B, and C. You can call tts.service_say on room B and it will speak the announcement only in that room while music continues to play everywhere (including room B, but at reduced volume behind the announcement).

If you want the announcement to play everywhere, just target all those speakers in the same TTS service call.
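
For example, something along these lines should speak on all three rooms in a single call (I’m using the Nabu Casa cloud say service here; substitute whichever TTS service you use, and the entity names are only examples):

service: tts.cloud_say
data:
  # entity names are examples; list your own players here
  entity_id:
    - media_player.room_a
    - media_player.room_b
    - media_player.room_c
  message: "Dinner is ready"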

Basically, you should no longer need to worry about managing snapshots or groups when playing announcements.


I think I may have found a serious bug.

When playing using tts.service_say the state of the media player (Sonos in my case) does not change from paused to playing and none of the attributes are updated.

This is a serious issue because I rely on one or both of the state and the media_duration to tell me when an announcement is finished.

I have watched the media player state in Dev Tools while a TTS plays, both from my automations and when the service is called directly from Dev Tools.


I have also cross checked using the Sonos app. It only fails to update the state and attributes with tts.

I can confirm this is definitely happening for me, and only since HA 2023.5, but if I am wrong and you cannot reproduce it, is there anything I can do to troubleshoot?


I can explain why I still need to know when an announcement is finished, but it would unnecessarily overcomplicate this post. Needless to say, in some cases I need to use snapshot, unjoin, join and restore and cannot make use of the new built-in overlay method.

This is exactly the intention of the feature, as it separates announcements from “regular” music playback. Announcements will not change the state of the media player playback.

Playing announcements no longer needs snapshot/join/etc. Just play the announcement on the speakers where you want to hear it, and any music will continue playing uninterrupted.

Is there a way to adjust the announcement volume for the TTS service? I have tried everything to no avail. This is really important.
Thanks!

Just to be clear, I think this (change to tts announcements) is a ‘good thing’.

However, it seems like a retrograde step to remove (and effectively show incorrect) information. If the media player is playing, surely its state should show that, and likewise the attributes should reflect reality?

I could see an argument (which I still wouldn’t agree with) that an announcement over media already playing might not update the state and attributes, but if the player is paused and then starts playing, the state is literally wrong.

Wouldn’t it be better to make the announce an option for tts in the same way as it is for play_media? That way you can choose to announce or stick with the old snapshot, unjoin, join, restore.


I also agree with this:

If I’m playing loud music, I don’t want my announcements to be loud, and vice versa if my music is quiet background music, I might want my announcements louder so that I hear them properly.

The state of the media_player should reflect any music currently playing and should not be interrupted to show the overlay sound (which does not stop playback of the music anymore). Having to determine/store/restore the state of the media was honestly a ridiculous workflow, but it was the only way it could be accomplished before. This new method which overlays the audio on top is far simpler and I’d argue what most users would expect to happen if they didn’t know the history.

This is described in the documentation here. You can use the volume parameter to set the volume specifically for the announcement without interfering with currently playing music. For TTS, you’ll need to use a special TTS Media Source URL like so:

service: media_player.play_media
target:
  entity_id: media_player.sonos
data:
  media_content_id: >
    media-source://tts/cloud?message="I am very loud"
  media_content_type: music
  announce: true
  extra:
    volume: 80

This new behavior is only when announce is set. If you’re set on using the old behavior, use TTS like above and leave out the announce parameter.

I think the new function is great: if you use TTS via play_media it works really well. You no longer need complicated processes to play the announcements. You don’t have to worry about whether, or on which players, something else is being played.

Try out the function and a lot becomes clearer; it was the same for me. The only thing I’m still missing is a way to recognise when an announcement is playing (and when it has finished) so that several can be played one after another without cutting each other off.

Often you want to play a signal sound and then a message, which is difficult if you don’t have an event that reports the end of each announcement.

Yes I completely agree!
I was an early adopter of your integration and I liked it a lot.

I do have cases where (at the moment) I will still need to use snapshot etc. and read the state and attributes but I won’t go into that now.

I need to look properly into your reply, I think it may well at the very least provide me with workarounds until I decide to try and rewrite my routines to suit the new behaviours.

Thanks for the replies.

Ok, I’ve thought this through a bit more and I have an issue which I don’t think can be covered.
How can we deal with ‘announcement clash’?

If an announcement is made and another one is requested before the first is finished, the first will, I believe, be cut short when the second one starts.

This was always a problem in the past but was worked around by checking for playing, paused or idle or by using media_duration in a wait.
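
For reference, the kind of wait I mean looks roughly like this inside a script (the entity name is just an example, and the whole thing relied on the player state actually changing):

# Wait until the player reports the announcement has stopped
- wait_template: >
    {{ is_state('media_player.kitchen', 'paused')
       or is_state('media_player.kitchen', 'idle') }}
  timeout: "00:02:00"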


I used this technique to make long messages sound a lot more natural by imposing a pause between sections.

Specifically, for example I have a morning greeting that has many parts…

Weather
News headlines
Bins and other reminders
Whether Arsenal have a match today (:slight_smile: )
etc. etc.

There is no way that I know of (please correct me if I am wrong!) to create a pause in a TTS announcement, so, as my script combines all the sections of the announcement, it inserts a token where I want a pause. The script splits the message at the tokens and plays each one in a separate tts call, with a delay between them equal to the number of ‘pause tokens’ inserted at that point.

e.g.
“This is the weather.[PAUSE]This is the news.[PAUSE][PAUSE]This is the bin reminder.”

Would give a one-second pause between the weather and the news, and a two-second pause between the news and the bin reminder.

The script has a wait between announcements for the previous one to finish.

I know this may sound over engineered but when you hear the announcement every day it makes a difference when it sounds more natural. Without the pauses it is just a constant stream of talking.
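
To give a rough idea, the splitting part looks something like this (heavily simplified; the entity name and the combined_message variable are placeholders, and the wait step is the piece that relied on the player state updating):

- variables:
    # combined_message is the full text with [PAUSE] tokens, built elsewhere
    parts: "{{ combined_message.split('[PAUSE]') }}"
- repeat:
    for_each: "{{ parts }}"
    sequence:
      - if:
          # Consecutive [PAUSE] tokens produce empty parts, which simply
          # fall through to the delay below
          - condition: template
            value_template: "{{ repeat.item | trim | length > 0 }}"
        then:
          - service: tts.cloud_say
            data:
              entity_id: media_player.kitchen
              message: "{{ repeat.item }}"
          # Wait for this part to finish before moving on (as above)
          - wait_template: "{{ is_state('media_player.kitchen', 'paused') }}"
            timeout: "00:01:00"
      # One second of silence per list entry, so each extra [PAUSE]
      # token adds another second
      - delay:
          seconds: 1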

I don’t believe this is possible any longer.

PS. Yes I could do away with my overengineered pauses and have separate announcements but that doesn’t solve the problem that there is no way of knowing when it is ‘safe’ to start the next one.

Unless of course I have missed something fundamental?

Currently that’s true, there’s no indication when an announcement has finished playing. Perhaps I can find a way, but no promises.


Thanks, it’d be great if you could.
I know this question came up a lot in several threads a few years ago, and the only solutions were, as I said, to use the player state or the duration attribute.

I suspect quite a few people will want this, even if it is just to stop ‘announcement clash’.

Having used this now for a few days, I really do urge you to consider a way to know when an announcement is playing and when it has ended.

I think it is actually very important.

One (perfect?) solution I can think of would be to add attributes to the media_player such as

- announcement: [true, false]
- announcement duration: x seconds

I don’t have very deep knowledge of developing for HA, so I admit I have no idea whether that is feasible.


I also have another issue (I think).

When using tts.cloud_say, am I right in thinking that if the announcement is sent to several players (Sonos) they will not play in sync?

I know this was a limitation of the original integration and was the main reason I only made limited use of it. (As well as wanting my announcements to stay local and not pass through the Sonos organisation).

I may be doing something wrong but I am getting small but significant and annoying delays between each player which makes it sound like an echo if you are in earshot of more than one player.

Ok… another update…

I have now found out that there is a very generous free tier for the Google Cloud TTS service (the same one used by Nabu Casa).

As you can imagine from a Google implementation it offers a vast array of customising of the TTS output.

Including:

  • Inserting pauses
  • Inserting audio files in sequence in the TTS (perfect for a small ‘ding-dong’ preceding the announcement).

These extras do require the text to be sent in SSML format but that is not a very onerous task. Normal text is sent as, erm, normal text.
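
To give a flavour, a call ends up looking something like this (the service name assumes the google_cloud platform is configured to accept SSML, the entity name and URL are just examples, and the audio clip has to live at an address Google’s service can fetch):

# assumes the google_cloud tts platform is set up for SSML input
service: tts.google_cloud_say
data:
  entity_id: media_player.kitchen
  message: >
    <speak>
      <audio src="https://example.com/ding-dong.mp3">ding dong</audio>
      This is the weather.
      <break time="1s"/>
      This is the news.
      <break time="2s"/>
      This is the bin reminder.
    </speak>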

The only downside is that you need to jump through a few hoops to set it up initially and provide billing information to Google in case you go over the free allowance (unlikely in my opinion for HA announcements and it seems that at that point you need to opt-in to billing anyway).

I used this website as a source of reference and it was all fairly painless.

I haven’t yet done much testing but… fingers crossed!

Anyway, this still leaves a couple of outstanding problems:

  • ‘Announcement clash’, though now that I can build pauses into a single announcement it is much less of an issue (for me at least; I still think this will be an annoyance to others).
  • And of course not being able to group media players.

PS. Yes I do see the irony in using Google Cloud after saying this: :wink:


EDIT: So far not bad, except I can’t get it to play an embedded local sound file. Probably some kind of security restriction somewhere. Shame as that would have been brilliant.

As far as embedding a local sound file goes, you should try the custom integration Chime TTS, which allows you to add a sound file at the start and/or the end of the TTS message. The integration combines them locally into a single file, so there’s no lag during playback.

I’m trying to use this with Node-RED to send announcements when my doorbells are pressed. I can make it work using Developer Tools to call the service, but I can’t seem to figure out the syntax for the JSON in Node-RED. Here is what I’m using in Dev Tools. Any ideas?

service: media_player.play_media
data:
  entity_id: media_player.deck_sonos_move
  media_content_id: >-
    media-source://tts/cloud?message=The front door bell has been pressed.
  media_content_type: music
  extra:
    volume: 50
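
My best guess is that the call-service node (domain media_player, service play_media) wants the JSON equivalent of that YAML in its Data field, something along these lines, but please correct me if the shape is wrong:

{
  "entity_id": "media_player.deck_sonos_move",
  "media_content_id": "media-source://tts/cloud?message=The front door bell has been pressed.",
  "media_content_type": "music",
  "extra": {
    "volume": 50
  }
}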