Sonos Cloud API - Better alerts & TTS

Thanks, I’ve tried every combination of media-source that I could think of, and none of them work. Here’s what I’ve tried and the results:

media_content_id: media-source://tts/google?language=en-GB&voice=en-GB-Standard-F&message=Testing Sonos text to speech results in: websocket_api script: Error executing script. Error for call_service at pos 1: Provider google not found

I specified the service for Google Cloud as google_cloud_say, so trying media_content_id: media-source://tts/google_cloud_say?language=en-GB&voice=en-GB-Standard-F&message=Testing Sonos text to speech results in the same error: websocket_api script: Error executing script. Error for call_service at pos 1: Provider google_cloud_say not found

The only one that does not throw an error is the original media source in my first message, but that yields no output. I’m sure it is something simple that I am just missing. For now I can use the workaround that @Eoin suggested, but I will keep trying to figure out the correct media source.

Instead of trying to create the correct string by hand, test using the Media panel in the frontend and create the service call using the automation UI.

1 Like

I tried building it in an automation, but it still asks for Content ID and Content Type. When I call it from the Media panel, I see these calls in the URL, both of which yield the same “Provider not found” message:

media-source://tts/provider
and
media-source://tts/google_cloud

Well, it seems that it does not like me setting the voice as part of the Content ID. If I do a call like this:

service: media_player.play_media
data:
  entity_id: media_player.living_room_sonos_cloud
  media_content_id: media-source://tts/google_cloud?message=Hi Ida
  media_content_type: music
  extra:
    volume: 50

it works fine. Thanks again for the help!

1 Like

Thank you very much for this cool integration. Sounds so much better than stopping the music. :+1:

1 Like

Great integration. I’ve been using it to provide TTS alerts for various tasks and it works very well. I noticed it doesn’t sync up particularly well across multiple Sonos devices when just using an individual service call on each speaker to do the same thing. I will have to play around further with the play-on-bonded setting instead. Maybe there’s a way for Home Assistant to group up the speakers before calling the service?
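Home Assistant does have a media_player.join service that bonds Sonos speakers into a group before you call anything on them, which might help with the sync issue. A minimal sketch (the entity IDs here are hypothetical placeholders for your own speakers):

```yaml
# Sketch: bond two satellite speakers to a master before announcing.
# Entity IDs are example placeholders.
service: media_player.join
target:
  entity_id: media_player.living_room_sonos   # group coordinator
data:
  group_members:
    - media_player.kitchen_sonos
    - media_player.bedroom_sonos
```

After the group is formed, a single service call on the coordinator should play on all members; media_player.unjoin reverses it.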

Just wondering if there’s a way to play a spotify track via this? If I’m logged into the spotify account inside the sonos ecosystem it’s not as simple as taking the share link for the track and pasting it in unfortunately.

For anyone following along here, 2023.5 will have this feature natively without needing a custom component or a cloud-based API! :tada:

The media_player.play_media service will now use the announce parameter to trigger this feature, and will also support the per-call volume override. TTS calls will also use the audio clip overlay automatically.
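As a rough sketch of what such a native call might look like in 2023.5 (the media URL and entity ID are example placeholders, not from this thread):

```yaml
# Sketch of the native announce feature: overlay a clip on current playback.
service: media_player.play_media
target:
  entity_id: media_player.living_room_sonos
data:
  media_content_id: "media-source://media_source/local/doorbell.mp3"
  media_content_type: music
  announce: true
  extra:
    volume: 40   # per-call volume override for the announcement only
```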

If you’re feeling adventurous, the current HA beta has this available today. Please let me know in a core issue if anything appears incomplete compared to the custom component.

3 Likes

Mixed feelings here. I’ve loved this integration, and a massive thank you for creating it! That said, it’s great to see Core now handling Sonos announcements the way we want.

HA 2023.5 is now released, please test out using the standard media_player entities with either the announce parameter or TTS and let me know how it works!

I’ve switched over all of my automations to the default media_players, so far everything is working as expected.

2 Likes

I’m a little confused :upside_down_face:

(Before I start I’ll just say that I’m still on 2023.4)

How does this actually work in practice with multiple Sonos in a group?
Does this do away with the need for unjoin, join and restore?

For example if Sonos1, Sonos2 & Sonos3 are in a group playing something and along comes HA with a demand to play something else on only Sonos1 & Sonos2 using announce I assume we still need to manage the grouping separately?

But that will mean any unjoined Sonos will stop playing what it was playing before?
Maybe the new announce is only useful for single media players?

Also, what is the TTS option?

I have had TTS working on my system for so long that I have not needed to look at the docs for many years.

But the docs have what I am sure is a new service, speak, and it is very unclear to me whether I should be using it (and if so, how?) or continuing with the legacy say.

I have Google Translate and IBM Watson configured (but at the moment I mainly use Nabu Casa Cloud)

tts:
  - platform: google_translate
  - platform: watson_tts

It will overlay the notification/TTS on top of any playing audio on each of the targeted speakers. Say you’re playing music in rooms A, B, and C. You can call tts.service_say on room B and it will speak the announcement only in that room while music continues to play everywhere (including room B, but at reduced volume behind the announcement).

If you want the announcement to play everywhere, just target all those speakers in the same TTS service call.
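For example, targeting several speakers in one call might look like this (the service name and entity IDs are placeholders; substitute whichever TTS service you have configured, e.g. tts.cloud_say or tts.google_translate_say):

```yaml
# Sketch: one TTS call targeting several speakers, so the announcement
# is overlaid on all of them at once.
service: tts.cloud_say
data:
  entity_id:
    - media_player.room_a
    - media_player.room_b
    - media_player.room_c
  message: "Dinner is ready"
```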

Basically, you should no longer need to worry about managing snapshots or groups when playing announcements.

1 Like

I think I may have found a serious bug.

When playing using tts.service_say, the state of the media player (Sonos in my case) does not change from paused to playing, and none of the attributes are updated.

This is a serious issue because I rely on one or both of the state and the media_duration to tell me when an announcement is finished.

I have watched the media player state in the Dev Tools while a tts plays using my automations as well as when the service is called directly from the Dev Tools.


I have also cross-checked using the Sonos app. It only fails to update the state and attributes with TTS.

I can confirm this is definitely happening for me, and only since HA 2023.5, but if I am wrong and you cannot reproduce it, is there anything I can do to troubleshoot?


I can explain why I still need to know when an announcement is finished, but it would unnecessarily overcomplicate this post. Suffice it to say, in some cases I need to use snapshot, unjoin, join and restore, and cannot make use of the new built-in announcement method.

This is exactly the intention of the feature, as it separates announcements from “regular” music playback. Announcements will not change the state of the media player playback.

Playing announcements no longer needs snapshot/join/etc.: just play the announcement on the speakers where you want to hear it, and any music will continue playing uninterrupted.

Is there a way to adjust the announcement volume? I have tried everything to no avail… This is really important. For the TTS service, I mean.
Thanks!

Just to be clear, I think this (change to tts announcements) is a ‘good thing’.

However, it seems like a retrograde step to remove (and effectively show incorrect) information. If the media player is playing, surely its state should show that; likewise, the attributes should reflect reality?

I could see an argument (which I still wouldn’t agree with) that an announcement over media already playing might not update the state and attributes, but if the player is paused and then starts playing the state is literally wrong.

Wouldn’t it be better to make the announce an option for tts in the same way as it is for play_media? That way you can choose to announce or stick with the old snapshot, unjoin, join, restore.


I also agree with this:

If I’m playing loud music, I don’t want my announcements to be loud, and vice versa if my music is quiet background music, I might want my announcements louder so that I hear them properly.

The state of the media_player should reflect any music currently playing and should not be interrupted to show the overlay sound (which no longer stops playback of the music). Having to determine/store/restore the state of the media was honestly a ridiculous workflow, but it was the only way it could be accomplished before. This new method, which overlays the audio on top, is far simpler and I’d argue is what most users would expect to happen if they didn’t know the history.

This is described in the documentation here. You can use the volume parameter to set the volume specifically for the announcement without interfering with currently playing music. For TTS, you’ll need to use a special TTS Media Source URL like so:

service: media_player.play_media
target:
  entity_id: media_player.sonos
data:
  media_content_id: >
    media-source://tts/cloud?message="I am very loud"
  media_content_type: music
  announce: true
  extra:
    volume: 80

This new behavior is only when announce is set. If you’re set on using the old behavior, use TTS like above and leave out the announce parameter.

I think the new function is great; if you use TTS via play_media it works very well. You no longer need complicated processes to play the announcements, and you don’t have to worry about whether, or on which players, something else is being played.

Try out the function and a lot becomes clearer; it was the same for me. The only thing I’m missing is a way to recognize when an announcement is playing, in order to be able to play several in a row without cutting them off.

Often you want to play a signal and then play a message, which is difficult if you don’t have an event that reports the end of each announcement.

Yes I completely agree!
I was an early adopter of your integration and I liked it a lot.

I do have cases where (at the moment) I will still need to use snapshot etc. and read the state and attributes but I won’t go into that now.

I need to look properly into your reply; I think it may well, at the very least, provide me with workarounds until I decide to try to rewrite my routines to suit the new behaviours.

Thanks for the replies.

Ok, I’ve thought this through a bit more and I have an issue which I don’t think can be covered.
How can we deal with ‘announcement clash’?

If an announcement is made and, before it is finished, another one is requested, the first will, I believe, be cut short when the second one starts.

This was always a problem in the past, but it was worked around by checking for playing, paused or idle, or by using media_duration in a wait.
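That old workaround might look roughly like this in a script (the entity ID and TTS service are placeholders, and it relies on the pre-2023.5 behavior where TTS changed the player state):

```yaml
# Sketch of the old "wait for the announcement to finish" pattern.
- service: tts.cloud_say
  data:
    entity_id: media_player.living_room_sonos
    message: "First announcement"
# Wait for the player to start speaking...
- wait_template: "{{ is_state('media_player.living_room_sonos', 'playing') }}"
  timeout: "00:00:05"
# ...then wait for it to leave the playing state before the next call.
- wait_template: "{{ not is_state('media_player.living_room_sonos', 'playing') }}"
  timeout: "00:00:30"
```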


I used this technique to make long messages sound a lot more natural by imposing a pause between sections.

Specifically, for example I have a morning greeting that has many parts…

Weather
News headlines
Bins and other reminders
Whether Arsenal have a match today (:slight_smile: )
etc. etc.

There is no way that I know of (please correct me if I am wrong!) to create a pause in a TTS announcement, so as my script combines all the sections of the announcement, it inserts a token wherever I want a pause. The script splits the message at the tokens and plays each section in a separate TTS call, with a delay between them equal to the number of ‘pause tokens’ inserted at that point.

e.g.
“This is the weather.[PAUSE]This is the news.[PAUSE][PAUSE]This is the bin reminder.”

Would give a one-second pause between the weather and the news, and a two-second pause between the news and the bin reminder.

The script has a wait between announcements for the previous one to finish.
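The splitting logic described above could be sketched as something like the following (the entity ID, TTS service, and the `message` variable are assumed placeholders, and it depends on the old behavior where the player state changes during TTS):

```yaml
# Sketch: speak each [PAUSE]-delimited section of `message`, with one
# second of silence per pause token between sections.
sequence:
  - repeat:
      for_each: "{{ message.split('[PAUSE]') }}"
      sequence:
        - choose:
            - conditions: "{{ repeat.item | trim | length > 0 }}"
              sequence:
                - service: tts.cloud_say
                  data:
                    entity_id: media_player.living_room_sonos
                    message: "{{ repeat.item }}"
                # Wait for this section to finish before pausing.
                - wait_template: >-
                    {{ not is_state('media_player.living_room_sonos', 'playing') }}
                  timeout: "00:00:30"
        # One second per token; empty items (double tokens) add an extra second.
        - delay: "00:00:01"
```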

I know this may sound over engineered but when you hear the announcement every day it makes a difference when it sounds more natural. Without the pauses it is just a constant stream of talking.

I don’t believe this is possible any longer.

PS. Yes, I could do away with my over-engineered pauses and have separate announcements, but that doesn’t solve the problem that there is no way of knowing when it is ‘safe’ to start the next one.

Unless of course I have missed something fundamental?