Playing announcements over Sonos music?

I’ve been trying to find a solution for this for a while now, but I can’t get things to work the way I’d like, and I’m hoping someone can help me out.

I’d like to use HA to play voice announcements over my Sonos speakers, some of which are Alexa enabled. I can get the basics to work; the problems come down to a few things: currently playing music, volume, and grouping. Most of the time in my house, there is music playing. The kids always have a couple of speakers grouped together playing something. If they’re blasting Disney, using service.cloud_say to the kitchen will end up blasting the announcement at an insane volume, possibly on multiple speakers throughout the house.

To combat this I’ve tried taking a snapshot, pausing the speakers I want to play the announcement on, joining those speakers, setting the volume, playing the cloud_say, then restoring the snapshot. This has a couple of issues. First, the snapshots don’t work consistently; sometimes they don’t restore properly. Second, it’s very jumpy and abrupt: the tracks “rewind” to where they were a few seconds before the announcement started.

I’ve also tried the HACS Alexa Media Player integration, with no added benefit. In an ideal scenario, I’d like to just use an “announce” feature on the speakers I choose, without grouping or ungrouping. The music would lower, the person in the box would make the announcement, and then the volume would come back up, as it does with other features.

Has anyone solved this in a great way? I’d love to hear your ideas and solutions!

You might be looking for this

jjlawren/sonos_cloud: Sonos cloud API integration for Home Assistant with improved TTS/alerts handling (github.com)

There is a forum post to go with it somewhere…

As far as I know, the ideal scenario you described doesn’t exist; tts.cloud_say is as close as you will get to a convenient voiceover.

FWIW, I don’t use that service call (HA instance is inaccessible from the internet) so I pause whatever is currently playing (if anything), snapshot/pause/unjoin/join/set volume/play the announcement/restore snapshot. So, no voiceover, just the usual ‘butt-in’ along with all the rigmarole of juggling groups.

As you already know, the unjoining/joining isn’t instantaneous so I implemented two strategies to avoid it when possible.

  1. I check the current grouping and if it matches the desired grouping, I skip the unjoin/join step. So if the goal is to play the announcement to Downstairs speakers, if they’re already grouped that way I leave them as-is.
  2. I provide the ability to play an announcement in what I call “opportunistic mode”. If the announcement is to be played Downstairs and there are currently one or more speakers playing something downstairs, then don’t unjoin/join but simply play to the existing grouping. It’s sort of a “good enough” coverage; play to wherever someone is already listening.

I should point out that these two strategies are based on our listening habits where typically only one stream occurs at a time (i.e. Downstairs is not playing something different from Bedroom). The strategies can probably be adapted to handle that situation but I haven’t explored it (yet).
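
A rough sketch of the first strategy as a Home Assistant script, not the exact script described above. It assumes the Sonos integration exposes each speaker's current group in a `group_members` attribute, and the entity names (`media_player.kitchen`, `media_player.living_room`) are hypothetical:

```yaml
# Sketch only: regroup the Downstairs speakers only when they are not
# already grouped as desired. Entity names are hypothetical.
alias: Group downstairs if needed
variables:
  desired_group:
    - media_player.kitchen
    - media_player.living_room
sequence:
  - choose:
      - conditions:
          # Compare the current group (order-independent) with the desired one
          - condition: template
            value_template: >
              {{ (state_attr(desired_group[0], 'group_members') or []) | sort
                 != desired_group | sort }}
        sequence:
          - service: sonos.unjoin
            target:
              entity_id: "{{ desired_group }}"
          - service: sonos.join
            data:
              master: "{{ desired_group[0] }}"
              entity_id: "{{ desired_group }}"
mode: single
```

When the grouping already matches, the `choose` block does nothing and the script falls straight through, which is where the time saving comes from.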


EDIT

Correction. tts.cloud_say not service.cloud_say.

Welcome!

I am a neophyte noob in the Sonos space (then again probably in all ‘space’ :wink: ) So you are probably beyond this when you say you are taking a ‘snapshot’. That said, if you have not yet done so, you might poke around with these two external libraries to see if they offer a more robust way to save and restore the state of your Sonos environment. Good hunting!

I believe you may be referring to this one:

That’s the custom integration developed by jjlawren that implements the audioclip feature using Sonos’ Cloud API. In other words, it’s the integration that provides the tts.cloud_say service call.

Over music I wouldn’t know, but I use Node-RED to snapshot the music and its playback state, send a TTS message, and then resume the music, without any of the issues mentioned in the original post. Creating a snapshot of the state of the Sonos device means it will resume exactly where it was stopped; and if it wasn’t playing, the snapped state is ‘paused’, so it stays that way. I guess the magic is in the Sonos Node-RED node that is handling this?

This depends on you using Node-RED. I don’t know of a way to do it in HA’s own automations.

It does require adding some specific Sonos nodes to your Node-RED palette; they do not come with the default install of Node-RED.

Snap the state, play voice-message, wait, return to state snapped

Snapping the state:
[screenshot]

Returning to the snapped state:
[screenshot]

The Universal Node from this set of nodes should be added to your palette.

It’s done with the sonos.snapshot service call. It records the grouping of speakers, their individual volume levels, and the current position of the media content that is being played (if any). All of that is restored after calling sonos.restore.

If the content is locally sourced or from an on-demand cloud provider, the content picks up from where it was paused. However, if the content is a live stream such as internet radio, the restored content will be from its current position, not from where it was paused. The reason is simply that internet radio doesn’t let you “rewind” to a specific point in the stream (or replay content).

Overall, my experience with sonos.snapshot and sonos.restore has been fine.
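
A minimal sketch of that snapshot/restore pair in a script sequence. The speaker name is hypothetical, and the announcement step is left as a comment because it depends on which TTS service you use:

```yaml
# Sketch only: save the Sonos state, interrupt, then put everything back.
sequence:
  - service: sonos.snapshot
    data:
      entity_id: media_player.kitchen   # hypothetical speaker
      with_group: true                  # also record the group it belongs to
  # ... play the announcement here and wait for it to finish ...
  - service: sonos.restore
    data:
      entity_id: media_player.kitchen
      with_group: true
```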


Same here.

I presume this is what the Node-RED node I referred to also uses. And it makes sense that you cannot pause online content this way; I don’t see how that could work differently.

Thanks everyone for all the help!

I tried the community addon and couldn’t get it working quickly. I didn’t spend too much time as I’d prefer to avoid going outside core for this functionality.

I’ve been working on a script and it’s getting me pretty far, but has at least one current issue.

alias: Announce on speakers
fields:
  message:
    description: The text to announce
    example: Welcome home!
  media_players:
    description: A single entity or list of media_players
    example: '[media_player.kitchen, media_player.dining_room]'
  volume:
    description: The volume to use, the values are 0.01 to 0.99
    example: '0.31'
    default: 0.31
sequence:
  - service: sonos.snapshot
    data:
      entity_id: '{{  media_players }}'
      with_group: true
  - service: media_player.media_pause
    data: {}
    target:
      entity_id: '{{  media_players }}'
  - service: sonos.unjoin
    data: {}
    target:
      entity_id: '{{  media_players }}'
  - service: sonos.join
    data:
      master: '{{ media_players[0] }}'
      entity_id: '{{ media_players }}'
  - service: media_player.volume_set
    data:
      entity_id: '{{ media_players }}'
      volume_level: '{{ iif(is_number(volume), volume, 0.31) }}'
  - service: tts.cloud_say
    data:
      entity_id: '{{  media_players }}'
      message: '{{ message }}'
  - delay:
      hours: 0
      minutes: 0
      seconds: |
        {{ message | length | float/2 | float * 0.26 }}
      milliseconds: 0
  - service: sonos.restore
    data:
      entity_id: '{{  media_players }}'
mode: single
icon: mdi:bullhorn

There is a delay in firing, but that’s expected. The one bug can happen like this:

  • Device B is playing a song
  • Call script to announce over A, B, C
  • When the script finishes, B will not restart a song.

I’m guessing this is because the restore is primarily based around A. If A (or even A and a group) is playing, this all works fine. I wonder if I should snapshot and restore each player in the list?

Not sure if this is the issue but you don’t need to pause the Sonos

Also…
Is this an estimate of how long the TTS takes to say?

  - delay:
      hours: 0
      minutes: 0
      seconds: |
        {{ message | length | float/2 | float * 0.26 }}
      milliseconds: 0

If so is it fairly accurate?

I use wait. First I wait for the master media_player to be playing and then wait for it to be paused. I do have a timeout because in rare cases I have found that hasn’t been 100% reliable (although not for a long time now so maybe over time the Sonos integration has improved?).
But if your method is a good estimate then I might steal it!
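
A sketch of the wait-based approach, assuming a hypothetical group master `media_player.kitchen` and arbitrary timeout values. It relies on the player actually changing state, which is why the timeouts are there as a fallback:

```yaml
# Sketch only: wait for the announcement to start, then for it to finish.
- wait_for_trigger:
    - platform: state
      entity_id: media_player.kitchen   # hypothetical group master
      to: "playing"
  timeout:
    seconds: 10                         # assumed value, tune to taste
  continue_on_timeout: true
- wait_for_trigger:
    - platform: state
      entity_id: media_player.kitchen
      to: "paused"
  timeout:
    seconds: 60                         # assumed upper bound on message length
  continue_on_timeout: true
```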

If I don’t pause, but I set the volume level, I risk blasting the music out at an unreasonable volume. Pausing is just a safety feature.

If so is it fairly accurate?

Exactly! It’s been working really well in my experience. Characters don’t accurately translate to how long it takes to say something, but I’ve been using it for a while and it’s worked with a wide variety of messages without too much pause after playing.
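
For reference, Jinja applies filters before arithmetic, so the template above evaluates as (length / 2) × 0.26, which is 0.13 seconds per character. An equivalent, slightly shorter form of the same estimate might look like this:

```yaml
- delay:
    # 0.13 s per character, e.g. a 40-character message waits about 5.2 s
    seconds: "{{ (message | length) * 0.13 }}"
```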

I’d suggest giving the sonos_cloud custom integration linked above another shot as it seems like it would solve most (if not all) of the edge cases you’re hitting. For example, there is no need to worry about restoring the state of streaming music (as it never really stops) and you can set the volume on a per-alert basis. It would also trim down the script you’ve posted (or perhaps eliminate the need for it altogether).

You’re right that it’s external to core for now, but as a point of reference, I currently maintain both the core sonos integration and the custom sonos_cloud one. They’re simply very different methods of working with Sonos speakers, and it was easier to get the feature out to users this way. Long-term I hope it makes sense to merge them together into core, but that’s not ready today.

I did something similar:

  • Had helpers to store the current volume level of each of my sonos speakers
  • Created a snapshot of each speaker
  • Set volume to a preset ‘announcement’ volume
  • Pause the speakers
  • Played a locally stored mp3, which was a recording of a TTS service (so didn’t use Alexa)
  • Restored volume from helpers
  • Restored snapshots

I moved away from this primarily because snapshot/restore didn’t work when using the Sonos speakers in ‘casting’ mode from the Spotify app, which is something we do a lot. Also, of course, the pre-recorded messages may not work for you, though a benefit is that they play a bit quicker than Alexa TTS.

I’ve got a few Echos dotted around, so now I use alexa.notify and only play announcements on those. I still keep elements of my previous solution, in that I ‘duck’ the Sonos speakers’ volume to a very low level for the duration of the announcement. I’ve created a script with variables (one is the text, the other is the duration of the announcement in seconds) to make this reusable.
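
A rough sketch of what such a ducking script could look like for a single speaker. The entity name, the ducked level of 0.05, and the fallback values are all assumptions, and the actual announcement call is left as a comment:

```yaml
# Sketch only: lower one Sonos speaker while an announcement plays elsewhere.
alias: Duck Sonos for announcement
fields:
  text:
    description: The text to announce
  duration:
    description: Length of the announcement in seconds
sequence:
  - variables:
      # Remember the current volume (falls back to 0.2 if it is unavailable)
      previous_volume: "{{ state_attr('media_player.kitchen', 'volume_level') | float(0.2) }}"
  - service: media_player.volume_set
    target:
      entity_id: media_player.kitchen   # hypothetical speaker
    data:
      volume_level: 0.05                # assumed 'ducked' level
  # ... send `text` to the Echo devices with your announcement service here ...
  - delay:
      seconds: "{{ duration | float(5) }}"
  - service: media_player.volume_set
    target:
      entity_id: media_player.kitchen
    data:
      volume_level: "{{ previous_volume }}"
mode: single
```

Because the music is never paused or regrouped, there is nothing to snapshot or restore; only the volume has to be put back.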

Thanks for the push @jjlawren! I backed up everything and spent some time setting up the sonos_cloud integration, and it’s working great! Thanks for your help and your work on the integration. The additional entities are strange at first, but I really like the ability to have the alert volume saved on the entity, separate from the normal media player.
