tts.speak and tts.home_assistant_cloud volume issues {{ response.text }}

I'm using a Sonos media player for announcements at my place, and I find that the HA Cloud TTS responses are LOW in level compared to music playback.

The volume of the media player is already set to max.

Is there any way of bumping it up?

  - action: media_player.volume_set
    metadata: {}
    data:
      volume_level: 1
    target:
      entity_id: media_player.lounge
  - action: tts.speak
    metadata: {}
    data:
      cache: false
      media_player_entity_id: media_player.lounge
      message: "{{ response.text }}"
    target:
      entity_id: tts.home_assistant_cloud

I don’t think tts.speak supports volume control. However, if you use media_player.play_media instead, it seems to work on Sonos.

      - target:
          entity_id: media_player.lounge
        data:
          announce: true
          media_content_id: |
            media-source://tts/amazon_polly?message="{{ response.text }}"
          media_content_type: music
          extra:
            volume: 50
        action: media_player.play_media

You’ll need to replace “amazon_polly” with the name of your speech engine.

So essentially it would look something like this? I'm using HA Cloud, so I'm guessing the URL would be media-source://tts/cloud?message={{ response.text }}

actions:
  - action: google_generative_ai_conversation.generate_content
    metadata: {}
    data:
      prompt: >-
        Using two funny lines, reply by saying that the lounge temperature 
        is approaching 20 degrees and that the airconditioner will change into 
        auto mode.
    response_variable: response
# may not need this bit
  - action: media_player.volume_set
    metadata: {}
    data:
      volume_level: 1
    target:
      entity_id: media_player.lounge
# ------------------
  - target:
      entity_id: media_player.lounge
    data:
      media_content_id: media-source://tts/cloud?message={{ response.text }}
      media_content_type: music
      announce: true
      extra:
        volume: 100
    action: media_player.play_media
mode: single

Sonos TTS announcements are often quieter than music due to audio characteristics and Sonos normalization. Increase TTS volume if your service supports it, or use the snapshot/restore method to temporarily maximize volume for the announcement and then return to the previous level. Check Sonos app volume limits and consider different TTS voices or external TTS services if needed.
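
For reference, the snapshot/restore approach could look roughly like the sketch below. It is untested on this setup and assumes an earlier step has already filled the `response` variable. The 10-second delay is a guess that needs adjusting to the length of the message, and the entity IDs are the ones used earlier in this thread.

```yaml
actions:
  # Save the current Sonos state (what is playing, volume) before announcing
  - action: sonos.snapshot
    data:
      entity_id: media_player.lounge
      with_group: false
  # Raise the volume for the announcement only
  - action: media_player.volume_set
    target:
      entity_id: media_player.lounge
    data:
      volume_level: 1
  - action: tts.speak
    target:
      entity_id: tts.home_assistant_cloud
    data:
      cache: false
      media_player_entity_id: media_player.lounge
      message: "{{ response.text }}"
  # Assumed fixed wait so the announcement can finish; tune to your messages
  - delay:
      seconds: 10
  # Put the speaker back the way it was
  - action: sonos.restore
    data:
      entity_id: media_player.lounge
      with_group: false
```

Note that this only restores the previous state afterwards. It does not make the TTS audio itself any louder than the speaker's maximum volume.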

Unfortunately, the TTS is at a very low level compared to music, to the point where if I turn the music down to the TTS level, the music is close to pointless.

Can someone comment on the code above?

When I do

action: media_player.play_media
target:
  entity_id: media_player.lounge
data:
  media_content_id: media-source://tts/cloud?message="{{ response }}"
  media_content_type: music
  announce: true
  extra:
    volume: 100

I get the unformatted text announced as a response from HA Cloud, but if I do this:

action: media_player.play_media
target:
  entity_id: media_player.lounge
data:
  media_content_id: media-source://tts/cloud?message="{{ response.text }}"
  media_content_type: music
  announce: true
  extra:
    volume: 100

I get nothing :slightly_frowning_face:

In the logs I get:

Error calling SonosMediaPlayerEntity._play_media on media_player.lounge: UPnP Error 714 received: Illegal MIME-Type from 19X.XXX.XXX

I've tried media_content_type: provider and tts; both throw the same error.

So, after not getting a solution, I turned to my new favourite AI, Grok, and asked it for help.

(Side note: Grok is really good for those things in HA that you really cannot find a solution for here on the community forums.)

It essentially said that there is an error in the way the TTS returns the reply, and that "{{ response }}" (has markdown) and "{{ response.text }}" (no markdown) weren't being rendered the same way. After I threw tons of error logs at it, it determined that the error is in the processing of the MIME type, either in the output or in the Sonos integration, and that in this case ONLY music works.

A great thing about this is that Grok suggested a quick and dirty way of circumventing the issue: strip out some of the conflicting text before sending it to the Sonos.

Here's the test automation/kludge/solution:

alias: voice
description: "Just say anything without any markdown using media_content_type: music"
triggers:
  - trigger: state
    entity_id:
      - input_button.voice
conditions: []
actions:
  - action: google_generative_ai_conversation.generate_content
    metadata: {}
    data:
      prompt: >-
        say that the voice button has changed using a funny joke. 
    response_variable: response
  - action: media_player.play_media
    target:
      entity_id: media_player.office_s
    data:
      media_content_id: |-
        media-source://tts/cloud?message={{ response.text | replace('\n', ' ') | replace('*', '') }}
      media_content_type: music
      announce: true
      extra:
        volume: 200
mode: single
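
A slightly tidier variant of the same kludge, as a sketch only: clean the text once in a `variables` step and reuse it in the `play_media` call. The name `clean_message` is made up for this example, and the volume of 100 is taken from the earlier attempts in this thread. Everything else is copied from the automation above.

```yaml
actions:
  - action: google_generative_ai_conversation.generate_content
    data:
      prompt: Say that the voice button has changed using a funny joke.
    response_variable: response
  # Strip newlines and markdown asterisks once, then reuse the result.
  # clean_message is a hypothetical variable name.
  - variables:
      clean_message: >-
        {{ response.text | replace('\n', ' ') | replace('*', '') }}
  - action: media_player.play_media
    target:
      entity_id: media_player.office_s
    data:
      media_content_id: media-source://tts/cloud?message={{ clean_message }}
      media_content_type: music
      announce: true
      extra:
        volume: 100
```

Keeping the cleanup in its own step makes it easier to add more `replace` filters later if other markdown characters turn up in the response.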