So essentially it would look something like this? I'm using HA Cloud, so I'm guessing at the media-source://tts/cloud?message={{ response.text }} part.
actions:
  - action: google_generative_ai_conversation.generate_content
    metadata: {}
    data:
      prompt: >-
        Using two funny lines, reply by saying that the lounge temperature
        is approaching 20 degrees and that the airconditioner will change into
        auto mode.
    response_variable: response
  # may not need this bit
  - action: media_player.volume_set
    metadata: {}
    data:
      volume_level: 1
    target:
      entity_id: media_player.lounge
  # ------------------
  - target:
      entity_id: media_player.lounge
    data:
      media_content_id: media-source://tts/cloud?message={{ response.text }}
      media_content_type: music
      announce: true
      extra:
        volume: 100
    action: media_player.play_media
mode: single
Sonos TTS announcements are often quieter than music because of the audio characteristics of the TTS output and Sonos volume normalization. Increase the TTS volume if your service supports it, or use the snapshot/restore method to temporarily maximize the volume for the announcement and then return to the previous level. Check the volume limits in the Sonos app, and consider a different TTS voice or an external TTS service if needed.
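For reference, the snapshot/restore approach could be sketched roughly like this. It is only a sketch: it assumes an earlier step has already stored the reply in `response`, reuses `media_player.lounge` from the automation above, and the `0.8` volume and 10 second delay are made-up values you would need to tune. Note that it leaves out `announce: true`, since that option already handles pausing and resuming on its own.

```yaml
actions:
  # Save the current Sonos state (volume, what is playing) before the announcement.
  - action: sonos.snapshot
    data:
      entity_id: media_player.lounge
  # Raise the volume for the announcement; 0.8 is an arbitrary example value.
  - action: media_player.volume_set
    target:
      entity_id: media_player.lounge
    data:
      volume_level: 0.8
  - action: media_player.play_media
    target:
      entity_id: media_player.lounge
    data:
      media_content_id: media-source://tts/cloud?message={{ response.text }}
      media_content_type: music
  # Assumed fixed wait so the speech can finish; adjust to the message length.
  - delay:
      seconds: 10
  # Put the volume and playback back the way they were.
  - action: sonos.restore
    data:
      entity_id: media_player.lounge
```

The fixed delay is the weak point here: if the reply is longer than the wait, the restore will cut it off.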
Unfortunately, the TTS level is very low compared to music, to the point where if I turn the music down to match the TTS level, the music is close to pointless.
So after not getting a solution, I turned to my new favourite AI, Grok, and asked it for help.
(Sidenote: Grok is really good for those things in HA that you really cannot find a solution for here on the community forums.)
It essentially said that there is an error in the way the TTS returns the reply, and that "{{ response }}" (has markdown) and "{{ response.text }}" (no markdown) weren't being rendered the same way. After I threw tons of error logs at it, it determined that there is an error in the processing of the MIME type, either in the output or in the Sonos integration, and in this case ONLY music works.
A great thing about this is that Grok suggested a quick and dirty way of circumventing the issue: strip out some of the conflicting text before sending it to the Sonos.
Here's the test automation/kludge/solution:
alias: voice
description: "Just say anything without any markdown using media_content_type: music"
triggers:
  - trigger: state
    entity_id:
      - input_button.voice
conditions: []
actions:
  - action: google_generative_ai_conversation.generate_content
    metadata: {}
    data:
      prompt: >-
        say that the voice button has changed using a funny joke.
    response_variable: response
  - action: media_player.play_media
    target:
      entity_id: media_player.office_s
    data:
      media_content_id: |-
        media-source://tts/cloud?message={{ response.text | replace('
        ', ' ') | replace('*', '') }}
      media_content_type: music
      announce: true
      extra:
        volume: 200
mode: single
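If the reply ever contains other markdown symbols, a slightly broader variant of the same idea might look like the sketch below. It assumes the same `response` variable and `media_player.office_s` entity as above. `clean_text` is just a name I made up, and the character list in the regex is a guess at what the model tends to emit. The `urlencode` at the end is also an assumption on my part: it is meant to stop characters like `&` or `#` from breaking the `?message=` part of the URL, so drop it if the speaker starts reading out percent signs.

```yaml
actions:
  # Assumes an earlier step stored the model reply in `response`.
  - variables:
      # Strip common markdown symbols, flatten newlines, trim the ends.
      clean_text: >-
        {{ response.text | regex_replace('[*_#`]', '') | replace('\n', ' ') | trim }}
  - action: media_player.play_media
    target:
      entity_id: media_player.office_s
    data:
      media_content_id: >-
        media-source://tts/cloud?message={{ clean_text | urlencode }}
      media_content_type: music
      announce: true
```

Doing the cleanup in a separate `variables` step keeps the `media_content_id` line readable and lets you check the cleaned text in the automation trace.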