Using Music Assistant TTS announcement to play AI image recognition task response text?

Hello,
I have an automation that takes a camera snapshot, sends it to an AI Task for analysis, and then (the bit I can't get to work) plays the response as a TTS announcement via Music Assistant.
Here is part of the YAML:


  - action: ai_task.generate_data
    metadata: {}
    data:
      instructions: >-
        You are English actor Sir John Mills. Briefly describe what you see in
        this image from my frontdoor camera. Don't describe stationary objects,
        cars or buildings.
      attachments:
        media_content_id: media-source://media_source/local/camera/bike_shed_snapshot.jpg
        media_content_type: image/jpeg
        metadata:
          title: bike_shed_snapshot.jpg
          thumbnail: null
          media_class: image
          children_media_class: null
          navigateIds:
            - {}
            - media_content_type: app
              media_content_id: media-source://media_source
            - media_content_type: ""
              media_content_id: media-source://media_source/local/camera
      task_name: Analyse camera image
      entity_id: ai_task.google_ai_task
    response_variable: response
  - action: music_assistant.play_announcement
    metadata: {}
    data:
      announcement_text: "{{ response['data'] }}"
      start_volume: 0.03
    enabled: true
    target:
      entity_id: media_player.mopidy_http_server_on_portadyne_6680_2
mode: single

However, in the traces I get an error:

Executed: 27 October 2025 at 17:26:17
Error: extra keys not allowed @ data['announcement_text']
Result:
params:
  domain: music_assistant
  service: play_announcement
  service_data:
    announcement_text: >-
      Well, hello there! From what I can make out in this rather dim light, it
      appears there's a bicycle, or perhaps just a part of one, peeking over the
      fence with a rather distinctive blue glint about it. Quite the late hour
      for it to be out and about, eh?
    start_volume: 0.03
    entity_id:
      - media_player.mopidy_http_server_on_portadyne_6680_2
  target:
    entity_id:
      - media_player.mopidy_http_server_on_portadyne_6680_2
running_script: false

It looks like the text is generated correctly using announcement_text: "{{ response['data'] }}", but there's something in my formatting that Music Assistant doesn't like.

Does anyone have an idea?

Thank you,

Matthew

It seems likely that what you are doing wrong is not fact-checking an LLM bullshit engine…

The play_announcement action is not a text-to-speech generating action, and announcement_text is not a valid configuration variable for it. This action plays an audio file that is accessible via the URL you provide. All of its valid configuration variables can be seen in the Actions tool, in the Automation Editor, in the Music Assistant docs for its Home Assistant integration, or in the Home Assistant docs for the Music Assistant integration.
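
For example, a minimal valid call would look something like this (just a sketch; the MP3 path is hypothetical, and the two optional keys are the ones the docs list alongside the required url):

  - action: music_assistant.play_announcement
    metadata: {}
    data:
      # url is the only required field; it must point to a playable audio file
      url: http://homeassistant.local:8123/local/sounds/doorbell.mp3
      # optional: skip the pre-announce chime and force a volume for the announcement
      use_pre_announce: false
      announce_volume: 20
    target:
      entity_id: media_player.mopidy_http_server_on_portadyne_6680_2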

Thank you for your reply. OK, I'm being a numpty trying to do something Music Assistant can't do. Is it possible to take the AI response['data'] and play it via TTS through a media player in some way? Thank you

OK, I got it working with tts.speak. The only downside is that it doesn't pause and resume the music the way Music Assistant does (a possible workaround is sketched after the snippet below). The key bit I couldn't easily see documented elsewhere was how to access the returned text:
message: "{{ response['data'] }}"

action: tts.speak
metadata: {}
data:
  cache: true
  media_player_entity_id: media_player.mopidy_http_server_on_portadyne_6680
  message: "{{ response['data'] }}"
target:
  entity_id: tts.piper
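
If the pause-and-resume behaviour matters, one possible workaround (untested here, and assuming your player supports Home Assistant's announce feature and that the TTS media source accepts the engine's entity ID) is to hand a TTS media-source URL to media_player.play_media with announce: true:

  - action: media_player.play_media
    target:
      entity_id: media_player.mopidy_http_server_on_portadyne_6680
    data:
      announce: true  # asks the player to pause/duck and resume afterwards
      media_content_type: music
      # Home Assistant's TTS media source renders the message on the fly;
      # urlencode keeps punctuation in the response from breaking the URL
      media_content_id: >-
        media-source://tts/tts.piper?message={{ response['data'] | urlencode }}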

The full automation looks like this. It's been done before; the difference now is that I needed to change the AI part to use the new AI Task. The John Mills voice is quite verbose.


It uses the free Gemini as the AI agent, and I have a Reolink camera which integrates really well with HA. I found the built-in Person detect and Animal detect worked well.

alias: Camera - Bike Shed ai analysis
description: ""
triggers:
  - type: turned_on
    device_id: your camera id will be here
    entity_id: your entity id will be here
    domain: binary_sensor
    trigger: device
    id: Motion detect start
  - type: turned_on
    device_id: your camera id will be here
    entity_id: your entity id will be here
    domain: binary_sensor
    trigger: device
conditions:
  - condition: time
    after: "07:00:00"
    before: "23:00:00"
    weekday:
      - mon
      - tue
      - wed
      - thu
      - fri
      - sat
      - sun
actions:
  - action: camera.snapshot
    metadata: {}
    data:
      filename: /media/camera/bike_shed_snapshot.jpg
    target:
      device_id: your camera id will be here
  - delay:
      hours: 0
      minutes: 0
      seconds: 5
      milliseconds: 0
  - action: ai_task.generate_data
    metadata: {}
    data:
      instructions: >-
        You are English actor Sir John Mills. Briefly describe what you see in
        this image from my frontdoor camera. Don't describe stationary objects,
        cars or buildings.
      attachments:
        media_content_id: media-source://media_source/local/camera/bike_shed_snapshot.jpg
        media_content_type: image/jpeg
        metadata:
          title: bike_shed_snapshot.jpg
          thumbnail: null
          media_class: image
          children_media_class: null
          navigateIds:
            - {}
            - media_content_type: app
              media_content_id: media-source://media_source
            - media_content_type: ""
              media_content_id: media-source://media_source/local/camera
      task_name: Analyse camera image
      entity_id: ai_task.google_ai_task
    response_variable: response
  - action: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.mopidy_http_server_on_portadyne_6680
      message: "{{ response['data'] }}"
    target:
      entity_id: tts.piper
mode: single
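
For anyone else hunting for it: the response variable returned by ai_task.generate_data holds the generated text under its data key, so the trace above is roughly this shape (trimmed):

response:
  data: >-
    Well, hello there! From what I can make out in this rather dim light, ...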