Hello,
I have an automation that takes a camera snapshot, sends it to an AI Task for analysis, and then (the bit I can't get to work) plays the result as a TTS announcement via Music Assistant.
Here is part of the YAML:

```yaml
- action: ai_task.generate_data
  metadata: {}
  data:
    instructions: >-
      You are English actor Sir John Mills. Briefly describe what you see in
      this image from my frontdoor camera. Don't describe stationary objects,
      cars or buildings.
    attachments:
      media_content_id: media-source://media_source/local/camera/bike_shed_snapshot.jpg
      media_content_type: image/jpeg
      metadata:
        title: bike_shed_snapshot.jpg
        thumbnail: null
        media_class: image
        children_media_class: null
        navigateIds:
          - {}
          - media_content_type: app
            media_content_id: media-source://media_source
          - media_content_type: ""
            media_content_id: media-source://media_source/local/camera
    task_name: Analyse camera image
    entity_id: ai_task.google_ai_task
  response_variable: response
- action: music_assistant.play_announcement
  metadata: {}
  data:
    announcement_text: "{{ response['data'] }}"
    start_volume: 0.03
  enabled: true
  target:
    entity_id: media_player.mopidy_http_server_on_portadyne_6680_2
mode: single
```
However, in the traces I get an error:

```
Executed: 27 October 2025 at 17:26:17
Error: extra keys not allowed @ data['announcement_text']
Result:
params:
  domain: music_assistant
  service: play_announcement
  service_data:
    announcement_text: >-
      Well, hello there! From what I can make out in this rather dim light, it
      appears there's a bicycle, or perhaps just a part of one, peeking over the
      fence with a rather distinctive blue glint about it. Quite the late hour
      for it to be out and about, eh?
    start_volume: 0.03
    entity_id:
      - media_player.mopidy_http_server_on_portadyne_6680_2
  target:
    entity_id:
      - media_player.mopidy_http_server_on_portadyne_6680_2
running_script: false
```
It looks like the text is generated correctly using `announcement_text: "{{ response['data'] }}"`, but there's something in the formatting here that Music Assistant doesn't like.

Does anyone have an idea?
Thank you,
Matthew
It seems likely that what you are doing wrong is not fact-checking an LLM bullshit engine…
The `play_announcement` action is not a text-to-speech generating action, and `announcement_text` is not a valid configuration variable for that action. This action plays an audio file that is accessible via the URL you provide. All of its valid configuration variables can be seen in the Action tool, the Automation Editor, the Music Assistant docs for its Home Assistant integration, or the Home Assistant docs for the Music Assistant integration.
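For reference, a minimal valid call looks something like this (a sketch only — the example URL is hypothetical, and the exact field names such as `announce_volume` may vary between Music Assistant versions, so check the docs for yours):

```yaml
- action: music_assistant.play_announcement
  data:
    # url must point to an audio file the players can reach, not raw text
    url: http://homeassistant.local:8123/local/doorbell.mp3
    announce_volume: 30
  target:
    entity_id: media_player.mopidy_http_server_on_portadyne_6680_2
```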
Thank you for your reply. OK, I'm being a numpty trying to do something Music Assistant can't do. Is it possible to take the AI `response['data']` and play it via TTS through a media player in some way? Thank you.
OK, I got it working with `tts.speak`. The only downside is that it doesn't pause the music and resume as Music Assistant does. The key bit I couldn't easily see elsewhere was how to access the returned text: `message: "{{ response['data'] }}"`.

```yaml
- action: tts.speak
  metadata: {}
  data:
    cache: true
    media_player_entity_id: media_player.mopidy_http_server_on_portadyne_6680
    message: "{{ response['data'] }}"
  target:
    entity_id: tts.piper
```
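If the pause/resume behaviour matters to you, one possible workaround (an untested sketch — it assumes your Home Assistant version exposes TTS engines via the `media-source://tts/` URL scheme and that the `urlencode` template filter is available) is to hand Music Assistant a TTS media-source URL instead of raw text:

```yaml
- action: music_assistant.play_announcement
  data:
    # Build a TTS media-source URL from the AI response; Music Assistant
    # resolves it to audio and handles the pause/resume of the queue.
    url: media-source://tts/tts.piper?message={{ response['data'] | urlencode }}
  target:
    entity_id: media_player.mopidy_http_server_on_portadyne_6680_2
```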
The full automation looks like this. It's been done before; the difference now is that I needed to change the AI part to use the new AI Task. The John Mills voice is quite verbose.

It uses the free Gemini as the AI agent, and I have a Reolink camera which integrates really well with HA. I found the built-in Person detect and Animal detect worked well.
```yaml
alias: Camera - Bike Shed ai analysis
description: ""
triggers:
  - type: turned_on
    device_id: your camera id will be here
    entity_id: your entity id will be here
    domain: binary_sensor
    trigger: device
    id: Motion detect start
  - type: turned_on
    device_id: your camera id will be here
    entity_id: your entity id will be here
    domain: binary_sensor
    trigger: device
conditions:
  - condition: time
    after: "07:00:00"
    before: "23:00:00"
    weekday:
      - mon
      - tue
      - wed
      - thu
      - fri
      - sat
      - sun
actions:
  - action: camera.snapshot
    metadata: {}
    data:
      filename: /media/camera/bike_shed_snapshot.jpg
    target:
      device_id: your camera id will be here
  - delay:
      hours: 0
      minutes: 0
      seconds: 5
      milliseconds: 0
  - action: ai_task.generate_data
    metadata: {}
    data:
      instructions: >-
        You are English actor Sir John Mills. Briefly describe what you see in
        this image from my frontdoor camera. Don't describe stationary objects,
        cars or buildings.
      attachments:
        media_content_id: media-source://media_source/local/camera/bike_shed_snapshot.jpg
        media_content_type: image/jpeg
        metadata:
          title: bike_shed_snapshot.jpg
          thumbnail: null
          media_class: image
          children_media_class: null
          navigateIds:
            - {}
            - media_content_type: app
              media_content_id: media-source://media_source
            - media_content_type: ""
              media_content_id: media-source://media_source/local/camera
      task_name: Analyse camera image
      entity_id: ai_task.google_ai_task
    response_variable: response
  - action: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.mopidy_http_server_on_portadyne_6680
      message: "{{ response['data'] }}"
    target:
      entity_id: tts.piper
mode: single
```