Okay, after a lot of trial and error I made some progress. This is going to be verbose, but I figured it's better to be as detailed as possible rather than just throw around random snippets without any context.
Documentation is lacking :-/
Unfortunately the documentation of the whole Intent part is very sparse, and it is near impossible to get something working from the documentation alone. Without reading lots of posts and trying to make sense of things without really understanding what they do, I wouldn't have gotten anywhere. I really wish someone who truly understands the Intent pipeline would write better documentation.
Prerequisites
So this is my starting point:
- A working Assist pipeline for built-in commands
- Music Assistant installed and configured to use my Voice PE as speaker
- The HA Music Assistant integration installed (not the HACS one)
- The "File Editor" add-on installed, so you can edit/add configuration files
- Just for reference: I'm a German speaker, so you might need to change "de" to your language code in folder names and scripts, and update the sentences used for matching the intent
Set up intent
- Open up the file editor
- Create the folder custom_sentences if it does not exist
- Create the subfolder de within custom_sentences
- Create a new file to hold the intent - I used the name from the MA sample, which is: music_assistant_PlayMediaOnMediaPlayer.yaml

The structure should then be: custom_sentences/de/music_assistant_PlayMediaOnMediaPlayer.yaml
- Add a minimal intent configuration to that file (I will extend and post the complete intent once it is working):

./custom_sentences/de/music_assistant_PlayMediaOnMediaPlayer.yaml

```yaml
language: "de"
intents:
  MassPlayMediaAssist:
    data:
      - sentences:
          - "<play> <track> {track}"
expansion_rules:
  play: "((spiel)|(spiele))"
  track: "[((der)|(die)|(das)) ](track|song|lied)"
lists:
  track:
    wildcard: true
```
This should pick up a voice command like the following (and then move on to the intent_script below):

"Spiele Song Fly me to the moon" ("Play song Fly me to the moon")

Note: This is a very minimalistic sentence. For this post I've removed all the fancy parts that the MA example provides. Funnily enough, with those parts it didn't work at all; it was only now that I'd stripped it down, while writing this, that it actually worked for the first time. The original sentence was like this: "<play> <track> {track} [von <artist> {artist}] [((mit)|(im)) {radio_mode}]"
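If you want to reintroduce the richer pattern step by step, one approach is to keep the known-good minimal sentence and add the artist variant as a second sentence alongside it. This is just a sketch (the artist list is my addition, not from the MA example), and note that wildcards tend to match greedily, which may be part of why the full pattern failed:

```yaml
language: "de"
intents:
  MassPlayMediaAssist:
    data:
      - sentences:
          # known-good minimal sentence stays in place
          - "<play> <track> {track}"
          # artist variant added alongside, so the simple form keeps working
          - "<play> <track> {track} von {artist}"
expansion_rules:
  play: "((spiel)|(spiele))"
  track: "[((der)|(die)|(das)) ](track|song|lied)"
lists:
  track:
    wildcard: true
  artist:
    wildcard: true
```

With two sentences, a failed match on the richer variant still leaves the minimal one usable, which makes it easier to tell which part of a pattern is breaking.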
Verify if intent is picked up
If this works, you can check it in the HA GUI via the dev tools, tab "Assist". If you enter "Spiele Song Fly me to the moon" into the parser, it should output this:
```yaml
intent:
  name: MassPlayMediaAssist
slots:
  track: Fly Me To The Moon
details:
  track:
    name: track
    value: Fly Me To The Moon
    text: Fly Me To The Moon
targets: {}
match: true
sentence_template: <play> <track> {track}
unmatched_slots: {}
source: custom
file: de/music_assistant_PlayMediaOnMediaPlayer.yaml
```
It is super important that it says match: true; otherwise, while your new intent was applied to parse the command, it did not successfully match one of the sentences or its parameters (I don't know exactly what it didn't match).
Response
We need to set up a response for our intent next, which will be used to confirm our command via voice output.

- Open the file editor
- Go to custom_sentences/de/
- Create a new file responses.yaml
- Add the following and save the file

./custom_sentences/de/responses.yaml
```yaml
language: "de"
responses:
  intents:
    MassPlayMediaAssist:
      default: "Okay"
```
This will just confirm our command with "Okay", which can probably be improved later if desired.
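If you later want the confirmation to echo what was understood, responses are Jinja templates with access to the matched slots. A sketch, based on my reading of the custom sentences docs (verify the slots variable against them):

```yaml
language: "de"
responses:
  intents:
    MassPlayMediaAssist:
      # e.g. "Okay, ich spiele Fly Me To The Moon"
      default: "Okay, ich spiele {{ slots.track }}"
```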
Intent Script
So now that we have successfully picked up the voice command, we need to add a matching intent_script to pass the data on to a service. The intent_script must have the same name as the intent, which in our case is MassPlayMediaAssist, as in our intent above.

- Again open up the file editor
- Edit the configuration.yaml file and add the following
./configuration.yaml

```yaml
intent_script:
  MassPlayMediaAssist:
    action:
      - service: music_assistant.play_media
        target:
          entity_id: media_player.home_assistant_voice_091d31_media_player_2
        data:
          enqueue: replace
          media_id: "{{ track }}"
          # media_id: Punk Rock Song
          media_type: track
          radio_mode: false
```
Two notes here:

- You will need to edit the entity_id and provide the one of the speaker you wish to use. That can probably be improved later, but we need a bare minimal setup that actually works before we can move on to more complex things.
- There is a commented-out media_id which I used to check whether playback works in general, no matter what data the intent passes on to the script. With the more complex sentences I had the issue that track wasn't provided for some unknown reason, and the intent_script threw an error, with the voice saying "An unknown error has occurred". I then used a static media_id to check whether it would work if track were set and contained valid data. If it doesn't, that needs to be fixed first.
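One way to make that missing-slot failure easier to spot is to guard the service call with a template condition. This is a sketch: when track did not arrive, the condition stops the script (visible in the trace) instead of letting the template render fail with the generic spoken error:

```yaml
intent_script:
  MassPlayMediaAssist:
    action:
      # Stop cleanly here when the track slot is missing or empty,
      # rather than failing inside the service call below
      - condition: template
        value_template: "{{ track is defined and track != '' }}"
      - service: music_assistant.play_media
        target:
          entity_id: media_player.home_assistant_voice_091d31_media_player_2
        data:
          enqueue: replace
          media_id: "{{ track }}"
          media_type: track
          radio_mode: false
```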
Reload configurations
Now we need to apply our changes to the YAML configuration files.
I'm not sure whether you need to reboot when adding new files, but maybe do that once.
Then you can usually go to the dev tools, tab "YAML": first check the configuration in the first block and then, if all is good, reload all YAML configs in the second block. (If you know exactly which ones to reload, you can reload them selectively, but I don't know which, so I hit "All".)
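The reload buttons correspond to service calls, so you can also trigger them from the dev tools "Services" tab or an automation. A sketch, assuming a reasonably recent HA version that ships these services:

```yaml
# Reload just the conversation agent, which should pick up
# changes under custom_sentences/
service: conversation.reload
---
# Or reload everything that supports reloading, which also
# covers intent_script changes in configuration.yaml
service: homeassistant.reload_all
```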
Test it!
Now test it by saying the magic words and make sure you have a matching song. *fingers crossed*
Flow overview
The process in general works somewhat like this:

- Speak a command
- Some magic happens, and if things go well our intent is picked to process the command (probably based on the sentence patterns configured in the intent)
- Our intent tries to match one of the configured sentences and extracts parts of the sentence as our data (track)
- The data is handed over to the intent_script along with the matched data (track)
- The intent_script maps our data to specific arguments and calls the defined service with those arguments
- If things go well, the response is called, which outputs a voice confirmation, and then the song is played

Across the intent, intent_script and responses, our configurations are linked by their common name, which in this case is MassPlayMediaAssist - but it can be pretty much anything that is not already in use.
What’s next?
This is super minimalistic, but it works and should get you started.
- Next are improvements to the intent so that more complex patterns/sentences can actually be matched without things going boom.
- The entity_id of the media player should be made dynamic rather than static.
- The response could contain what is actually going to be played next, but that's up to you.
- It would be nice to have better error handling and better debugging - if anybody knows more, I'm all ears.
- In the same vein, it would be nice to have a dedicated output in case the track could not be found but everything else was okay.
- I really need a way to debug an intent_script, as this is what usually goes wrong - namely the slots exceptions, where I just can't find out what exactly went wrong; it indicates invalid data being passed to the intent_script, or that the mapping is incorrect (like an incorrect type).
- In the intent script, how do I handle the different sentences from the intent, which provide different datasets? Right now it can only play a track and that type is hardcoded - how do I make it more flexible, to e.g. play something from an artist?
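For that last point, here is a hedged sketch of branching on which slots were matched. It assumes the matched slots are exposed as template variables in the intent_script, and that music_assistant.play_media accepts an optional artist field (check the service description in the dev tools before relying on either):

```yaml
intent_script:
  MassPlayMediaAssist:
    action:
      - choose:
          # Branch: an artist slot was matched, so pass it along
          # to help MA disambiguate the track lookup
          - conditions: "{{ artist is defined and artist != '' }}"
            sequence:
              - service: music_assistant.play_media
                target:
                  entity_id: media_player.home_assistant_voice_091d31_media_player_2
                data:
                  enqueue: replace
                  media_id: "{{ track }}"
                  artist: "{{ artist }}"
                  media_type: track
        # Default: track only, as in the minimal setup above
        default:
          - service: music_assistant.play_media
            target:
              entity_id: media_player.home_assistant_voice_091d31_media_player_2
            data:
              enqueue: replace
              media_id: "{{ track }}"
              media_type: track
```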
That’s it. Hope that is useful to someone and please share if you have improved on this.