A Better (and Simpler) LLM Music Assistant Script

I’ve been struggling to get my voice assistants working properly with Music Assistant for a while. There is an LLM script floating around that kind of worked, but I had to mess around a lot with the prompt, and even after a lot of tweaking the results were very inconsistent. Partly this was due to there being too many parameters that were just too complicated for the LLM to correctly assign (in fact, I struggled to do it manually just to test it).

After a lot of messing around, I’ve come up with a much simpler script that I have an almost 100% success rate with.

alias: LLM Universal Music Player
description: Plays any music request and explicitly sets shuffle mode
fields:
  query:
    description: Artist, song, album, or playlist name
    example: Pulp
  area_name:
    description: The area where the music should play
    example: Kitchen
  shuffle:
    description: True to shuffle, False to play in order.  Defaults to false
    example: false
sequence:
  - variables:
      target_area: "{{ area_name | area_id }}"
  - sequence:
      - action: media_player.shuffle_set
        target:
          area_id: "{{ target_area }}"
        data:
          shuffle: "{{ bool(shuffle, false) }}"
      - action: music_assistant.play_media
        target:
          area_id: "{{ target_area }}"
        data:
          media_id: "{{ query }}"
          enqueue: replace
mode: single
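
For reference, here’s a hypothetical example of what the assistant’s tool call ends up running (the values are just illustrative, not from my actual setup):

```yaml
# Illustrative only: the service call the LLM's tool call translates to
action: script.llm_universal_music_player
data:
  query: Pulp            # artist, song, album, or playlist name
  area_name: Kitchen     # resolved to an area_id inside the script
  shuffle: true          # optional, defaults to false
```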

In my system prompt (Ollama integration) I also have the following:

## Playing music

To play music use this tool: llm_universal_music_player.

If I ask for a specific area, pass in that area, otherwise pass in the area that you are located in.  Pass in the name of the area using the area_name parameter.

For the query parameter, put either the Song Name, Artist Name, Album Name or Playlist Name.  If I just ask to play some music with no other requirements, set the query to "500 Random Tracks" and pass in shuffle=true.

If I specifically ask you to shuffle, set that field to true, otherwise leave it as false.

## Stopping Music

There is no way to "stop" the music per se.  Instead, if I ask you to stop the music, pause it.  You can still tell me you stopped the music, because the effect is the same.

The reasons that this works better:

  • No complicated entity IDs. It just uses the area which is much simpler
  • No media type - it doesn’t get confused and pass in album when it should be a track etc.

I spent a long time messing with this and hopefully this can save someone some time!

Edits:

  • changed from “slugify” to “area_id” when converting from area name to id

The Music Assistant LLM script indeed needs quite a large model. It works fine with cloud models like ChatGPT and Gemini, but smaller local models struggle with it.

I expect your script will not work for everyone, especially for those who have been working with HA for a long time. Initially, area IDs were just random IDs not based on the area name, so slugify won’t work. And even with more recent installs it can happen that an area name is changed after creation, while the area_id remains the same.

To avoid issues with that, you could use area_id(area_name) instead.
You could also use an area selector for that field, which should force the LLM to use an actual existing area_id, but there too, especially the smaller LLMs seem to struggle with the restrictions set by selectors.
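
A minimal sketch of both options, assuming the script from the original post (untested, just to show the shape):

```yaml
# Option 1: resolve the name to an ID with the area_id() template function,
# which also works when the area was renamed after creation
sequence:
  - variables:
      target_area: "{{ area_id(area_name) }}"

# Option 2: use an area selector so the field is constrained to real areas
fields:
  area_name:
    description: The area where the music should play
    selector:
      area:
```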

And wouldn’t it be better to add the information specific for this script to the script itself? Now you are sending this information with every command issued.

Sorry, I don’t understand this part.


This information is specific to the usage of the script. If you add it to the script description, or the description of the fields, it will only be sent to the LLM when the script is used, not when you turn on the office light, for example.
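
As a sketch of what that could look like (the wording here is just an example, not tested):

```yaml
fields:
  query:
    description: >-
      Song, artist, album, or playlist name. If the user just asks to
      play some music with no other requirements, use "500 Random Tracks"
      and set shuffle to true.
  area_name:
    description: >-
      The area where the music should play. If the user doesn't name an
      area, use the area the assistant is located in.
```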


I understand. This is all great feedback by the way - this is one of the reasons I wanted to post this here!

The area_id thing worked perfectly - this is a much better way of doing it. I will test moving some of the system prompt stuff to the description.

Once I’ve improved it, should I edit my original post or add a reply with the new version? What do you think is better?

I would think it’s best if the most recent version is in the start post, so people don’t have to look for it.


I’ve tested it on:

  • ministral-3:3b-instruct-2512-q4_K_M
  • qwen3:4b-instruct-2507-q4_K_M

And it worked perfectly.

Functiongemma 270m was too stupid to even call the script.

But I think this will work well on any practical model with tool calling. I will experiment moving more of the text from the system prompt into the description later today and update it if that works well.

I think the trick is using the area name instead of the devices - the llm often gets confused if the entity and device don’t have the same name with the default script. And just making it simpler!

I don’t think this is true.

I have installed something to log what gets sent to the LLM for every request, and in every single request the following is added (along with every single other tool):

{
      "function": {
        "description": "Plays any music request and explicitly sets shuffle mode",
        "name": "llm_universal_music_player",
        "parameters": {
          "properties": {
            "area_name": {
              "description": "The area where the music should play",
              "type": "string"
            },
            "query": {
              "description": "Artist, song, album, or playlist name",
              "type": "string"
            },
            "shuffle": {
              "description": "True to shuffle, False to play in order.  Defaults to false",
              "type": "string"
            }
          },
          "required": [],
          "type": "object"
        }
      },
      "type": "function"
    }

The way it works, at least for the Ollama integration, is that the system prompt and all of the tool definitions are sent with every single request.

If the LLM wants to make a tool call, it will respond by setting the tool call in its response like this:

{
  "model": "qwen3-32b-q4_k_m.gguf",
  "created_at": "2026-02-03T11:03:50.270390734Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "arguments": {
            "area_name": "Mummy's Room",
            "query": "500 Random Tracks",
            "shuffle": "true"
          },
          "name": "llm_universal_music_player"
        }
      }
    ]
  },
  "done": false
}

Home Assistant will then run the script, and make another request to the LLM with the tool call result appended to the context (still with the entire list of tools).

Finally, the LLM will respond with the user message, i.e. “Playing 500 Random Tracks on Office”.
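
To make that round trip concrete, the follow-up request appends a message shaped roughly like this (based on the Ollama chat API’s tool role; the exact content depends on what Home Assistant returns for the script):

```json
{
  "role": "tool",
  "content": "{\"success\": true}"
}
```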

So it doesn’t matter whether you put it in the description or the system message - it will be sent with every single request either way. This is actually preferable, because it avoids another round trip between Home Assistant and the LLM.

This is somewhat true.

It is sent in the tool description which is sent as part of the prompt when assist is enabled.

So you don’t save tokens but the info is better organized for the llm.

That info should ALL be in the tool descriptions. That said, you only get 1024 characters. If you want the details, I cover the whole tool construct in Friday’s Party.

Tool desc < 1024 char.
Tool field description <128 char each.

Should be descriptive enough to let the LLM use the tool in a vacuum.

You can and SHOULD make a help function in the tool for extended help that will NOT load in the context. Look at my query tool or index tools, they both have help functions (that actually call each other). Make that HELP call the default, so if the LLM calls it incorrectly or with no modifiers, out pops help. With instructions on how to not show help next time…

On ones where help is critical you can put an undocumented flag in and require it true to NOT show help. And the first time the LLM calls it, poof. Help. 🙂 I’m using this trick a LOT now.

Here’s the link to the catch up post which points at my tool repo. Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools - #167 by NathanCu

Edit (OT): @TheFes, I may need some help with your trigger text sensors, ping me in a DM. I have questions, sir.


I think a lot of this is just preference.

Having it in the system prompt has advantages, like being able to set different assistants to default to a different playlist or behave differently in other ways. It doesn’t really affect the number of tokens, so I think it’s up to you how you want to organise it.

The help thing is a good idea for a more complicated tool, but I don’t want to do that for the “play music” tool. It adds another round trip between HA and the LLM which will add latency. Also, looking at the logs, most of the time when I ask it to play some music the context is empty so it won’t remember from one time to the next.

Putting it in a help function returned by the tool is just going to add 800ms - 1200ms on my setup that I don’t want. Ideally I just want it to one-shot it and return quickly.

Also I think a lot of these techniques for managing the context are a lot more applicable if you have lots of tools, do lots of complicated things and have a long and complicated system message. If your setup is simpler, it’s easier to manage little tweaks in behaviour by just adjusting the instructions because everything’s all in one place.

I’m basically replacing my Alexas and I only use them to:

  • Play music
  • Turn on / off the lights
  • Tell me the weather
  • Answer simple general knowledge questions

I think sometimes it’s better to just keep things simple.

Putting it in the tool makes it MUCH MUCH easier from a construct perspective. Trust me.

Help isn’t for common crap. It’s the extended stuff. Like I have a tool for Grocy. Help tells the LLM EXACTLY how to log something in inventory. But it’s not in context until the LLM needs it.

You can automate the prompt, it’s just a template. But if the tool description is in the prompt too, you need to do a lot of work to work around the custom text (go see the prompt template for Friday to see what I’m talking about).

Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools (I’ve been working on this problem a long time…)

Future agents will not be able to load every single item in the prompt on the first shot. It’s impossible. Especially local, where most people will be fighting context under 16k, every token matters.

So: how to use the tool goes in the tool. Extended use of the tool goes in the tool’s help to clear tokens, and how you need to use the tool in context goes in your prompt (or in my case in the pre-summarizer… long story). We can do some amazing stuff, but it takes prompt-context micromanaging, and keeping stuff OUT of the prompt when it doesn’t need to be there is gonna be the name of the game for 2026 (cause none of us are going to be buying new RAM…)

I don’t think much of this applies, to be honest. My prompt is around 5k tokens - there is no struggle to keep it under 16k.

Help text isn’t really applicable - there are just 2 required inputs, query and area_name.

Personally, I like to separate out how the tool works from how this particular assistant should use the tool.

i.e.

How the tool works: pass in an area name and a query, shuffle is optional and a boolean etc.

How this particular assistant should use the tool: default to the current area if I don’t specify one, default to the 500 random tracks playlist if I don’t specify something to play.

It’s not really the case that these things apply in all possible situations. For example, when I add another voice assistant in my kitchen, I will want it to default to the dining room for music playback, and one of the other rooms uses the Recently Added playlist instead. That’s kind of hard to manage if it’s all in the tool description.

I don’t really understand what this “extra work” I have to do around custom text is. This just seems really simple, easy to manage and easy to understand.

I did look at the Friday Party thing but it’s too long and not in a format that is easy to digest, and talks about a lot of stuff that is way more complicated than what I want to do.

I think if you just want to turn the lights on and play some music you don’t really need to be worrying about all of this complicated stuff. Since I added my system prompt and the script it just works pretty much 100% of the time.

The only thing I want to change is to make it a bit more intelligent when you ask for something like “Play X by Y” because at the moment it passes the whole thing into the query, and I probably need a way to let it add an additional artist parameter. But I’m hesitant to do it because that’s getting closer to the other LLM script which was very difficult to get working properly.

Fair enough, but from a purely theoretical and construct-best-practice POV it belongs in the tool, because in a multi-agent system (agreed, probably not applicable to you, but I DO have to worry about it)

… the instructions in the tool are available to ALL agents, while instructions in the prompt are only available to the agent with that prompt (create a new conversation integration instance, etc.). This is the cut point. If you want a skill to be specialized to a specific agent, it goes ONLY in that agent’s prompt. If you want it universal, the instructions need to be stored in a common location. Plain and simple.