Home Assistant Voice: play music commands return the action as text that is read aloud rather than done

I’ve been trying to set up Home Assistant Voice using a locally hosted LLM on an Ollama server I’ve set up. I have Assist talking to the LLM and it will return answers to queries. It will also turn devices on my Home Assistant on and off. I’ve tried to get it to play music and I’m running into some weirdness in the responses. (I’ve managed to have it play once, but that was unreproducible afterwards with no changes.)

I’ll ask Assist to "Play The Wall" and what I get most often back is this:
{"name": "Script to Play Music", "parameters": {"media_type": "album", "artist": "", "album": "The Wall", "media_id": "", "media_description": "", "shuffle": true, "area": "Living Room"}}

The "name" element changes occasionally. Sometimes it will be "HassMediaSearchAndPlay", sometimes it's a number. Lately it is "Script to Play Music", after I added context telling the LLM it should use the script I have by that name to launch music. (That script came from Option 3 here: GitHub - music-assistant/voice-support: Music Assistant blueprints.)
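
For reference, the blueprint gives you a script that Assist can call as a tool. Trimmed way down, the shape of it is roughly this (a sketch, not the blueprint's actual code - the script name, fields, and the player entity are placeholders mirroring the parameters in the JSON above):

script:
  script_to_play_music:
    alias: Script to Play Music
    description: Play an artist, album, track or playlist via Music Assistant.
    fields:
      media_type:
        description: One of artist, album, track, playlist or radio.
      artist:
        description: Artist name, if any.
      album:
        description: Album name, if any.
      media_id:
        description: Name or URI of the item to play.
      area:
        description: Area whose player should play the music.
      shuffle:
        description: Whether to shuffle playback.
    sequence:
      # Hand the request to Music Assistant; the target player here is a placeholder.
      - action: music_assistant.play_media
        target:
          entity_id: media_player.living_room
        data:
          media_id: "{{ media_id if media_id else album }}"
          media_type: "{{ media_type }}"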

Assist's returned text looks like it should be something passed to an action/intent/tool (sorry, not sure of the correct nomenclature here), but instead it's read aloud as if that were the response I wanted.

I'm using Home Assistant Cloud for speech-to-text and text-to-speech. I do have Whisper and Piper installed as well and could swap those in, but previously they were not as good at understanding my speech as Home Assistant Cloud. Using Home Assistant Cloud as the conversation agent instead leads to a bunch of "I don't understand" responses, but I can dig those up as well if that will help here.

Where do I need to poke around? What other info would be helpful in sorting this out?

If it's not consistent, it's having context confusion.

It may also be your model. What model are you using?

I'm using Llama 3.2. I've also tried Mistral and qwen3. Llama was the only one that succeeded (once), then stopped working and started doing this again.
I've tried adding to the default context at the start, but it hasn't seemed to do much (though telling it to use my script to play music did change the name returned from HassMediaSearchAndPlay to the name of my script).

Is it possible it's holding on to corrupt context and I need to reset the model? (Either through Ollama or somehow in HA?)

This one I'd almost have to see what your prompt looks like.

Llama3.2 (Is it the 4b?)

No

I'd like to see what you did here, especially.

So these are my current settings for the conversation.

I’ll have to see what I had tried to add before to convince it to use the script as I’ve deleted that.

Here's the full raw output from the debug (I'll post another screenshot of the whole thing if needed):

stage: done
run:
  pipeline: 01jz0rxej3dt5gz3s9y2ztcnsg
  language: en
  conversation_id: 01JZ3S4EMKXE8AGGVAPM2V1ENJ
  runner_data:
    stt_binary_handler_id: null
    timeout: 300
events:
  - type: run-start
    data:
      pipeline: 01jz0rxej3dt5gz3s9y2ztcnsg
      language: en
      conversation_id: 01JZ3S4EMKXE8AGGVAPM2V1ENJ
      runner_data:
        stt_binary_handler_id: null
        timeout: 300
    timestamp: "2025-07-01T19:58:58.708268+00:00"
  - type: intent-start
    data:
      engine: conversation.llama3_2
      language: en-US
      intent_input: Play The Wall
      conversation_id: 01JZ3S4EMKXE8AGGVAPM2V1ENJ
      device_id: null
      prefer_local_intents: false
    timestamp: "2025-07-01T19:58:58.708356+00:00"
  - type: intent-progress
    data:
      chat_log_delta:
        role: assistant
        content: >-
          {"name": "_1751333289473", "parameters": {"media_type": "track",
          "artist": "", "album": "The Wall", "media_id": "The Wall",
          "radio_mode": false, "area": [], "shuffle": true}}
    timestamp: "2025-07-01T19:59:21.345908+00:00"
  - type: intent-progress
    data:
      chat_log_delta:
        content: ""
    timestamp: "2025-07-01T19:59:21.380023+00:00"
  - type: intent-end
    data:
      processed_locally: false
      intent_output:
        response:
          speech:
            plain:
              speech: >-
                {"name": "_1751333289473", "parameters": {"media_type": "track",
                "artist": "", "album": "The Wall", "media_id": "The Wall",
                "radio_mode": false, "area": [], "shuffle": true}}
              extra_data: null
          card: {}
          language: en-US
          response_type: action_done
          data:
            targets: []
            success: []
            failed: []
        conversation_id: 01JZ3S4EMKXE8AGGVAPM2V1ENJ
        continue_conversation: false
    timestamp: "2025-07-01T19:59:21.380739+00:00"
  - type: run-end
    data: null
    timestamp: "2025-07-01T19:59:21.380791+00:00"
intent:
  engine: conversation.llama3_2
  language: en-US

Ok so… Frankly. You need to do a LOT more.

Go read some of this. At least the first post (I'm at post 115, so start at 1).
https://community.home-assistant.io/t/fridays-party-creating-a-private-agentic-ai-using-voice-assistant-tools

Short version: LLMs need a LOT of context.

Friday’s prompt is literally 8 typewritten pages

Right. So I'm going to start digging through that thread and see if I can follow all the bits going on. What I'm currently getting is that I need to expand the "Instructions" section for the conversation agent, correct?

I'm mostly confirming because none of the setup docs mention expanding this. Then again, I've only really found the beginning steps for most of them (connecting all the services), and nothing that covers how to build these prompts out - other than your post, which I'll admit looked, when I found it before, more like 'here's how to make it more awesome' than 'here's how to implement functionality'. I'll do some diving into this and see what I can work out to add.

(Unless you have suggestions on where to start? I suppose I had assumed that because HA exposed my devices to Assist, it passes info about them to the LLM. I think I could build out the context instructions, but I don't know what's already being provided.)


The short version is, when an LLM starts, it knows exactly what you told it and wants to give you an answer.

If it doesn’t immediately have the context to solve the ask… It makes crap up.

So. You have to tell it something to the effect of…

Do not lie, if you do not know the answer say you do not know. And here is where everything you need to know is and where to get it.

Right. And I think I had assumed that HA was passing more information to the LLM in the background than it is. (I thought “Expose” meant it was telling the LLM about all this and not just allowing it to control them.)

So I should assume HA is passing nothing but those lines in the instructions and the query to the LLM and expand those instructions with things like what’s available to it and how I would like certain prompts responded to.

For example, I should provide my address in there in a line like “For queries where a location is needed, assume {{HOME ADDRESS}} unless otherwise specified.”

That sort of thing?
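
In other words, something like this in the Instructions box (rough wording of my own, with curly-brace placeholders standing in for my real details):

You are the voice assistant for our home. Answer briefly.
Do not make up information; if you do not know something, say you do not know.
For queries where a location is needed, assume {{HOME ADDRESS}} unless otherwise specified.
When asked to control a device, use the exposed Home Assistant entities rather than describing what you would do.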

It does send info.

But like grandma's box, it has all your stuff and no context about what the stuff is. Without telling the LLM what it's looking at, all it knows is that's a thing of domain light or domain sensor. Beyond that…

I also go through what is exposed in tools and what is not. You CAN make a self-documenting context engine (at its core that's what Friday really is) if you're over the top like me, but if not you still need to be prepared to fill in the context.

Let me put it this way: if you don't know, it probably doesn't either. You need to be prepared to deliver ALL the context and assume none.

Tool preference, behaviors, tone, what a thing IS… All you.

(The home address will be provided to OpenAI during search if you clicked the checkbox and set up zone.home. But if all that isn't true, search doesn't know where you are… That's you too. :sunglasses:)

So I’ve gotten up to here: Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools - #7 by NathanCu

In your example prompt, you have stuff like this:

System Prompt:
{{state_attr('sensor.variables', 'variables')["Friday's Purpose"] }}

I want to make sure I’m understanding where that goes and where it’s pulling from. I’m assuming this goes into the “Instructions” box for the conversation engine. And because I’m rusty on my HA variables - is this pulling from Helpers for the Purpose/etc or are those being stored somewhere else?
(I think I read it earlier in the thread, but the kids are home from school and I’ve been interrupted a lot while reading.)

Edit: Wait no - it’s from here: Trigger based template sensor to store global variables
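
So if I'm reading that right, the pattern boils down to something like this in configuration.yaml (a minimal static sketch - the real thread adds an event trigger so the attribute can be updated at runtime; the sensor name and keys here just match the state_attr() call above):

template:
  - trigger:
      - platform: homeassistant
        event: start
    sensor:
      - name: Variables
        state: "ok"
        attributes:
          variables: >-
            {{ {
              "Friday's Purpose": "You are Friday, the house voice assistant. Keep answers short.",
              "Command Library": "Placeholder text describing which tools to prefer."
            } }}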


Yep, you found it, and if you wait till this Friday I'm posting all the code I'm using to manipulate them. :sunglasses: (I'm at a conference and have a day job, else it'd already be done.)

These trigger-based text sensors are pretty important. You'll use those, or text in a Jinja2 macro, as text storage.
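
The macro route, if you'd rather keep the text in a file, looks roughly like this (the file name is just an example - HA loads reusable templates from config/custom_templates/):

{# config/custom_templates/friday.jinja #}
{% macro purpose() %}
You are Friday, the house voice assistant. Keep answers short and do not make things up.
{% endmacro %}

Then pull it into the prompt with:

{% from 'friday.jinja' import purpose %}
{{ purpose() }}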

Okay, so I have a modified Purpose/Directives built. I'm still trying to figure out the index aspect there and what needs to be built for that. (I'll go through those posts again later, but if there's a TL;DR version of how to build the list, that'd be great.)
It's still dumping the markup item when I ask it to play, but I haven't touched that aspect yet. (That said, it is reading weather information mostly correctly, along with sunset/sunrise info.)

Also, I'm occasionally getting a bit of markup as a preface, but it looks like something isn't closing a " correctly or something hasn't been sanitized when passed.

Do everything manually until you replace parts with code.

I started just by writing paragraphs then replacing chunks at a time with templates.

I think I’ve gotten through a bit more. I’m still working through constructing the command library - not sure what that should look like or include. (I think I see yours in your example, but that one throws some formatting errors, even edited.) I’ll keep digging (though I may have to put this down through the weekend as we’re at a convention).

Thanks for all the help so far.


Skip to the last post on Friday's Party. It's a good guide for what kinds of things to include.

At that point you'll get to where it's "oh, more of this, less of that."

If suddenly your AI forgets how to do basic crap, you've overloaded the context window. Back off how much is loaded up. Use the alt/alias texts heavily if you're just trying to describe something to the AI.

Some general advice for working with Ollama here as well: if you are seeing strangeness, make sure you take a look at the Ollama logs for any indication that the context limit was reached and the input has been truncated.

Ollama will give a warning on every request which gets truncated like that, and if your context is truncating you will absolutely have issues.
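
How you check depends on how Ollama is running - the commands below assume a systemd service or a Docker container named ollama, so adjust for your install. If you do see truncation warnings, the usual fix is to build a model variant with a larger context window and point Home Assistant at that:

# Tail the logs (pick whichever matches your install)
journalctl -u ollama -f
docker logs -f ollama

# Modelfile - raise the context window for llama3.2 (8192 is just an example value)
FROM llama3.2
PARAMETER num_ctx 8192

# Build the variant, then select it in the Home Assistant Ollama integration
ollama create llama3.2-8k -f Modelfile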


Skipping there -

Okay, I think I am following what's going on. I'm working on labeling and things (though this is still a barebones HA setup for the moment).

For Context - there's a lot of code there, but if I'm not worried about context size at the moment, can I just work this out in natural language in the prompt and then work on making it more concise?
(My coding background is very out of date and I'm working on bringing it up to date, but…)

(And I checked the Ollama logs - No errors about being over context limits yet.)

Oh - so the end goal of this is to move away from the Alexa setup we have currently to something more private/controllable. We've had growing issues with Alexa (and Google Home before it) where it gets weirder and weirder without real cause. We had a decent setup on HA before, but have rebuilt a couple of times (we're currently on Homey; that aspect works well, but the voice aspects are going weird).
For starters, what we want is:
a) Control smart devices
b) Play music (and possibly other media, but music is the big one)
c) Interact with a calendar to trigger certain actions based on calendar events.

I have set up HA to do a and c before, and Assist seems to be able to do a out of the box. I can't get b to work. (I'm not sure I need the voice assistant to do much with c to start, but I wouldn't mind it being able to do some of that sort of thing.)


For b, look at Music Assistant. I basically treat it like Mealie and Grocy and tell the AI, hey, there are the MA tools, use them. Then point at the MA scripts. That will give you a music library and a tool for your AI to drive.
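
Something to the effect of this in the prompt (wording is yours to tune, and the script name is whatever yours is actually called):

To play music, always call the script "Script to Play Music". Never answer with JSON or read the tool call back as text - just make the call. Pass along the artist, album or track the user asked for, plus the area they are in.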

My stuff gives you c - the things a personal assistant will need (to-do lists, calendars, notes, almanac, etc.).

I think you’re picking up the core… :sunglasses:

Yeah, the current bit is to get it to actually start something with Music Assistant, which it knows about, but it just dumps the markup object rather than doing it.

I think I have it reading weather data, and it seems aware of the Music Assistant add-on now (when I ask it, it tells me it can use Music Assistant to play music).
