I’ve been trying to set up Home Assistant Voice using a locally hosted LLM on an Ollama server I’ve set up. I have Assist talking to the LLM and it will return answers to queries. It will also turn devices on and off in my Home Assistant. I’ve tried to get it to play music and I’m running into some weirdness in the responses. (I managed to have it play once, but that wasn’t reproducible afterwards, with no changes on my end.)
I’ll ask Assist to “Play The Wall” and what I get back most often is this:
{"name": "Script to Play Music", "parameters": {"media_type": "album", "artist": "", "album": "The Wall", "media_id": "", "media_description": "", "shuffle": true, "area": "Living Room"}}
The “name” element changes occasionally. Sometimes it will be “HassMediaSearchAndPlay”, sometimes it’s a number. Lately it is “Script to Play Music”, after I added context telling the LLM it should use the script I have by that name to launch music. (Which came from Option 3 here: GitHub - music-assistant/voice-support: Music Assistant blueprints.)
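In case it matters, the script Assist is supposed to call looks roughly like the sketch below. This is just my simplified reconstruction, not the actual blueprint code: the field names mirror the parameters in the JSON above, the script key and entity ID are placeholders, and I’m assuming it ends up calling the music_assistant.play_media action.

script:
  script_to_play_music:            # placeholder key; mine came from the blueprint
    alias: Script to Play Music
    description: >
      Play music through Music Assistant. Use this for any request to play a
      song, album, artist, playlist, or radio station.
    fields:
      media_type:
        description: One of artist, album, track, playlist, or radio
        selector:
          text:
      artist:
        description: Artist name, if relevant
        selector:
          text:
      album:
        description: Album name, if relevant
        selector:
          text:
      media_id:
        description: Name or search text of the thing to play
        selector:
          text:
    sequence:
      - service: music_assistant.play_media
        target:
          entity_id: media_player.living_room   # placeholder Music Assistant player
        data:
          media_type: "{{ media_type }}"
          media_id: "{{ media_id or album or artist }}"   # use whatever the LLM filled in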
Assist’s returned text looks like something that should be passed to an action/intent/tool (sorry, not sure of the correct nomenclature here), but instead it’s read aloud as if that’s the answer I wanted.
I’m using Home Assistant Cloud for Speech-to-Text and Text-to-Speech. I do have Whisper and Piper installed as well and could swap those in, but previously they were not as good at understanding my speech as Home Assistant Cloud. Using Home Assistant Cloud as the conversation processor leads to a bunch of “I don’t understand” responses, but I can dig those up as well if that will help here.
Where do I need to poke around? What other info would be helpful in sorting this out?
I’m using Llama 3.2. I’ve also tried Mistral and qwen3. Llama was the only one that succeeded (once) and then stopped working and started doing this again.
I’ve tried adding to the default context at the start, but it hasn’t seemed to do much (though telling it to use my script to play music did change the name returned from HassMediaSearchAndPlay to the name of my script).
Is it possible it’s holding on to corrupt context and I need to reset the model (either through Ollama or somehow in HA)?
Right. So I’m going to start digging through that thread and see if I can follow all the bits going on. What I’m currently getting is that I need to expand the “Instructions” section for the conversation agent, correct?
I’m mostly confirming because none of the setup docs mention expanding this. I’ve only really found the beginning steps for most of them (connecting all the services) and nothing that covers how to build these prompts out, other than your post (which, I’ll admit, when I found it before looked more like “here’s how to make it more awesome” than “here’s how to implement functionality”). I’ll do some diving into this and see what I can work out to add.
(Unless you have suggestions on where to start? I suppose I had assumed that because HA exposed my devices to Assist, it passes info to the LLM about them. I think I could build out the context instructions, but I don’t know what’s already being provided.)
Right. And I think I had assumed that HA was passing more information to the LLM in the background than it is. (I thought “Expose” meant it was telling the LLM about all this and not just allowing it to control them.)
So I should assume HA is passing nothing but those lines in the instructions and the query to the LLM, and expand those instructions with things like what’s available to it and how I’d like certain prompts responded to.
For example, I should provide my address in there in a line like “For queries where a location is needed, assume {{HOME ADDRESS}} unless otherwise specified.”
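So something along these lines in the Instructions box (my own rough sketch, not from any doc; the script name is just the one I created earlier):

You are a voice assistant for our home. Keep spoken answers to a sentence or two.
For queries where a location is needed, assume {{HOME ADDRESS}} unless otherwise specified.
When asked to play music or other media, call the script named "Script to Play Music" rather than describing what you would do.
Only talk about devices that have been exposed to you; if you don't know something, say so.

(I believe the Instructions box is treated as a template, so a line like “The current date and time is {{ now().strftime('%A, %B %d at %H:%M') }}.” should render too, but I still need to confirm that.)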
But like grandma’s box, it has all your stuff and no context about what the stuff is. Without telling the LLM what it’s looking at, all it knows is that it’s a thing of domain light or domain sensor. Beyond that…
I also go through what is exposed in tools and what is not. You CAN make a self-documenting context engine (at its core, that’s what Friday really is) if you’re over the top like me, but if not, you still need to be prepared to fill in the context.
Let me put it this way: if you don’t know, it probably doesn’t either. You need to be prepared to deliver ALL the context and assume none.
Tool preference, behaviors, tone, what a thing IS… All you.
(The home address will be provided to OpenAI during search if you clicked the checkbox and set up zone.home. But if all that isn’t true, search doesn’t know where you are… that’s you too.)
System Prompt:
{{state_attr('sensor.variables', 'variables')["Friday's Purpose"] }}
I want to make sure I’m understanding where that goes and where it’s pulling from. I’m assuming this goes into the “Instructions” box for the conversation engine. And because I’m rusty on my HA variables - is this pulling from Helpers for the Purpose/etc or are those being stored somewhere else?
(I think I read it earlier in the thread, but the kids are home from school and I’ve been interrupted a lot while reading.)
Yep, you found it, and if you wait till this Friday I’m posting all the code I’m using to manipulate them. (I’m at a conference and have a day job, else it’d already be done.)
These trigger text sensors are pretty important. You’ll use those, or text in a Jinja2 macro, as text storage.
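Rough shape of one, with placeholder names (not my actual config):

template:
  - trigger:
      - platform: event
        event_type: set_assistant_purpose    # fire this event to overwrite the stored text
    sensor:
      - name: "Assistant Purpose"
        state: "{{ now().isoformat() }}"     # state only tracks when it was last updated
        attributes:
          purpose: >-
            {{ trigger.event.data.text
               | default('You are Friday, the house assistant.', true) }}

The long text lives in the attribute because an entity’s state caps out at 255 characters. The prompt then pulls it the same way as the line above, just with these placeholder names: {{ state_attr('sensor.assistant_purpose', 'purpose') }}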
Okay, so I have a modified Purpose/Directives built. I’m still trying to figure out the index aspect there and what needs to be built for that. (I’ll go through those posts again later, but if there’s a TL;DR version of how to build the list, that’d be great.)
It’s still dumping the markup item when I ask it to play, but I haven’t touched that aspect. (That said, it is reading weather information mostly correctly and sunset/sunrise info.)
Also, I’m occasionally getting a bit of markup as a preface, but it looks like something isn’t closing a " correctly, or something hasn’t been sanitized when passed.
I think I’ve gotten through a bit more. I’m still working through constructing the command library; I’m not sure what that should look like or include. (I think I see yours in your example, but that one throws some formatting errors, even when edited.) I’ll keep digging (though I may have to put this down through the weekend, as we’re at a convention).
Skip to the last post on Friday’s Party. It’s a good guide for what kinds of things to include.
At that point you’ll get to where it’s “oh, more of this, less of that.”
If suddenly your AI forgets how to do basic crap, you’ve overloaded the context window. Back off how much is loaded up. Use the alt/alias texts heavily if you’re just trying to describe something to the AI.
Some general advice for working with Ollama here as well: if you are seeing strangeness, take a look at the Ollama logs for any indication that the context limit was reached and the input was truncated.
Ollama will give a warning on every request that gets truncated like that, and if your context is truncating, you will absolutely have issues.
Okay, I think I am following what’s going on. I’m working on labeling and things (though this is still a barebones HA setup for the moment).
For context: there’s a lot of code there, but if I’m not worried about context size at the moment, can I just work this out in natural language in the prompt and then work on making it more concise?
(My coding background is very out of date and I’m working on bringing it up to date but…)
(And I checked the Ollama logs - No errors about being over context limits yet.)
Oh, right: the end goal of this is to move away from the Alexa setup we currently have to something more private and controllable. We’ve had growing issues with Alexa (and Google Home before it) where it gets weirder and weirder without any real cause. We had a decent setup on HA before, but we’ve rebuilt a couple of times (we’re currently on Homey, and that aspect works well, but the voice side is getting weird).
For starters what we want is:
a) Control smart devices
b) Play music (and possibly other media, but music is the big one)
c) Interact with a calendar to trigger certain actions based on calendar events.
I have set up HA to do (a) and (c) before, and Assist seems able to do (a) out of the box. I can’t get (b) to work. (I’m not sure I need the voice assistant to do much with (c) to start, but I wouldn’t mind it being able to do some of that sort of thing.)
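For (c), to start I’m only picturing ordinary calendar-triggered automations rather than anything going through the LLM. Something like this sketch, where calendar.family and the notify call are placeholders:

automation:
  - alias: "Calendar event reminder"
    trigger:
      - platform: calendar
        event: start
        entity_id: calendar.family     # placeholder calendar entity
        offset: "-00:15:00"            # fire 15 minutes before the event starts
    action:
      - service: notify.notify         # swap in whatever notifier you actually use
        data:
          message: "{{ trigger.calendar_event.summary }} starts in 15 minutes."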
For (b), look at Music Assistant. I basically treat it like Mealie and Grocy and tell the AI: hey, there are the MA tools, use them. Then point it at the MA scripts. That will give you a music library and a tool for your AI to drive.
My stuff gives you (c): the things a personal assistant will need (to-do lists, calendars, notes, almanac, etc.).
Yeah, the current bit is getting it to actually start something with Music Assistant, which it knows about; it just dumps the markup object rather than doing it.
I think I have it reading weather data, and it seems aware of the Music Assistant add-on now (when I ask, it tells me it can use Music Assistant to play music).