Problem with ollama voice assistant integration


Hi everyone, I am currently trying to use the ollama integration with llama3.2 for voice assistants, but it always answers in a weird format with name, parameters, etc.

I should probably also mention that this only happens when I set ollama to be able to control home assistant.

The model is simply too dumb. If you’re using the llama3.2 tag, then it’s a 3b model that’s been quantized to less than 1/3 its (already very small) size.

I’ve found I need a model with at least 7b parameters, built within the last year, that supports tool calling.

Ok, thanks for the answers. I suspected that the “lightness” of the model was the problem, but I wanted to rule out potential errors in how I configured it.

If you have the VRAM, specify the :7b tag (or 8b, I can’t remember which for that one) and it’ll instantly get better. I’ve used llama 3.2 for some testing on my Intel-IPEX-ARC rig and it does work.
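
For reference, here’s a minimal sketch of how to check which tags you already have and pull an explicit one through Ollama’s local API (assuming the default localhost:11434 endpoint; the llama3.1:8b tag is only an example):

```python
# Minimal sketch (assumptions: stock Ollama on the default localhost:11434
# endpoint; the llama3.1:8b tag is only an example).
import requests

BASE = "http://localhost:11434"

# List the model tags that are already downloaded.
for model in requests.get(f"{BASE}/api/tags", timeout=10).json()["models"]:
    print(model["name"])

# Pull an explicit tag instead of relying on the default for that model family.
requests.post(
    f"{BASE}/api/pull",
    json={"name": "llama3.1:8b", "stream": False},
    timeout=None,  # pulling several GB can take a while
)
```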

LLaMA is dumb. It’s just a long SHeEP. Think about it!

I don’t think there’s a 7b LLaMA 3.2 listed on ollama.com?

I had a 7 or 8b quantized model in my list that ran OK. I haven’t spun it up in a few weeks; I’ll look when I get time to play later tonight. I’ve been deep in the AI side of things for a couple of weeks.

Which llama3.2 are you using? In ollama I could not find any with 7 or 8b.

Ah my mistake - it was 3.1 that comes in an 8b variant.
It works, if a bit clunky, but it fits in the IPEX build. I think I have ~16-20 GB to play with on this box…

I also have a Mistral 7b and a few others. But one other thing - look up the docs on how to increase the context window, or use a model that supports a large one (pretty much anything ChatGPT-class is 128k, for comparison). If you hit limits on the number of entities, or the model suddenly ‘forgets’ stuff it should know, 99% of the time this is what’s happening. I was actually about to start looking into the extended-context versions of Phi3 (phi3:medium-128k) and Mistral (yarn-mistral:7b-128k). The stuff I’m doing with Friday is basically a bomb for those smaller context windows, and it mows through them like spring grass.
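
If you want to test whether the context window is the culprit, here’s a minimal sketch that raises num_ctx for a single request against Ollama’s local API (default localhost:11434 assumed; the model name, prompt, and the 8192 value are only examples):

```python
# Minimal sketch (assumptions: default localhost:11434 endpoint; model name,
# prompt, and the 8192 value are only examples).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:7b",
        "messages": [{"role": "user", "content": "List every light that is currently on."}],
        # Older Ollama releases default to a small context window (2048 tokens);
        # raising num_ctx costs more VRAM but stops the "forgetting".
        "options": {"num_ctx": 8192},
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```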

MS’s Phi 3 models are supposed to be VERY good with home automation tasks and can run tools, but that model may take more memory than I have available - it’s hard to tell on the IPEX.

Just tried with llama3.1:8b - still the same output as with 3.2.

For mistral it answers kinda differently:

To turn on the kitchen lamp using Home Assistant, you would use the following command:

HassCallService(service="light/kitchen_lamp/turn_on", target_device="kitchen_lamp")

Similar output with qwen2.5 7b

<tool_call> {"name": "HassTurnOff", "arguments": {"domain": ["light"], "name": "Kitchen Lamp Light"}} </tool_call>

Also, something I forgot to mention before: I run them on a GPU with 12 GB of VRAM, but they seem to run just fine when not trying to control Home Assistant.
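
For what it’s worth, when the model and the Ollama version both support tool calling, the API is supposed to return the call in a structured message.tool_calls field rather than dumping <tool_call> text into the reply, and that structured field is what the integration should be consuming. A rough sketch against Ollama’s /api/chat (the HassTurnOn schema below is invented for illustration, not the schema the Home Assistant integration actually sends):

```python
# Rough sketch (assumptions: default localhost:11434 endpoint; the HassTurnOn
# tool schema is invented for illustration, not the integration's real schema).
import json
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "HassTurnOn",
        "description": "Turn on a device or entity",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Entity name, e.g. Kitchen Lamp"},
                "domain": {"type": "string", "description": "Entity domain, e.g. light"},
            },
            "required": ["name"],
        },
    },
}]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:7b",
        "messages": [{"role": "user", "content": "Turn on the kitchen lamp"}],
        "tools": tools,
        "stream": False,
    },
    timeout=120,
)
msg = resp.json()["message"]

# With tool support working end to end, the call shows up here as structured
# data instead of as <tool_call> text inside msg["content"].
for call in msg.get("tool_calls", []):
    print(call["function"]["name"], json.dumps(call["function"]["arguments"]))
```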

What’s your ollama version?
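
If you’re not sure, the local API will tell you (assuming the default endpoint):

```python
# Quick version check against the local Ollama API (default endpoint assumed).
import requests

print(requests.get("http://localhost:11434/api/version", timeout=5).json()["version"])
```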

It is version 0.3.12.

As an update, I think it may have had to do with the HA ollama integration. I have now tried qwen2.5 7b with the local LLM ollama integration from HACS, and for the most part it works as intended.

That version is 7 months old, try updating.

Just updated and indeed the ollama integration is now working as intended. Thank you all so much.

Which model did you land on?

I have currently tried llama3.1 and qwen2.5, both 7b. The results seem to be about the same, but with qwen I had more success in terms of avoiding hallucinations - llama started to see devices that didn’t exist.

Qwen is stellar compared to llama3.1.

Which one are you running, R?

Fixt is good and utilitarian if you want to turn a few things on/off or make a good decision on small data (“here’s a bunch of crap, summarize it”), but it has a bad tendency to make stuff up and present it as fact if it doesn’t have source data and explicit instructions on where to get it. (“You stupid model, there’s the BBC RSS feed right there, use it…” “I’m sorry, I don’t have access to…”)

Llama is better. Way better. Way heavier.

I was looking at a qwen here.

Phi et al. for number crunching and DB result digs.

Mixtral is GOOD, but even quantized down to 4-bit it BARELY fits in my NUC AI (32 GB flat memory model; you need 26.5 GB for it to even load).