Hi everyone, I am currently trying to use the Ollama integration with llama3.2 for voice assistants, but it always answers in a weird format with name, parameters, etc.
The model is simply too dumb. If you’re using the llama3.2 tag, then it’s a 3b model that’s been quantized to less than 1/3 its (already very small) size.
Ok, thanks for the answers. I suspected that the "lightness" of the model was the problem, but I wanted to rule out potential errors in how I configured it.
If you have the VRAM, specify the :7b tag (or :8b, I can't remember which for that one) and it'll instantly get better. I've used llama 3.2 for some testing on my Intel IPEX-ARC rig and it does work.
I had a 7 or 8b quantized model in my list that ran OK. I haven't spun it up in a few weeks; I'll look when I get time to play later tonight. I've been deep in the AI side of things for a couple of weeks.
Ah my mistake - it was 3.1 that comes in an 8b variant.
It works, if a bit clunky, but it fits in the IPEX build. I think I have ~16-20 GB to play with on this box…
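If you want to try the bigger variant, it's just a matter of asking Ollama for the explicit tag. A minimal sketch, assuming the stock Ollama CLI and the standard library tags (swap in whatever actually fits your VRAM):

```
# grab the 8B Llama 3.1 instead of the default 3B llama3.2
ollama pull llama3.1:8b

# quick sanity check from the CLI before pointing Home Assistant at it
ollama run llama3.1:8b "Which of these lights should I turn off at night?"
```

Then just point the integration at that tag instead of the default one.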
I also have a Mistral 7b and a few others. But the other thing: look up the docs on how to increase the context window, or use a model that supports a large one (for comparison, pretty much anything ChatGPT is 128k). If you find you have limitations on the number of entities, or they just suddenly 'forget' stuff they should know, 99% of the time this is what's happening. I was actually about to start looking into the extended-context versions of Phi3 (phi3:medium-128k) and Mistral (yarn-mistral:7b-128k). The stuff I'm doing with Friday is basically a bomb for those smaller context windows, and it mows through them like spring grass.
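On the context window point: if you'd rather stretch a model you already have than pull one of the -128k variants, Ollama lets you bake a larger context into a custom model via a Modelfile. A rough sketch, assuming qwen2.5:7b as the base and 16k as the target (both are just example values):

```
# Modelfile (save as ./Modelfile)
FROM qwen2.5:7b
PARAMETER num_ctx 16384
```

```
# build a new local tag from it
ollama create qwen2.5-16k -f Modelfile
```

Then use qwen2.5-16k as the model name in whatever integration you're on. Keep in mind the bigger window itself eats RAM/VRAM, so it can push a model that "just fits" over the edge.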
MS's Phi 3 models are supposed to be VERY good with home automation tasks and can run tools, but that model may take more memory than I have available - it's hard to tell on the IPEX.
As an update, I think it may have had to do with the HA Ollama integration. I have now tried qwen2.5 7b with the Local LLM Ollama integration from HACS, and for the most part it works as intended.
So far I have tried llama3.1 and qwen2.5, both 7b. The results seem to be roughly the same, but with Qwen I had more success in terms of "hallucinating": Llama started to see devices that didn't exist.
Mistral is good and utilitarian if you want to turn a few things on/off or make a good decision on small data (here's a bunch of crap, summarize it), but it has a bad tendency to make stuff up as fact if it doesn't have source data and explicit instructions on where to get it. (You stupid model, there's the BBC RSS feed right there, use it… "I'm sorry, I don't have access to…")
Llama is better. Way better. Way heavier.
I was looking at a Qwen here.
Phi et al. for number crunching and DB result digs.
Mixtral is GOOD, but even quantized down to 4-bit it BARELY fits in my NUC AI (32 GB flat memory model; you need 26.5 GB for it to even load).