Custom Integration: Ollama Conversation (Local AI Agent)

I have Ollama running here, but what can I do with the HA integration? Only use the text interface, which I can use with Ollama directly anyway?

Is there a way to set up sentences that trigger automations, and if no automation is registered, have the prompt sent to the Ollama AI instead?

I'm on the beta for the next HA release, where you can now use Ollama for local LLM control of HA, but I'm having a hard time getting it to work.

Anyone successful?

I conducted a test and it was successful, but there are hallucinations. I use llama3.1:8b. Perhaps because I use Chinese, the success rate is lower, and the more entities that are exposed, the slower it gets. When I use the qwen2 model with the old Ollama integration (which only supports queries), the success rate is high and the speed is also fast. The official HA Ollama integration told me that the qwen2 model does not support tool calls. So a suitable LLM model and accurate prompts should give very good results, but I haven't done any further testing.

Did you figure it out? I am trying to use llama3.1 8b and have set it up properly, but when I ask it to turn on a light I get an error that tools are not supported:

"Sorry, I had a problem talking to the Ollama server: llama3.1:latest does not support tools"
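
In case it helps with debugging: here is a minimal sketch (my own, not part of the HA integration) that probes whether a pulled model accepts tool definitions at all, using the `ollama` Python client against a default local server. The model tag and the `turn_on_light` tool are just placeholders.

```python
# Probe whether a local Ollama model accepts tool definitions.
# Assumes: `pip install ollama` and an Ollama server on the default
# http://localhost:11434. The model tag and the tool are placeholders.
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "turn_on_light",  # hypothetical tool, only used as a probe
        "description": "Turn on a light by entity id",
        "parameters": {
            "type": "object",
            "properties": {"entity_id": {"type": "string"}},
            "required": ["entity_id"],
        },
    },
}]

try:
    response = ollama.chat(
        model="llama3.1:latest",
        messages=[{"role": "user", "content": "Turn on the kitchen light"}],
        tools=tools,
    )
    # If the model's template supports tools, the reply may contain tool_calls.
    print(response)
except ollama.ResponseError as err:
    # Models/templates without tool support fail here with a message like
    # "llama3.1:latest does not support tools".
    print("Tool probe failed:", err.error)
```

If the probe fails the same way outside Home Assistant, the issue is on the Ollama/model side (re-pulling a newer tag of the model may help) rather than in the integration.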

Unfortunately I don't have that problem. My problem is that you ask it a question and it answers something completely different; it doesn't even know how to use any services.

The new control feature with Ollama (llama3) is not working for me either. A single unavailable device prevents the system from doing anything, even if the device you want to control is a completely different one. The language setting does not work either: I configured it to use German, but it always answers in English.

If I remove all unavailable entities from Assist, it tries to control a power switch instead of turning on the lights, which was the request.

There's a PR to make the context window configurable, which may improve things:
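
Until that lands, a quick way to check whether a bigger context actually helps your prompts is to call Ollama directly and pass `num_ctx` per request. A rough sketch with the Python client, assuming a local Ollama server; the model tag, prompt, and sizes are just examples:

```python
# Rough sketch: ask the same question with different context sizes to see
# whether a larger num_ctx changes quality/latency for your prompts.
# Assumes `pip install ollama` and a local Ollama server; model tag is an example.
import time
import ollama

PROMPT = "Which lights are on in the living room?"  # placeholder prompt

for num_ctx in (2048, 8192):
    start = time.time()
    response = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": PROMPT}],
        options={"num_ctx": num_ctx},  # per-request context window
    )
    print(f"num_ctx={num_ctx}: {time.time() - start:.1f}s")
    print(response["message"]["content"][:200])
```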

I'm pretty new to running LLMs. I currently have Home Assistant running on an Intel Core i5-10210U (10th gen, 4 cores, 8 threads) with 16 GB of RAM and integrated graphics. I was thinking of co-hosting a model with Ollama on this machine, but I'm worried that might be a bit ambitious. I could upgrade to 32 GB of RAM, but what I have works for what I use it for right now, and if that bump still wouldn't be enough to do anything reasonable, I would rather save up and get something more appropriate.

My expectations are pretty low. I'd be happy with anything that is better than the non-LLM voice assistants and could handle some switch toggling and maybe setting a timer. Any thoughts on what's possible with this setup?

You'll be limited to models that are less than 14 GB (if you leave 2 GB for your system). Thankfully, the llama3.1 quantized versions range from about 4 GB to 8.5 GB (q8), so you should be able to run that model.

How fast? That depends. The first load will be the slowest because the model has to be transferred from the hard drive into RAM; hopefully you have an SSD. Once it's in RAM, it will be as fast as your RAM and bus speed allow.

I personally would not upgrade the memory but rather spend the money on a GPU. Even with a small GPU with 2 or 4 GB of VRAM, Ollama will offload part of the model to the GPU and use both CPU/RAM and the GPU, which improves speed.

The goal is to get the entire model to fit in your VRAM; then the answers will be much faster (or almost instant, depending on your GPU).
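
As a rough back-of-the-envelope check (my own approximation, ignoring the KV cache and runtime overhead), the in-memory size is roughly parameters × bits-per-weight ÷ 8, which is how an 8B model ends up around 4–5 GB at q4 and about 8.5 GB at q8:

```python
# Back-of-the-envelope estimate of model size by quantization level.
# Ignores the KV cache and runtime overhead, so treat it as a lower bound.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Effective bits per weight including metadata are approximate assumptions.
for label, bits in [("q4", 4.5), ("q8", 8.5)]:
    print(f"llama3.1 8B {label}: ~{approx_size_gb(8, bits):.1f} GB")
```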

I gave it a try just to see what happens. I tried a few different small models, including llama3 and mistral. Just chatting with Ollama by itself, it was kind of slow but not too bad. Then I tried it in Home Assistant. With Assist control disabled, it was pretty painfully slow but responded to queries. When I enabled Assist, it didn't return at all. I eventually lowered the number of exposed entities to 23 and it still couldn't handle it.

I'm running a mini PC that has USB 3.1 ports but does not seem to have Thunderbolt on any of them, so adding a GPU might be out of the question. I saw that Ollama has some open issues about utilizing built-in Intel GPUs, which might help a little. I might revisit this if that support ever stabilizes and see if it works better.

What are people who are using Pis or Home Assistant Greens, etc., doing? Off-loading to OpenAI? Or running a dedicated AI server?

As of 2024.9, you can now change the context size for the models. For example, I increased it to 15K (the default is 8K, and it was 2K pre-2024.9) and what a difference; it works really well now.

I also increased the context size and can now expose more entities, which is great. There has also been some good progress towards using a local LLM to replace the cloud options by using a model called "llama3-groq-tool-use" instead of the recommended llama3.1 model. The results are impressive and very close to Google AI. I strongly recommend this model.
Now I just need good hardware for the ultimate voice assistant. Waiting for the Nabu Casa voice assistant.

How well does "llama3-groq-tool-use" do for non-function-call queries, like just general chat?

Recommended approach

Model Selection: Based on the query analysis, route the request to the most appropriate model:

  • For queries involving function calling, API interactions, or structured data manipulation, use the Llama 3 Groq Tool Use models.
  • For general knowledge, open-ended conversations, or tasks not specifically related to tool use, route to a general-purpose language model like the unmodified Llama-3 70B.

via Introducing Llama-3-Groq-Tool-Use Models - Groq is Fast AI Inference
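
That routing idea isn't something the HA integration does for you, but here is a minimal sketch of it: a crude keyword check decides whether a request goes to a tool-use model or a general chat model. The model tags and keywords are assumptions you would tune (a small classifier or a "router" LLM call could replace the keyword check):

```python
# Minimal sketch of routing between a tool-use model and a general chat model.
# The keyword heuristic and model tags are placeholders, not a recommendation.
import ollama

TOOL_MODEL = "llama3-groq-tool-use"   # assumed tag for the tool-use model
CHAT_MODEL = "llama3.1:8b"            # assumed general-purpose model

CONTROL_KEYWORDS = ("turn on", "turn off", "set", "switch", "dim", "timer")

def pick_model(prompt: str) -> str:
    """Route device-control style prompts to the tool-use model."""
    lowered = prompt.lower()
    if any(keyword in lowered for keyword in CONTROL_KEYWORDS):
        return TOOL_MODEL
    return CHAT_MODEL

def ask(prompt: str) -> str:
    model = pick_model(prompt)
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return f"[{model}] " + response["message"]["content"]

print(ask("Turn on the kitchen light"))
print(ask("What is the capital of France?"))
```

As far as I know, the HA Ollama integration points each conversation agent at a single model, so routing like this would have to sit in front of Ollama (for example as a small proxy), or you switch between assist pipelines.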

How does llama3-groq-tool-use compare to allenporter/assist-llm?

Is there any way to make it switch models depending on the type of sentence? Maybe using some RAG pipelines?