Custom Integration: Ollama Conversation (Local AI Agent)

I have Ollama running here, but what can I do with the HA integration? Only use the text interface, which I can use with Ollama directly anyway?

Is there a way to set up sentences that trigger automations, and if no automation is registered, have the prompt sent to the Ollama AI instead?

I'm on the beta for the next HA release, where you can now use Ollama for local LLM control of HA, but I'm having a hard time getting it to work.

Anyone successful?

I conducted a test and it was successful, but there are hallucinations. I use llama3.1:8b. Perhaps because I use Chinese, the success rate is lower, and the more entities that are exposed, the slower it gets. When I use the qwen2 model with the old Ollama integration (which only supports queries), the success rate is high and the speed is also fast. The official HA Ollama integration told me that the qwen2 model does not support tool calls. So a suitable LLM model and accurate prompts should give very good results, but I haven't done any further testing.

Did you figure it out? I am trying to use llama3.1 8b and have set it up properly, but when I ask it to turn on a light I get an error that tools are not supported:

"Sorry, I had a problem talking to the Ollama server: llama3.1:latest does not support tools"
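
In case it helps with debugging: here is a minimal sketch (my own, not part of the HA integration) that probes whether a pulled model accepts tool definitions at all, using the `ollama` Python client against a default local server. The model tag and the `turn_on_light` tool are just placeholders.

```python
# Probe whether a local Ollama model accepts tool definitions.
# Assumes: `pip install ollama` and an Ollama server on the default
# http://localhost:11434. The model tag and the tool are placeholders.
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "turn_on_light",  # hypothetical tool, only used as a probe
        "description": "Turn on a light by entity id",
        "parameters": {
            "type": "object",
            "properties": {"entity_id": {"type": "string"}},
            "required": ["entity_id"],
        },
    },
}]

try:
    response = ollama.chat(
        model="llama3.1:latest",
        messages=[{"role": "user", "content": "Turn on the kitchen light"}],
        tools=tools,
    )
    # If the model's template supports tools, the reply may contain tool_calls.
    print(response)
except ollama.ResponseError as err:
    # Models/templates without tool support fail here with a message like
    # "llama3.1:latest does not support tools".
    print("Tool probe failed:", err.error)
```

If the probe fails the same way outside Home Assistant, the issue is on the Ollama/model side (re-pulling a newer tag of the model may help) rather than in the integration.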

Unfortunately I don't have that problem. My problem is that you ask it a question and it answers something completely different; it doesn't even know how to use any services.

The new control feature with Ollama (llama3) is not working for me either. A single unavailable device prevents the system from doing anything, even if the device you want to control is a completely different one. The language setting does not work either: I configured it to use German, but it always answers in English.

If I remove all unavailable entities from Assist, it tries to control a power switch instead of turning on the lights, which was the request.

There's a PR to make the context window configurable, which may improve things:
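
Until that lands, a quick way to check whether a bigger context actually helps your prompts is to call Ollama directly and pass `num_ctx` per request. A rough sketch with the Python client, assuming a local Ollama server; the model tag, prompt, and sizes are just examples:

```python
# Rough sketch: ask the same question with different context sizes to see
# whether a larger num_ctx changes quality/latency for your prompts.
# Assumes `pip install ollama` and a local Ollama server; model tag is an example.
import time
import ollama

PROMPT = "Which lights are on in the living room?"  # placeholder prompt

for num_ctx in (2048, 8192):
    start = time.time()
    response = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": PROMPT}],
        options={"num_ctx": num_ctx},  # per-request context window
    )
    print(f"num_ctx={num_ctx}: {time.time() - start:.1f}s")
    print(response["message"]["content"][:200])
```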

I'm pretty new to running LLMs. I currently have Home Assistant running on an Intel Core i5-10210U (10th gen, 4 cores, 8 threads) with 16 GB of RAM and integrated graphics. I was thinking of co-hosting a model with Ollama on this machine, but I'm worried that might be a bit ambitious. I could upgrade to 32 GB of RAM, but what I have works for what I use it for right now, and if that bump still wouldn't be enough to do anything reasonable, I would rather save up and get something more appropriate.

My expectations are pretty low. I'd be happy with anything that is better than the non-LLM voice assistants and could handle some switch toggling and maybe setting a timer. Any thoughts on what's possible with this setup?

You'll be limited to models that are less than 14 GB (if you leave 2 GB for your system). Thankfully, the llama3.1 quantized versions range from about 4 GB to 8.5 GB (q8), so you should be able to run that model.

How fast? That depends. The first load will be the slowest because the model has to be transferred from the hard drive into RAM; hopefully you have an SSD. Once it's in RAM, it will be as fast as your RAM and bus speed allow.

I personally would not upgrade the memory but rather spend the money on a GPU. Even with a small GPU with 2 or 4 GB of VRAM, Ollama will offload part of the model to the GPU and use both CPU/RAM and the GPU, which improves speed.

The goal is to get the entire model to fit in your VRAM; then the answers will be much faster (or almost instant, depending on your GPU).
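
As a rough back-of-the-envelope check (my own approximation, ignoring the KV cache and runtime overhead), the in-memory size is roughly parameters × bits-per-weight ÷ 8, which is how an 8B model ends up around 4–5 GB at q4 and about 8.5 GB at q8:

```python
# Back-of-the-envelope estimate of model size by quantization level.
# Ignores the KV cache and runtime overhead, so treat it as a lower bound.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Effective bits per weight including metadata are approximate assumptions.
for label, bits in [("q4", 4.5), ("q8", 8.5)]:
    print(f"llama3.1 8B {label}: ~{approx_size_gb(8, bits):.1f} GB")
```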

I gave it a try just to see what happens. I tried a few different small models, including llama3 and mistral. Just chatting with Ollama by itself, it was kind of slow but not too bad. Then I tried it in Home Assistant. With Assist control disabled, it was pretty painfully slow but responded to queries. When I enabled Assist, it didn't return at all. I eventually lowered the number of exposed entities to 23 and it still couldn't handle it.

I'm running a mini PC that has USB 3.1 ports but does not seem to have Thunderbolt on any of them, so adding a GPU might be out of the question. I saw that Ollama has some open issues about utilizing built-in Intel GPUs, which might help a little. I might revisit this if that support ever stabilizes and see if it works better.

What are people who are using Pis or Home Assistant Greens, etc., doing? Off-loading to OpenAI? Or running a dedicated AI server?

As of 2024.9, you can now change the context size for the models. For example, I increased it to 15K (the default is 8K, and it was 2K pre-2024.9) and what a difference; it works really well now.

I also increased the context size and can now expose more entities, which is great. There has also been some good progress towards using a local LLM to replace the cloud options by using a model called "llama3-groq-tool-use" instead of the recommended llama3.1 model. The results are impressive and very close to Google AI. I strongly recommend this model.
Now I just need good hardware for the ultimate voice assistant. Waiting for the Nabu Casa voice assistant.

How well does "llama3-groq-tool-use" do for non-function-call queries, like just general chat?

Recommended approach

Model Selection: Based on the query analysis, route the request to the most appropriate model:

  • For queries involving function calling, API interactions, or structured data manipulation, use the Llama 3 Groq Tool Use models.
  • For general knowledge, open-ended conversations, or tasks not specifically related to tool use, route to a general-purpose language model like the unmodified Llama-3 70B.

via Introducing Llama-3-Groq-Tool-Use Models - Groq is Fast AI Inference
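
That routing idea isn't something the HA integration does for you, but here is a minimal sketch of it: a crude keyword check decides whether a request goes to a tool-use model or a general chat model. The model tags and keywords are assumptions you would tune (a small classifier or a "router" LLM call could replace the keyword check):

```python
# Minimal sketch of routing between a tool-use model and a general chat model.
# The keyword heuristic and model tags are placeholders, not a recommendation.
import ollama

TOOL_MODEL = "llama3-groq-tool-use"   # assumed tag for the tool-use model
CHAT_MODEL = "llama3.1:8b"            # assumed general-purpose model

CONTROL_KEYWORDS = ("turn on", "turn off", "set", "switch", "dim", "timer")

def pick_model(prompt: str) -> str:
    """Route device-control style prompts to the tool-use model."""
    lowered = prompt.lower()
    if any(keyword in lowered for keyword in CONTROL_KEYWORDS):
        return TOOL_MODEL
    return CHAT_MODEL

def ask(prompt: str) -> str:
    model = pick_model(prompt)
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return f"[{model}] " + response["message"]["content"]

print(ask("Turn on the kitchen light"))
print(ask("What is the capital of France?"))
```

As far as I know, the HA Ollama integration points each conversation agent at a single model, so routing like this would have to sit in front of Ollama (for example as a small proxy), or you switch between assist pipelines.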

How does llama3-groq-tool-use compare to allenporter/assist-llm?

Is there any way to make it switch models depending on the type of sentence? Maybe using some RAG pipelines?