I have Ollama running here, but what can I do with the HA integration? Only use the text interface, which I could just as well use with Ollama directly?
Is there a way to set up sentences that trigger automations, and if no matching automation is registered, fall back to sending the prompt to the Ollama AI?
I'm on the beta for the next HA release, which now lets you use Ollama for local LLM control of HA, but I'm having a hard time getting it to work.
Anyone successful?
I ran a test and it was successful, but there are hallucinations. I use llama3.1:8b. The success rate may be lower because I use Chinese, and the more entities you expose, the slower it gets. With the qwen2 model and the old Ollama integration (which only supports queries), the success rate is high and it is also fast, but HA's official Ollama integration told me that the qwen2 model does not support tool calls. So a suitable LLM model plus accurate prompts should give very good results, but I haven't tested that further.
Did you figure it out? I am trying to set up llama3.1:8b properly, but when I ask it to turn on a light I get an error that tools are not supported:
"Sorry, I had a problem talking to the Ollama server: llama3.1:latest does not support tools"
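If you hit that error, it can help to check tool support outside HA first. Here's a minimal sketch using the ollama Python client; the tool definition is a made-up example just for probing whether the model accepts tools, not anything from the HA integration:

```python
import ollama

# Hypothetical tool definition purely to test whether the model accepts tool calls.
tools = [{
    "type": "function",
    "function": {
        "name": "turn_on_light",
        "description": "Turn on a light by name",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}]

try:
    resp = ollama.chat(
        model="llama3.1:latest",
        messages=[{"role": "user", "content": "Turn on the kitchen light"}],
        tools=tools,
    )
    # If tools are supported, the reply should contain a tool call rather than plain text.
    print(resp["message"])
except ollama.ResponseError as e:
    # Models without tool support fail with an error similar to the one HA shows.
    print("Ollama error:", e.error)
```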
Unfortunately I don't have that problem. My problem is that you ask it a question and it answers something completely different; it doesn't even know how to use any services.
For me the new control feature with Ollama (llama3) is not working either. A single unavailable device prevents the system from doing anything, even if the device you want to control is a totally different one. The language setting does not work either: I configured it to use German, but it always answers in English.
If I remove all unavailable entities from Assist, it tries to control a power switch instead of turning on the lights, which is what I asked for.
There's a PR to make the context window configurable, which may improve things:
I'm pretty new to running LLMs. I currently have Home Assistant running on an Intel Core i5-10210U (10th gen, 4 cores, 8 threads) with 16 GB of RAM and integrated graphics. I was thinking of co-hosting a model with Ollama on this machine, but I'm worried that might be a bit ambitious. I could upgrade to 32 GB of RAM, but what I have works for what I use it for right now, and if that bump still wouldn't be enough to do anything reasonable, I'd rather save up and get something more appropriate.
My expectations are pretty low. I'd be happy with anything that is better than the non-LLM voice assistants and could handle some switch toggling and maybe setting a timer. Any thoughts on what's possible with this setup?
You'll be limited to models that are less than 14 GB (leaving 2 GB for your system). Thankfully, the llama3.1 quantized versions range from about 4 GB to 8.5 GB (q8), so you should be able to run that model.
How fast? That depends. The first load will be the slowest because the model has to be transferred from the hard drive into RAM, so hopefully you have an SSD. Once it's in RAM, it will run as fast as your RAM and bus speeds allow.
I personally would not upgrade the memory but rather spend the money on a GPU. Even with a small GPU with 2 or 4 GB of VRAM, Ollama will offload part of the model to the GPU and use both CPU/RAM and GPU, which improves speed.
The goal is to get the entire model to fit in your VRAM; then the answers will be much faster (or almost instant, depending on your GPU).
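To make the sizing concrete, here's a rough back-of-the-envelope sketch in Python; the sizes and overhead numbers are my own rough assumptions based on the figures above, not exact measurements:

```python
# Approximate download sizes for llama3.1:8b quantizations (roughly the 4-8.5 GB
# range mentioned above) -- adjust for whatever model/quantization you pull.
MODEL_GB = {"llama3.1:8b-q4": 4.7, "llama3.1:8b-q8": 8.5}

def fits(model: str, total_gb: float, reserved_gb: float = 2.0, overhead_gb: float = 1.0) -> bool:
    """True if the model weights plus a rough allowance for context/KV cache fit in memory."""
    return MODEL_GB[model] + overhead_gb <= total_gb - reserved_gb

for name in MODEL_GB:
    print(name, "fits in 16 GB RAM:", fits(name, 16.0))
    print(name, "fits in 4 GB VRAM:", fits(name, 4.0, reserved_gb=0.5))
```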
I gave it a try just to see what happens. I tried a few different small models including llama3 and mistral. Just chatting with Ollama by itself, it was kind of slow but not too bad. Then I tried it in Home Assistant. Without the ability to control anything (Assist disabled), it was pretty painfully slow but did respond to queries. When I enabled Assist, it didn't return at all. I eventually lowered the number of exposed entities to 23 and it still couldn't handle it.
I'm running a mini PC that has USB 3.1 ports but doesn't seem to have Thunderbolt on any of them, so adding an external GPU might be out of the question. I saw that Ollama has some open issues about utilizing built-in Intel GPUs, which might help a little. I might revisit this if that support ever stabilizes and see if it works better.
What are people doing who are using Pis or Home Assistant Greens, etc.? Offloading to OpenAI? Or running a dedicated AI server?
As of 2024.9, you can now change the context size for the models. For example, I increased it to 15K (the default is 8K, and it was 2K pre-2024.9) and what a difference: it works really well now.
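If you want to see the effect of a larger context outside HA, you can pass the same setting straight to the Ollama API. A small sketch with the ollama Python client, using the 15K value mentioned above (the model name and prompt are just placeholders):

```python
import ollama

resp = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Which lights are on in the living room?"}],
    # num_ctx controls the context window; 15360 ~ the 15K mentioned above.
    options={"num_ctx": 15360},
)
print(resp["message"]["content"])
```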
I also increased the context size and can now expose more entities, which is great. I've also made good progress towards using a local LLM to replace the cloud options by using a model called "llama3-groq-tool-use" instead of the recommended llama3.1 model. The results are impressive and very close to Google AI. I strongly recommend this model.
Now I just need good hardware for the ultimate voice assistant. Waiting for the Nabu Casa voice assistant.
How well does "llama3-groq-tool-use" do for non-function-call queries, like just general chat?
Recommended approach
Model Selection: Based on the query analysis, route the request to the most appropriate model:
- For queries involving function calling, API interactions, or structured data manipulation, use the Llama 3 Groq Tool Use models.
- For general knowledge, open-ended conversations, or tasks not specifically related to tool use, route to a general-purpose language model like the unmodified Llama-3 70B.
via Introducing Llama-3-Groq-Tool-Use Models - Groq is Fast AI Inference
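As far as I know, the HA Ollama integration ties each conversation agent to a single model, but outside HA you could experiment with that routing idea directly against the Ollama API. Here's a toy sketch where a naive keyword check stands in for the "query analysis" step (model names are assumptions based on this thread):

```python
import ollama

TOOL_MODEL = "llama3-groq-tool-use"   # device-control / function-calling requests
CHAT_MODEL = "llama3.1:8b"            # general chat

# Deliberately naive stand-in for real query analysis.
CONTROL_HINTS = ("turn on", "turn off", "switch", "set ", "dim", "timer")

def pick_model(prompt: str) -> str:
    """Route control-style requests to the tool-use model, everything else to general chat."""
    lowered = prompt.lower()
    return TOOL_MODEL if any(hint in lowered for hint in CONTROL_HINTS) else CHAT_MODEL

def ask(prompt: str) -> str:
    model = pick_model(prompt)
    resp = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return resp["message"]["content"]

print(ask("Turn on the kitchen light"))
print(ask("What's the capital of France?"))
```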
How does llama3-groq-tool-use compare to allenporter/assist-llm?
Is there any way to make it switch models depending on the type of sentence? Maybe using some RAG pipelines?