I’ve systematically changed each configuration option across multiple LLMs, multiple versions of Ollama, and various local LLM setups, and despite claims from multiple integrations, I’ve yet to see any proof that local LLMs work with HA.
What configuration actually works? Where are the expected LLM outputs documented?
Following since I’m getting started on this myself.
Have you verified your local LLM is responding to queries in a timely manner? For example, if your system is underpowered for the size of the LLM you’re running, it may not respond at all, or may take many minutes to respond.
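If it helps, here’s how I’d sanity-check latency outside of HA entirely. A minimal sketch assuming Ollama’s default local endpoint (`http://localhost:11434`) and a model tag of `llama3.1`; adjust both for your setup:

```python
import json
import time
import urllib.request

# Smoke test: time a single completion against a local Ollama server.
payload = json.dumps({
    "model": "llama3.1",   # swap in whatever model you've pulled
    "prompt": "Reply with the single word: pong",
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

start = time.time()
with urllib.request.urlopen(req, timeout=300) as resp:
    body = json.loads(resp.read())
print(f"elapsed: {time.time() - start:.1f}s")
print(f"response: {body.get('response', '')!r}")
```

If this already takes minutes, no HA configuration will fix it.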
I can get replies too; it’s just that they’re hallucinated. I’m using Ollama with Llama 3.1. The issue is that I have over 100 entities, while the recommendation is fewer than 25. It looks like this needs a different architecture, for example RAG (retrieval-augmented generation).
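For what it’s worth, here’s roughly what I mean by RAG. A toy sketch (my own illustration, not an existing HA integration) that scores entities against the user’s question and only exposes the top-k to the model, with simple word overlap standing in for real embedding similarity:

```python
from collections import Counter

entities = {
    "light.kitchen": "on",
    "light.bedroom": "off",
    "sensor.kitchen_temperature": "21.5",
    # ... imagine 100+ more
}

def score(query: str, entity_id: str) -> int:
    # Stand-in for embedding similarity: count shared words.
    q = Counter(query.lower().replace("?", "").split())
    e = Counter(entity_id.replace(".", " ").replace("_", " ").split())
    return sum((q & e).values())

def relevant_states(query: str, k: int = 5) -> str:
    # Build a prompt snippet from only the k most relevant entities,
    # instead of dumping every state into the system prompt.
    ranked = sorted(entities, key=lambda eid: score(query, eid), reverse=True)
    return "\n".join(f"{eid}: {entities[eid]}" for eid in ranked[:k])

print(relevant_states("Is the kitchen light on?"))
```

With something like this, the model only ever sees a handful of entities per question, which should cut both the hallucination and the token bloat.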
As I understand it, the entity states are passed to the LLM as part of the system prompt, and with that many entities that will overflow the context window. It doesn’t seem to be well optimized.
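Some back-of-the-envelope math (illustrative numbers, not measured from HA’s actual prompt format) shows why:

```python
# Rough estimate of how much context the entity states alone consume.
entities = 100
chars_per_entity = 120   # assumed: entity_id, state, a few attributes
chars_per_token = 4      # common rough heuristic for English text

state_tokens = entities * chars_per_entity // chars_per_token
print(f"~{state_tokens} tokens just for entity states")
# Ollama's default context window (num_ctx) has historically been
# 2048 tokens unless raised, so the states alone can exceed it before
# the actual instructions and conversation are even added.
```

Once the prompt gets truncated to fit, the model is answering from incomplete state, which would explain the hallucinated replies.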