Local LLM Conversation Integration

I wanted to share the custom component I’ve been working on for running local AI models as conversation agents. Everything runs locally through the Llama.cpp library, which runs models on your CPU and even works on a Raspberry Pi (if you can find a small enough model, that is). The goal was a solution that let me experiment with different local AI models that can be quantized to run on actual Home Assistant hardware in the near future. As the project has progressed, it has also gained support for remote backends and for running much larger models on a GPU.

Features:

  • Run models locally using Llama.cpp as part of Home Assistant or by connecting to Ollama, llama-cpp-python server, or text-generation-webui
  • Output parsing to execute Home Assistant services using JSON function calling
  • A provided example model that is fine-tuned to work with the extension
  • Supports models fine-tuned on the provided dataset as well as non-fine-tuned models via in-context learning examples
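To give a rough idea of what the output parsing feature involves, here is a minimal Python sketch. It is not the integration's actual code: the function name `parse_service_calls` and the JSON keys `service` and `target_device` are assumptions made up for illustration, and the real response format may differ. It separates the text the assistant should say from any JSON lines that describe a service to call.

```python
import json


def parse_service_calls(response: str):
    """Split a model response into reply text and service calls.

    Assumes (hypothetically) that the model emits one JSON object per line,
    e.g. {"service": "light.turn_on", "target_device": "light.kitchen"}.
    """
    text_lines, calls = [], []
    for line in response.splitlines():
        stripped = line.strip()
        if stripped.startswith("{"):
            try:
                payload = json.loads(stripped)
            except json.JSONDecodeError:
                # Not valid JSON, so treat it as ordinary reply text.
                text_lines.append(line)
                continue
            service = payload.get("service")
            if isinstance(service, str) and "." in service:
                domain, _, name = service.partition(".")
                calls.append(
                    {
                        "domain": domain,
                        "service": name,
                        "target": payload.get("target_device"),
                    }
                )
                continue
        text_lines.append(line)
    return "\n".join(text_lines).strip(), calls


reply, calls = parse_service_calls(
    'Turning on the kitchen light.\n'
    '{"service": "light.turn_on", "target_device": "light.kitchen"}'
)
print(reply)
print(calls)
```

A caller would then hand each parsed entry to Home Assistant's service call mechanism. If the model only says a device was switched but emits no parsable call, nothing would be executed, which is why the output format matters.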

Out-of-the-box support for:

  • Llama 3
  • Mistral/Mixtral Instruct
  • Command R

The dataset is also provided if others want to fine-tune their own models to work with this extension.

Installation instructions are on the GitHub page and it should be installable via HACS.

Where are the downloaded models stored, please? Apparently they are not automatically removed when the integration is removed.

The default storage location is /media/models/ if the model was downloaded from HuggingFace using the integration.
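If you want to check what is left over there, something like the following Python sketch could list the files and their sizes. The helper name `find_model_files` is made up for illustration, and the `.gguf` suffix is an assumption (it is the file format Llama.cpp loads, but your downloads may be named differently).

```python
from pathlib import Path


def find_model_files(models_dir="/media/models", suffix=".gguf"):
    """Return (path, size_in_bytes) for each model file under models_dir.

    The default directory comes from the post above; the suffix is an
    assumption and can be changed.
    """
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(
        (str(p), p.stat().st_size)
        for p in root.rglob(f"*{suffix}")
        if p.is_file()
    )


for path, size in find_model_files():
    print(f"{path}: {size / 1e9:.2f} GB")
```

This only lists files. Deleting them is left as a manual step so nothing is removed by accident.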

Hello, first of all, thanks for all the work put into this integration.
Second, I just installed the integration. I have Ollama running on a Windows computer with a 12 GB VRAM Nvidia GPU. I can connect to the server, and it talks back using Assist in HA. However, when I ask it to interact with my devices, it says the device has been turned on or off, but nothing happens: physically, the device does not change its state.

Is there a way to trigger Ollama only when the predefined sentences don't match? For example, if I already have predefined sentences for automations and none of them is triggered, can it fall back to Ollama to find a solution?