I wanted to share the custom component I’ve been working on for running local AI models as conversation agents. Everything runs locally via the Llama.cpp library, which performs inference on the CPU and will even run on a Raspberry Pi (if you can find a small enough model, that is). The goal was a setup that lets me experiment with different local AI models, quantized so they can run on actual Home Assistant hardware in the near future. As the project has progressed, it has also gained support for remote backends and for running much larger models on a GPU.
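For anyone curious what the local mode boils down to, here is a rough sketch using llama-cpp-python to load a quantized GGUF model on the CPU. This is not the integration's actual code, and the model path and settings are just placeholders:

```python
# Rough sketch of what the local mode boils down to (not the integration's
# actual code): llama-cpp-python loading a quantized GGUF model on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="/config/models/example-home-model.q4_k_m.gguf",  # placeholder path
    n_ctx=2048,    # context window size
    n_threads=4,   # CPU threads; tune for your hardware
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "turn off the living room lights"}],
    max_tokens=64,
)
print(result["choices"][0]["message"]["content"])
```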
Features:
- Run models locally with Llama.cpp as part of Home Assistant, or connect to a remote backend: Ollama, the llama-cpp-python server, or text-generation-webui
- Output parsing that executes Home Assistant services via JSON function calling (a small parsing sketch follows the list of supported models below)
- A provided example model that is fine-tuned to work with the extension
- Support for models fine-tuned on the provided dataset as well as non-fine-tuned models via in-context learning (ICL) examples
Out-of-the-box support for:
- Llama 3
- Mistral/Mixtral Instruct
- Command R
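To give an idea of the JSON function calling, here is a minimal sketch of turning a model's JSON output into a Home Assistant service call. The exact schema the component expects may differ; the keys and entity names below are purely illustrative:

```python
# Minimal, purely illustrative sketch of the JSON-function-calling idea:
# parse a JSON blob the model emits and map it onto a service call.
# The real component's output schema and key names may differ.
import json

# Hypothetical model output following its natural-language reply:
model_output = '{"service": "light.turn_on", "target_device": "light.kitchen"}'

def parse_service_call(raw: str) -> tuple[str, str, dict]:
    """Split the JSON into (domain, service, service_data)."""
    call = json.loads(raw)
    domain, service = call["service"].split(".", 1)
    return domain, service, {"entity_id": call["target_device"]}

domain, service, data = parse_service_call(model_output)
# Inside Home Assistant this would become roughly:
#   await hass.services.async_call(domain, service, data)
print(domain, service, data)  # -> light turn_on {'entity_id': 'light.kitchen'}
```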
The dataset is also provided if others want to fine-tune their own models to work with this extension.
Installation instructions are on the GitHub page, and the component should be installable via HACS.
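One last tip: if you want to sanity-check a remote backend before pointing the integration at it, the llama-cpp-python server (and text-generation-webui with its API enabled) exposes an OpenAI-compatible endpoint you can hit directly. The URL, port, and model name here are placeholders for whatever your setup uses:

```python
# Quick sanity check of a remote backend before wiring it into Home Assistant.
# llama-cpp-python's server (and text-generation-webui with its API enabled)
# speaks the OpenAI-compatible chat completions protocol. The URL, port, and
# model name below are placeholders for whatever your setup uses.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder
        "messages": [
            {"role": "user", "content": "Say hello in one short sentence."},
        ],
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```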