I wanted to share the custom component I’ve been working on for running local AI models as conversation agents. Everything runs locally via the Llama.cpp library, which runs models on your CPU and even works on a Raspberry Pi (if you can find a small enough model, that is). The goal was to have a solution that let me experiment with different local AI models that can be quantized to run on actual Home Assistant hardware in the near future. As the project has progressed, it has gained support for remote backends and for running much larger models on a GPU.
Features:
Run models locally using Llama.cpp as part of Home Assistant or by connecting to Ollama, llama-cpp-python server, or text-generation-webui
Output parsing to execute Home Assistant services using JSON function calling
A provided example model that is fine-tuned to work with the extension
Supports models fine-tuned on the provided dataset as well as non-fine-tuned models via In-Context-Learning examples
Out of the box support for:
Llama 3
Mistral/Mixtral Instruct
Command R
The dataset is also provided if others want to fine-tune their own models to work with this extension.
Installation instructions are on the GitHub page and it should be installable via HACS.
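To make the "JSON function calling" feature above a bit more concrete, here is a rough sketch of what turning a model reply into a service call could look like. The reply format, the field names (service, target_device), and the parse_service_call helper are all made up for illustration; the integration's actual output format and parsing code may well differ.

```python
import json
from typing import Optional


def parse_service_call(reply: str) -> Optional[dict]:
    """Return the first JSON service call found in a model reply, or None.

    Hypothetical format (an assumption, not the integration's real one):
    one JSON object per line with "service" and "target_device" keys.
    """
    for line in reply.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain conversational text, not a function call
        try:
            call = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was malformed; skip it
        service = call.get("service")
        if not isinstance(service, str) or "." not in service:
            continue
        if "target_device" not in call:
            continue
        # "light.turn_on" -> domain "light", service "turn_on"
        domain, name = service.split(".", 1)
        return {"domain": domain, "service": name, "entity_id": call["target_device"]}
    return None


reply = (
    "Turning on the kitchen light.\n"
    '{"service": "light.turn_on", "target_device": "light.kitchen"}'
)
print(parse_service_call(reply))
```

Inside Home Assistant, a result like this would typically be handed to hass.services.async_call(domain, service, {"entity_id": ...}). If the model only replies with text and never emits the JSON part, nothing gets called.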
Hello, first of all, thanks for all the work put into this integration.
Second, I just installed the integration. I have Ollama running on a Windows computer with a 12 GB VRAM Nvidia GPU. I can connect to the server, and it talks back using Assist in HA. However, when asked to interact with my devices, it says that the device has been turned on or off, but nothing happens: physically, the device does not change its state.
Is there a way to trigger Ollama only if the predefined sentences don't get triggered? For example, if I already have predefined sentences for automations and none of them match, can it fall back to Ollama to find a solution?
I’m trying to get this working. I run HA (version 2024.12.5) using Home Assistant OS on a virtual machine running on my Proxmox cluster. My HA does not have a GPU, and I have given it 6 GB of RAM.
I’ve installed the Local LLM integration from HACS.
When adding the integration, I choose Llama.cpp (HuggingFace).
The model I’ve chosen is acon96/Home-3B-GGUF
I can see the model files downloaded in /media/models
When I go to configure the voice assistant, I do not have the local LLM option to select for the conversation agent. The logs indicate that the conversation platform is not launching. Any suggestions on what I should try?
Logger: homeassistant.components.conversation
Source: helpers/entity_platform.py:366
integration: Conversation (documentation, issues)
First occurred: 6:48:31 PM (2 occurrences)
Last logged: 6:51:44 PM
Error while setting up llama_conversation platform for conversation
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 366, in _async_setup_platform
    await asyncio.shield(awaitable)
  File "/config/custom_components/llama_conversation/conversation.py", line 179, in async_setup_entry
    await agent._async_load_model(entry)
  File "/config/custom_components/llama_conversation/conversation.py", line 282, in _async_load_model
    return await self.hass.async_add_executor_job(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        self._load_model, entry
        ^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/llama_conversation/conversation.py", line 895, in _load_model
    validate_llama_cpp_python_installation()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/config/custom_components/llama_conversation/utils.py", line 151, in validate_llama_cpp_python_installation
    raise Exception(f"Failed to properly initialize llama-cpp-python. (Exit code {process.exitcode}.)")
Exception: Failed to properly initialize llama-cpp-python. (Exit code -4.)
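A note on reading that last line: the message prints process.exitcode, and for a Python multiprocessing child a negative exit code -N means the process was terminated by signal N. On Linux, signal 4 is SIGILL (illegal instruction). That often points to a binary built for CPU features the (virtual) CPU does not expose, such as AVX, but that part is an assumption about this setup and not something the log proves. Below is a small sketch for decoding the code and checking the CPU flags on x86 Linux; describe_exit_code and cpu_flags are hypothetical helper names.

```python
import signal
from pathlib import Path


def describe_exit_code(exitcode: int) -> str:
    """Translate a multiprocessing-style exit code into readable text."""
    if exitcode < 0:
        try:
            return f"killed by signal {signal.Signals(-exitcode).name}"
        except ValueError:
            return f"killed by unknown signal {-exitcode}"
    return f"exited with status {exitcode}"


def cpu_flags(cpuinfo_path: str = "/proc/cpuinfo") -> set:
    """Return the CPU feature flags from /proc/cpuinfo (x86 Linux layout).

    Returns an empty set if the file or the "flags" line is missing.
    """
    path = Path(cpuinfo_path)
    if not path.exists():
        return set()
    for line in path.read_text().splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


print(describe_exit_code(-4))

flags = cpu_flags()
# Features that prebuilt llama.cpp binaries commonly assume (an assumption here).
for feature in ("avx", "avx2", "fma", "f16c"):
    print(feature, feature in flags)
```

If avx shows up as False inside the VM, one thing to try is changing the Proxmox VM's CPU type to host so the guest sees the real CPU's instruction set, then restarting the VM.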