I wanted to share the custom component I’ve been working on for running local AI models as conversation agents. Everything runs locally via the Llama.cpp library, which performs inference on the CPU and will even run on a Raspberry Pi (if you can find a small enough model, that is). The goal was a setup that lets me experiment with different local AI models, quantized so they can run on actual Home Assistant hardware in the near future. As the project has progressed, it has also gained support for remote backends and for running much larger models on a GPU.
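For anyone curious what the local mode boils down to, here is a rough sketch using llama-cpp-python to load a quantized GGUF model on the CPU. This is not the integration's actual code, and the model path and settings are just placeholders:

```python
# Rough sketch of what the local mode boils down to (not the integration's
# actual code): llama-cpp-python loading a quantized GGUF model on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="/config/models/example-home-model.q4_k_m.gguf",  # placeholder path
    n_ctx=2048,    # context window size
    n_threads=4,   # CPU threads; tune for your hardware
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "turn off the living room lights"}],
    max_tokens=64,
)
print(result["choices"][0]["message"]["content"])
```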
Features:
- Run models locally with Llama.cpp as part of Home Assistant, or connect to a remote backend: Ollama, the llama-cpp-python server, or text-generation-webui
- Output parsing that executes Home Assistant services via JSON function calling (a small parsing sketch follows the list of supported models below)
- A provided example model that is fine-tuned to work with the extension
- Support for models fine-tuned on the provided dataset as well as non-fine-tuned models via in-context learning (ICL) examples
Out-of-the-box support for:
- Llama 3
- Mistral/Mixtral Instruct
- Command R
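To give an idea of the JSON function calling, here is a minimal sketch of turning a model's JSON output into a Home Assistant service call. The exact schema the component expects may differ; the keys and entity names below are purely illustrative:

```python
# Minimal, purely illustrative sketch of the JSON-function-calling idea:
# parse a JSON blob the model emits and map it onto a service call.
# The real component's output schema and key names may differ.
import json

# Hypothetical model output following its natural-language reply:
model_output = '{"service": "light.turn_on", "target_device": "light.kitchen"}'

def parse_service_call(raw: str) -> tuple[str, str, dict]:
    """Split the JSON into (domain, service, service_data)."""
    call = json.loads(raw)
    domain, service = call["service"].split(".", 1)
    return domain, service, {"entity_id": call["target_device"]}

domain, service, data = parse_service_call(model_output)
# Inside Home Assistant this would become roughly:
#   await hass.services.async_call(domain, service, data)
print(domain, service, data)  # -> light turn_on {'entity_id': 'light.kitchen'}
```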
The dataset is also provided if others want to fine-tune their own models to work with this extension.
Installation instructions are on the GitHub page, and the component should be installable via HACS.
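One last tip: if you want to sanity-check a remote backend before pointing the integration at it, the llama-cpp-python server (and text-generation-webui with its API enabled) exposes an OpenAI-compatible endpoint you can hit directly. The URL, port, and model name here are placeholders for whatever your setup uses:

```python
# Quick sanity check of a remote backend before wiring it into Home Assistant.
# llama-cpp-python's server (and text-generation-webui with its API enabled)
# speaks the OpenAI-compatible chat completions protocol. The URL, port, and
# model name below are placeholders for whatever your setup uses.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder
        "messages": [
            {"role": "user", "content": "Say hello in one short sentence."},
        ],
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```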