AI Voice Control for Home Assistant (Fully Local)

I have set up a relatively fast, fully local AI voice assistant for Home Assistant.

The following components are used:

  • Wyoming Faster Whisper (STT) running on CUDA
  • llama-cpp-python serving the Functionary LLM
  • Extended OpenAI (HACS integration)


Update (4-11-2024)

I also got my AMD 6900XT GPU working with llama-cpp-python on my Windows PC, which can perform a function call in around 3 seconds! Let me know if you need help installing llama-cpp-python for ROCm on Windows.


Credits
  • FutureProofHomes for making Functionary work with Extended OpenAI and llama-cpp-python.
  • Min Jekal for creating the Extended OpenAI integration!

The Story

I want to give the community a quick update on the possibilities of AI, voice control and Home Assistant. I have been exploring running a fully local voice assistant in my home for quite a while now.

I know the majority of HA users run their instance on a small piece of hardware without much compute capability; this post is NOT for those users! My Home Assistant instance is running as a Docker container on an old PC that now serves as an Ubuntu server. I recently upgraded this PC with an Nvidia GTX 1080 GPU (around €100) to achieve the following:

  • Run a local LLM (AI) model that is completely offloaded into my GPU’s VRAM.
  • Run local STT with Whisper on my GPU using the large-v3-int8 model.
Further reading

The local STT using Whisper is far off Google's STT performance; it was therefore annoying to use with the default Assist of Home Assistant, since that requires precise intents. Especially in Dutch, it is very hard to always get the precise intent out of Whisper, and some words are often replaced by others (it feels like overkill to make a wildcard for these words). I therefore focused on using AI, so that you don't have to memorize any voice commands and it all feels more natural.

To my knowledge, there are two HACS integrations that support AI function calling as of now:

  • Home-LLM: more focused on smaller HA (CPU only) setups and uses a relatively small LLM (3B parameters) that is trained on a custom Home Assistant Request dataset. However, it is also possible to train and use your own LLM.
  • Extended OpenAI: an extension of the OpenAI integration in HA that supports function calling with the GPT-3.5/4 models (and other models that support function calling via OpenAI's API).

Then, there are multiple ways of setting up your own local LLM:

I first used a combination of LocalAI and Home-LLM with my own custom model, trained on a Dutch translation of the training set from Home-LLM. I used Unsloth to train the Mistral 7B model using this Google Colab. It worked quite well for some functions (e.g. light brightness), but it is still far from a real AI experience. The largest downside of this integration is that you need to train the model for each function call, so it's not easy to add a feature.

I have now settled on llama-cpp-python and Extended OpenAI. I came across this YouTube video from FutureProofHomes about his journey in making a dedicated local AI-powered voice assistant. It's not exactly what I am looking for, since his dedicated hardware restrictions make the AI very slow. However, all credits go to FutureProofHomes for pointing me in this direction. Normally, Extended OpenAI only supports the GPT models that offer function calling, so most models that you can run locally do not work. But now there is a model called Functionary that you can run locally and that provides even better function calling than the GPT models! Do note that chit-chatting with this model is never as good as with GPT. Some modifications to the source code of Extended OpenAI and llama-cpp-python were necessary to get this combination working.

It can all easily be made faster if you want to invest in it. For now, it seems best to buy a GPU with as much VRAM as possible and the highest CUDA compute capability. I might buy an RTX 3060 (12GB) or RTX 3090 (24GB) in the future! I was also able to run KoboldCPP on my desktop PC with my AMD Radeon 6900XT.

See below for the guide with all the code to get llama-cpp-python, Extended OpenAI and Functionary working together. Also let me know if you have any tips or suggestions on local AI voice assistants. I would love to hear alternatives and benchmarks of the processing times of other GPUs.


Installation Guide

This guide is specifically written for installing a local LLM voice assistant using Docker containers on a setup with an Nvidia GPU (CUDA) and Ubuntu 22.04. Since we are building our own Docker images, you might have to change a few things depending on your setup.

Prerequisites:

  • Linux distribution: one that is supported by the Nvidia Container Toolkit
  • Docker container engine installed
  • Nvidia GPU (including CUDA drivers); check your maximum supported CUDA version by running $ nvidia-smi
  • Nvidia Container Toolkit: to be able to run Docker containers with CUDA, follow this installation guide.
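
With the toolkit installed, each container still has to request GPU access explicitly. With Docker Compose this is done through a device reservation; below is a generic sketch (service and image names are placeholders) that the compose files later in this guide build on:

services:
  some-gpu-service:
    image: some-gpu-image
    deploy:
      resources:
        reservations:
          devices:
            # hand one Nvidia GPU to this container
            - driver: nvidia
              count: 1
              capabilities: [gpu]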

Wyoming Faster Whisper

You can use this repository to build the wyoming-faster-whisper Docker container that runs on CUDA.

  • Clone the repository and navigate into it:
    $ git clone https://github.com/BramNH/wyoming-faster-whisper-docker-cuda
    $ cd wyoming-faster-whisper-docker-cuda

  • Because my maximum supported CUDA version is 12.2, I use the following base image in the Dockerfile to include the CUDA environment in the built image:
    FROM nvidia/cuda:12.0.1-cudnn8-runtime-ubuntu22.04
    Faster Whisper requires the cudnn8 runtime variant of the CUDA image. You might need another image depending on your CUDA version and Linux distribution.

  • Build the image:
    $ docker build --tag wyoming-whisper .

  • Edit the container configuration in compose.yml to specify which model to run, for example: --model ellisd/faster-whisper-large-v3-int8 --language nl (see the sketch after these steps).

  • Start the container with Docker Compose:
    $ docker compose up -d
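
For reference, here is a minimal sketch of what the whisper service in compose.yml could look like. The image name matches the build tag above, but the exact command arguments, paths and port 10300 (the conventional Wyoming port) are assumptions; check the repository's compose.yml for the real values:

services:
  wyoming-whisper:
    image: wyoming-whisper
    # model, language and data dir are passed straight to wyoming-faster-whisper
    command: --model ellisd/faster-whisper-large-v3-int8 --language nl --uri tcp://0.0.0.0:10300 --device cuda --data-dir /data
    volumes:
      - ./data:/data
    ports:
      - "10300:10300"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Home Assistant then connects to this service through the Wyoming Protocol integration, using the host's IP and port 10300.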

Llama-cpp-python

We set up llama-cpp-python to work specifically in combination with the Functionary LLM. Because llama-cpp did not work out of the box with Functionary, a few changes had to be made in llama_types.py from the llama-cpp-python directory. During the Docker image build, the modified file is copied into the Python dependency folder of llama-cpp. This is a temporary solution and might be changed or fixed in the future.

  • Clone the repository to get the necessary files to build and run the Docker container, then navigate into the folder:
    $ git clone https://github.com/BramNH/llama-cpp-python-docker-cuda
    $ cd llama-cpp-python-docker-cuda

  • Llama-cpp requires the devel CUDA image for GPU support, so I use the following base image in the Dockerfile. You might have to change this for your CUDA version / Linux distribution:
    FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

  • Build the Docker image with the included Dockerfile:
    $ docker build --tag llama-cpp-python .

  • You can run the container using the included compose.yml:
    $ docker compose up -d
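
As a reference, here is a minimal sketch of such a compose.yml. The model path, chat format and volume layout are assumptions, so compare against the repository's file; llama-cpp-python's server exposes an OpenAI-compatible API on port 8000 by default:

services:
  llama-cpp-python:
    image: llama-cpp-python
    # serve Functionary via llama-cpp-python's OpenAI-compatible server,
    # offloading all layers to the GPU (-1)
    command: >
      python3 -m llama_cpp.server
      --model /models/functionary-small-v2.4.Q4_0.gguf
      --chat_format functionary-v2
      --n_gpu_layers -1
      --host 0.0.0.0
    volumes:
      - ./models:/models
    ports:
      - "8000:8000"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Extended OpenAI's base URL then points at http://<host>:8000/v1.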

Extended OpenAI

The Extended OpenAI HACS integration talks to the OpenAI-compatible API served by llama-cpp-python. Some modifications were also necessary to get the HACS integration working with Functionary and llama-cpp-python; see this discussion. You can either re-install the HACS integration using my fork of Extended OpenAI, or replace the __init__.py file within the /custom_components/extended_openai_conversation folder:

__init__.py
"""The OpenAI Conversation integration."""

from __future__ import annotations

import json
import logging
from typing import Literal

from openai import AsyncAzureOpenAI, AsyncOpenAI
from openai._exceptions import AuthenticationError, OpenAIError
from openai.types.chat.chat_completion import (
    ChatCompletion,
    ChatCompletionMessage,
    Choice,
)
import yaml

from homeassistant.components import conversation
from homeassistant.components.homeassistant.exposed_entities import async_should_expose
from homeassistant.config_entries import ConfigEntry
from homeassistant.const import ATTR_NAME, CONF_API_KEY, MATCH_ALL
from homeassistant.core import HomeAssistant
from homeassistant.exceptions import (
    ConfigEntryNotReady,
    HomeAssistantError,
    TemplateError,
)
from homeassistant.helpers import (
    config_validation as cv,
    entity_registry as er,
    intent,
    template,
)
from homeassistant.helpers.typing import ConfigType
from homeassistant.util import ulid

from .const import (
    CONF_API_VERSION,
    CONF_ATTACH_USERNAME,
    CONF_BASE_URL,
    CONF_CHAT_MODEL,
    CONF_CONTEXT_THRESHOLD,
    CONF_CONTEXT_TRUNCATE_STRATEGY,
    CONF_FUNCTIONS,
    CONF_MAX_FUNCTION_CALLS_PER_CONVERSATION,
    CONF_MAX_TOKENS,
    CONF_ORGANIZATION,
    CONF_PROMPT,
    CONF_SKIP_AUTHENTICATION,
    CONF_TEMPERATURE,
    CONF_TOP_P,
    CONF_USE_TOOLS,
    DEFAULT_ATTACH_USERNAME,
    DEFAULT_CHAT_MODEL,
    DEFAULT_CONF_FUNCTIONS,
    DEFAULT_CONTEXT_THRESHOLD,
    DEFAULT_CONTEXT_TRUNCATE_STRATEGY,
    DEFAULT_MAX_FUNCTION_CALLS_PER_CONVERSATION,
    DEFAULT_MAX_TOKENS,
    DEFAULT_PROMPT,
    DEFAULT_SKIP_AUTHENTICATION,
    DEFAULT_TEMPERATURE,
    DEFAULT_TOP_P,
    DEFAULT_USE_TOOLS,
    DOMAIN,
    EVENT_CONVERSATION_FINISHED,
)
from .exceptions import (
    FunctionLoadFailed,
    FunctionNotFound,
    InvalidFunction,
    ParseArgumentsFailed,
    TokenLengthExceededError,
)
from .helpers import (
    get_function_executor,
    is_azure,
    validate_authentication,
)
from .services import async_setup_services

_LOGGER = logging.getLogger(__name__)

CONFIG_SCHEMA = cv.config_entry_only_config_schema(DOMAIN)


# hass.data key for agent.
DATA_AGENT = "agent"


async def async_setup(hass: HomeAssistant, config: ConfigType) -> bool:
    """Set up OpenAI Conversation."""
    await async_setup_services(hass, config)
    return True


async def async_setup_entry(hass: HomeAssistant, entry: ConfigEntry) -> bool:
    """Set up OpenAI Conversation from a config entry."""

    try:
        await validate_authentication(
            hass=hass,
            api_key=entry.data[CONF_API_KEY],
            base_url=entry.data.get(CONF_BASE_URL),
            api_version=entry.data.get(CONF_API_VERSION),
            organization=entry.data.get(CONF_ORGANIZATION),
            skip_authentication=entry.data.get(
                CONF_SKIP_AUTHENTICATION, DEFAULT_SKIP_AUTHENTICATION
            ),
        )
    except AuthenticationError as err:
        _LOGGER.error("Invalid API key: %s", err)
        return False
    except OpenAIError as err:
        raise ConfigEntryNotReady(err) from err

    agent = OpenAIAgent(hass, entry)

    data = hass.data.setdefault(DOMAIN, {}).setdefault(entry.entry_id, {})
    data[CONF_API_KEY] = entry.data[CONF_API_KEY]
    data[DATA_AGENT] = agent

    conversation.async_set_agent(hass, entry, agent)
    return True


async def async_unload_entry(hass: HomeAssistant, entry: ConfigEntry) -> bool:
    """Unload OpenAI."""
    hass.data[DOMAIN].pop(entry.entry_id)
    conversation.async_unset_agent(hass, entry)
    return True


class OpenAIAgent(conversation.AbstractConversationAgent):
    """OpenAI conversation agent."""

    def __init__(self, hass: HomeAssistant, entry: ConfigEntry) -> None:
        """Initialize the agent."""
        self.hass = hass
        self.entry = entry
        self.history: dict[str, list[dict]] = {}
        base_url = entry.data.get(CONF_BASE_URL)
        if is_azure(base_url):
            self.client = AsyncAzureOpenAI(
                api_key=entry.data[CONF_API_KEY],
                azure_endpoint=base_url,
                api_version=entry.data.get(CONF_API_VERSION),
                organization=entry.data.get(CONF_ORGANIZATION),
            )
        else:
            self.client = AsyncOpenAI(
                api_key=entry.data[CONF_API_KEY],
                base_url=base_url,
                organization=entry.data.get(CONF_ORGANIZATION),
            )

    @property
    def supported_languages(self) -> list[str] | Literal["*"]:
        """Return a list of supported languages."""
        return MATCH_ALL

    async def async_process(
        self, user_input: conversation.ConversationInput
    ) -> conversation.ConversationResult:
        exposed_entities = self.get_exposed_entities()

        if user_input.conversation_id in self.history:
            conversation_id = user_input.conversation_id
            # continue an existing conversation with its stored history
            messages = self.history[conversation_id]
        else:
            conversation_id = ulid.ulid()
            user_input.conversation_id = conversation_id
            try:
                system_message = self._generate_system_message(
                    exposed_entities, user_input
                )
            except TemplateError as err:
                _LOGGER.error("Error rendering prompt: %s", err)
                intent_response = intent.IntentResponse(language=user_input.language)
                intent_response.async_set_error(
                    intent.IntentResponseErrorCode.UNKNOWN,
                    f"Sorry, I had a problem with my template: {err}",
                )
                return conversation.ConversationResult(
                    response=intent_response, conversation_id=conversation_id
                )
            messages = [system_message]
        user_message = {"role": "user", "content": user_input.text}
        if self.entry.options.get(CONF_ATTACH_USERNAME, DEFAULT_ATTACH_USERNAME):
            user = await self.hass.auth.async_get_user(user_input.context.user_id)
            if user is not None and user.name is not None:
                user_message[ATTR_NAME] = user.name

        messages.append(user_message)

        try:
            query_response = await self.query(user_input, messages, exposed_entities, 0)
        except OpenAIError as err:
            _LOGGER.error(err)
            intent_response = intent.IntentResponse(language=user_input.language)
            intent_response.async_set_error(
                intent.IntentResponseErrorCode.UNKNOWN,
                f"Sorry, I had a problem talking to OpenAI: {err}",
            )
            return conversation.ConversationResult(
                response=intent_response, conversation_id=conversation_id
            )
        except HomeAssistantError as err:
            _LOGGER.error(err, exc_info=err)
            intent_response = intent.IntentResponse(language=user_input.language)
            intent_response.async_set_error(
                intent.IntentResponseErrorCode.UNKNOWN,
                f"Something went wrong: {err}",
            )
            return conversation.ConversationResult(
                response=intent_response, conversation_id=conversation_id
            )

        messages.append(query_response.message.model_dump())
        # persist the conversation history for follow-up turns
        self.history[conversation_id] = messages

        self.hass.bus.async_fire(
            EVENT_CONVERSATION_FINISHED,
            {
                "response": query_response.response.model_dump(),
                "user_input": user_input,
                "messages": messages,
            },
        )

        intent_response = intent.IntentResponse(language=user_input.language)
        intent_response.async_set_speech(query_response.message.content)
        return conversation.ConversationResult(
            response=intent_response, conversation_id=conversation_id
        )

    def _generate_system_message(
        self, exposed_entities, user_input: conversation.ConversationInput
    ):
        raw_prompt = self.entry.options.get(CONF_PROMPT, DEFAULT_PROMPT)
        prompt = self._async_generate_prompt(raw_prompt, exposed_entities, user_input)
        return {"role": "system", "content": prompt}

    def _async_generate_prompt(
        self,
        raw_prompt: str,
        exposed_entities,
        user_input: conversation.ConversationInput,
    ) -> str:
        """Generate a prompt for the user."""
        return template.Template(raw_prompt, self.hass).async_render(
            {
                "ha_name": self.hass.config.location_name,
                "exposed_entities": exposed_entities,
                "current_device_id": user_input.device_id,
            },
            parse_result=False,
        )

    def get_exposed_entities(self):
        states = [
            state
            for state in self.hass.states.async_all()
            if async_should_expose(self.hass, conversation.DOMAIN, state.entity_id)
        ]
        entity_registry = er.async_get(self.hass)
        exposed_entities = []
        for state in states:
            entity_id = state.entity_id
            entity = entity_registry.async_get(entity_id)

            aliases = []
            if entity and entity.aliases:
                aliases = entity.aliases

            exposed_entities.append(
                {
                    "entity_id": entity_id,
                    "name": state.name,
                    "state": self.hass.states.get(entity_id).state,
                    "aliases": aliases,
                }
            )
        return exposed_entities

    def get_functions(self):
        try:
            function = self.entry.options.get(CONF_FUNCTIONS)
            result = yaml.safe_load(function) if function else DEFAULT_CONF_FUNCTIONS
            if result:
                for setting in result:
                    function_executor = get_function_executor(
                        setting["function"]["type"]
                    )
                    setting["function"] = function_executor.to_arguments(
                        setting["function"]
                    )
            return result
        except (InvalidFunction, FunctionNotFound) as e:
            raise e
        except:
            raise FunctionLoadFailed()

    async def truncate_message_history(
        self, messages, exposed_entities, user_input: conversation.ConversationInput
    ):
        """Truncate message history."""
        strategy = self.entry.options.get(
            CONF_CONTEXT_TRUNCATE_STRATEGY, DEFAULT_CONTEXT_TRUNCATE_STRATEGY
        )

        if strategy == "clear":
            last_user_message_index = None
            for i in reversed(range(len(messages))):
                if messages[i]["role"] == "user":
                    last_user_message_index = i
                    break

            if last_user_message_index is not None:
                del messages[1:last_user_message_index]
                # refresh system prompt when all messages are deleted
                messages[0] = self._generate_system_message(
                    exposed_entities, user_input
                )

    async def query(
        self,
        user_input: conversation.ConversationInput,
        messages,
        exposed_entities,
        n_requests,
    ) -> OpenAIQueryResponse:
        """Process a sentence."""
        model = self.entry.options.get(CONF_CHAT_MODEL, DEFAULT_CHAT_MODEL)
        max_tokens = self.entry.options.get(CONF_MAX_TOKENS, DEFAULT_MAX_TOKENS)
        top_p = self.entry.options.get(CONF_TOP_P, DEFAULT_TOP_P)
        temperature = self.entry.options.get(CONF_TEMPERATURE, DEFAULT_TEMPERATURE)
        use_tools = self.entry.options.get(CONF_USE_TOOLS, DEFAULT_USE_TOOLS)
        context_threshold = self.entry.options.get(
            CONF_CONTEXT_THRESHOLD, DEFAULT_CONTEXT_THRESHOLD
        )
        functions = list(map(lambda s: s["spec"], self.get_functions()))
        function_call = "auto"
        if n_requests == self.entry.options.get(
            CONF_MAX_FUNCTION_CALLS_PER_CONVERSATION,
            DEFAULT_MAX_FUNCTION_CALLS_PER_CONVERSATION,
        ):
            function_call = "none"

        tool_kwargs = {"functions": functions, "function_call": function_call}
        if use_tools:
            tool_kwargs = {
                "tools": [{"type": "function", "function": func} for func in functions],
                "tool_choice": function_call,
            }

        if len(functions) == 0:
            tool_kwargs = {}

        _LOGGER.info("Prompt for %s: %s", model, messages)

        response: ChatCompletion = await self.client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens,
            top_p=top_p,
            temperature=temperature,
            user=user_input.conversation_id,
            **tool_kwargs,
        )

        _LOGGER.info("Response %s", response.model_dump(exclude_none=True))

        if response.usage.total_tokens > context_threshold:
            await self.truncate_message_history(messages, exposed_entities, user_input)

        choice: Choice = response.choices[0]
        message = choice.message

        if choice.finish_reason == "function_call":
            return await self.execute_function_call(
                user_input, messages, message, exposed_entities, n_requests + 1
            )
        if choice.finish_reason == "tool_calls":
            return await self.execute_tool_calls(
                user_input, messages, message, exposed_entities, n_requests + 1
            )
        if choice.finish_reason == "length":
            raise TokenLengthExceededError(response.usage.completion_tokens)

        return OpenAIQueryResponse(response=response, message=message)

    async def execute_function_call(
        self,
        user_input: conversation.ConversationInput,
        messages,
        message: ChatCompletionMessage,
        exposed_entities,
        n_requests,
    ) -> OpenAIQueryResponse:
        function_name = message.function_call.name.strip()
        function = next(
            (s for s in self.get_functions() if s["spec"]["name"] == function_name),
            None,
        )
        if function is not None:
            return await self.execute_function(
                user_input,
                messages,
                message,
                exposed_entities,
                n_requests,
                function,
            )
        raise FunctionNotFound(function_name)

    async def execute_function(
        self,
        user_input: conversation.ConversationInput,
        messages,
        message: ChatCompletionMessage,
        exposed_entities,
        n_requests,
        function,
    ) -> OpenAIQueryResponse:
        function_executor = get_function_executor(function["function"]["type"])

        try:
            arguments = json.loads(message.function_call.arguments)
        except json.decoder.JSONDecodeError as err:
            raise ParseArgumentsFailed(message.function_call.arguments) from err

        result = await function_executor.execute(
            self.hass, function["function"], arguments, user_input, exposed_entities
        )

        messages.append(
            {
                "role": "function",
                "name": message.function_call.name,
                "content": str(result),
            }
        )
        return await self.query(user_input, messages, exposed_entities, n_requests)

    async def execute_tool_calls(
        self,
        user_input: conversation.ConversationInput,
        messages,
        message: ChatCompletionMessage,
        exposed_entities,
        n_requests,
    ) -> OpenAIQueryResponse:
        messages.append(message.model_dump())
        for tool in message.tool_calls:
            function_name = tool.function.name.strip()
            function = next(
                (s for s in self.get_functions() if s["spec"]["name"] == function_name),
                None,
            )
            if function is not None:
                result = await self.execute_tool_function(
                    user_input,
                    tool,
                    exposed_entities,
                    function,
                )
                # feed the tool result back to the model as a tool message
                messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": tool.id,
                        "name": function_name,
                        "content": str(result),
                    }
                )
            else:
                raise FunctionNotFound(function_name)
        return await self.query(user_input, messages, exposed_entities, n_requests)

    async def execute_tool_function(
        self,
        user_input: conversation.ConversationInput,
        tool,
        exposed_entities,
        function,
    ) -> OpenAIQueryResponse:
        function_executor = get_function_executor(function["function"]["type"])

        try:
            arguments = json.loads(tool.function.arguments)
        except json.decoder.JSONDecodeError as err:
            raise ParseArgumentsFailed(tool.function.arguments) from err

        result = await function_executor.execute(
            self.hass, function["function"], arguments, user_input, exposed_entities
        )
        return result


class OpenAIQueryResponse:
    """OpenAI query response value object."""

    def __init__(
        self, response: ChatCompletion, message: ChatCompletionMessage
    ) -> None:
        """Initialize OpenAI query response value object."""
        self.response = response
        self.message = message

Follow the Extended OpenAI guide on how to create your own functions that the LLM can call.
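
As an illustration, here is a minimal function definition in the same YAML format as the full list further down this thread (the name turn_on_desk_lamp and the entity light.desk_lamp are hypothetical placeholders):

- spec:
    name: turn_on_desk_lamp
    description: Turns on the desk lamp. Only call this function when the user
      explicitly asks for the desk lamp.
    parameters:
      type: object
      properties: {}
  function:
    type: script
    sequence:
    - service: light.turn_on
      target:
        entity_id: light.desk_lamp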

Important settings when using the Functionary LLM:

  • Enable Use Tools if you have defined your own functions;
  • Set Context Threshold to 8000; messages are cleared after 8k tokens, since the model otherwise gets confused beyond that threshold.

I would definitely be interested in how you did it.

Yes I would appreciate a write up as well

Yes please

very interesting!

Alright! Guide coming soon.


@BramNH would love to see your specific configs. Before finding this post in these forums, I had actually already found and been trying your branch of the Extended OpenAI integration. But no matter what I try, the function calling is not working. What changes did you have to make to llama-cpp-python? That might be the only thing different. Thanks!

Many thanks for writing it up. Yes, please continue sharing your experiments.

I added a guide with links to the modifications I made. Let me know if anything is unclear!

Thanks for the updates @BramNH

I was able to get it working without errors now, which is fantastic.

Curious - what exact model are you using? I am using: “functionary-small-v2.4.f16.gguf”

Curious also what your prompt is like? The reason I ask is that it just seems pretty… rather dumb to me.

It often seems to use an incomplete entity_id; for example, instead of light.office_light, it just wants to use office_light. So I have tried to give it direction in the prompt to help with that.

But in other cases it will adjust the wrong device, or claim that a device doesn't support changing colors, and other weird things. About 20% of the time, it doesn't even make a service call; it just says it did whatever I wanted it to do without even trying.

Maybe the default prompt is far from comprehensive enough? Would love to hear your thoughts! Thanks!

Great to hear it works! I am using the functionary-small-v2.4.Q4_0.gguf model.

I am using the default prompt atm!

I have had that entity naming issue before as well, when I wrote my own YAML functions. I think it was fixed by explicitly stating the following in the execute_services function description: The entity_id retrieved from available devices. Call the service with this entity_id, you must add the domain (such as light or switch) in front of it, followed by dot character.

I noticed that every description is very important for the LLM's decision making in function calls.

All my Extended OpenAI Functions
- spec:
    name: execute_services
    description: Use this function to execute service of devices in Home Assistant.
    parameters:
      type: object
      properties:
        list:
          type: array
          items:
            type: object
            properties:
              domain:
                type: string
                description: The domain of the service
              service:
                type: string
                description: The service to be called
              service_data:
                type: object
                description: The service data object to indicate what to control.
                properties:
                  entity_id:
                    type: string
                    description: The entity_id retrieved from available devices. Call the service
                      with this entity_id, you must add the domain (such as light or switch) in front of it, followed by dot character.
                required:
                - entity_id
            required:
            - domain
            - service
            - service_data
  function:
    type: native
    name: execute_service
- spec:
    name: get_attributes
    description: Get attributes of any home assistant entity
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: entity_id
      required:
      - entity_id
  function:
    type: template
    value_template: "{{states[entity_id]}}"
- spec:
    name: get_weather_info
    description: Get info and forecast weather info
    parameters:
      type: object
      properties:
        location:
          type: string
          description: Infer this from home location (e.g. Den Ham, OV)
        format:
          enum:
            - current forecast
            - daily forecast
          description: The type of weather forecast information to search for and use.
      required:
      - location
      - format
  function:
    type: template
    value_template: "{{states['weather.buienradar']}}"
- spec:
    name: add_item_to_todo_list
    description: Add item to to-do list
    parameters:
      type: object
      properties:
        item:
          type: string
          description: The item to be added to to-do list
        list_id:
          type: string
          description: The entity id of the to-do list
      required:
      - item
      - list_id
  function:
    type: script
    sequence:
    - service: todo.add_item
      data:
        item: '{{item}}'
      target:
        entity_id: todo.{{list_id}}
- spec:
    name: set_climate_temperature
    description: Sets the target temperature of the altherma thermostat
    parameters:
      type: object
      properties:
        temperature:
          type: string
          description: The target temperature
      required:
      - temperature
  function:
    type: script
    sequence:
    - service: climate.set_temperature
      data:
        temperature: '{{temperature}}'
      target:
        entity_id: climate.altherma_thermostaat
- spec:
    name: set_light_brightness
    description: Sets a brightness value for a light entity. Only call this
      function when the user explicitly gives you a percentage value.
    parameters:
      type: object
      properties:
        brightness:
          type: string
          description: The brightness percentage to set.
        entity_id:
          type: string
          description: The light entity_id retrieved from available devices. 
            It must start with the light domain, followed by dot character.
      required:
      - brightness
      - entity_id
  function:
    type: script
    sequence:
    - service: light.turn_on
      data:
        brightness_pct: '{{brightness}}'
      target:
        entity_id: '{{entity_id}}'
- spec:
    name: set_light_warm
    description: Sets a light entity to its warmest temperature.
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: The light entity_id retrieved from available devices. 
            It must start with the light domain, followed by dot character.
      required:
      - entity_id
  function:
    type: script
    sequence:
    - service: light.turn_on
      data:
        kelvin: '{{state_attr(entity_id, "min_color_temp_kelvin")}}'
      target:
        entity_id: '{{entity_id}}'
- spec:
    name: set_light_cold
    description: Sets a light entity to its coldest or coolest temperature,
      only call this function when user explicitly asks for cold or cool temperature of the light.
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: The light entity_id retrieved from available devices. 
            It must start with the light domain, followed by dot character.
      required:
      - entity_id
  function:
    type: script
    sequence:
    - service: light.turn_on
      data:
        kelvin: '{{state_attr(entity_id, "max_color_temp_kelvin")}}'
      target:
        entity_id: '{{entity_id}}'
- spec:
    name: set_light_temperature
    description: Sets a temperature value in Kelvin for a light entity, 
      only call this function when an explicit Kelvin value has been provided.
    parameters:
      type: object
      properties:
        temperature:
          type: string
          description: The temperature to set in Kelvin
        entity_id:
          type: string
          description: The entity_id retrieved from available devices. 
            It must start with domain, followed by dot character.
      required:
      - temperature
      - entity_id
  function:
    type: script
    sequence:
    - service: light.turn_on
      data:
        kelvin: '{{temperature}}'
      target:
        entity_id: '{{entity_id}}'
- spec:
    name: set_light_color
    description: Sets a color value for a light entity. Only call this function
      when the user explicitly gives a color, and not warm, cold or cool.
    parameters:
      type: object
      properties:
        color:
          type: string
          description: The color to set
        entity_id:
          type: string
          description: The light entity_id retrieved from available devices. 
            It must start with the light domain, followed by dot character.
      required:
      - color
      - entity_id
  function:
    type: script
    sequence:
    - service: light.turn_on
      data:
        color_name: '{{color}}'
      target:
        entity_id: '{{entity_id}}'
- spec:
    name: start_app_tv
    description: Starts an app on the Sony Bravia TV. Only call this
      function when the user explicitly gives the name of an app. Use
      the function named execute_services for turning the TV on or off.
    parameters:
      type: object
      properties:
        app_name:
          type: string
          description: The name of the app to start
      required:
      - app_name
  function:
    type: script
    sequence:
    - service: media_player.play_media
      data:
        media_content_id: '{{app_name}}'
        media_content_type: "app"
      target:
        entity_id: media_player.sony_kd_43xg8399

Really appreciate all the details. Adding a specific function, as you do for setting the color, helped. But for some reason it's still randomly dropping the domain from entities, so it works about 50% of the time. I now have instructions in the function calls exactly as you do, and have been experimenting with reinforcing that in the prompt. It surely seems odd, but that is to be expected with bleeding edge, I suppose :slight_smile:

I also renamed most of my entities. Since the domain is already at the front, it doesn't seem logical to include it in the id. I can't say for sure if it will help.

E.g. change light.office_lights to light.office

I would not recommend redirecting the model via the prompt for everything; this only makes the context larger and the model slower.

Btw: what GPU do you use? And can you share your benchmarks (natural language processing times, inference tokens/s) for certain commands? I would love to make a table in the future to compare performance.

Excellent Guide @BramNH

You know what I liked about the whole thing? Functionary Small 2.4 Q4 only requires 4.11 GB of RAM. That means I can run both Wyoming Whisper (based on Faster Whisper) and the LLM on an 8 GB GPU like an Nvidia GTX 1070 or 1080. Wow!

I have been looking at using TheBloke/Luna-AI-Llama2-Uncensored-GGUF · Hugging Face. The luna-ai-llama2-uncensored.Q6_K.gguf Q6_K model requires 8.03 GB of RAM but is of good quality: "very large, extremely low quality loss" (use case). This means I need at least a 10-12 GB GPU to run both this and Wyoming Whisper! I became aware of this LLM from the guide at How to control Home Assistant with a local LLM instead of ChatGPT | The awesome garage, where an Nvidia RTX 3060 is mentioned for running both the LLM and Whisper!

My question: how are the performance and accuracy of Functionary Small 2.4 Q4? Is it good enough to run an (English-language) AI voice assistant instead of relying on larger models like luna-ai-llama2-uncensored.Q6_K.gguf?

Thanks @vishworks! I indeed tried to fit both wyoming-faster-whisper and the LLM in 8GB of VRAM. In idle, the model for faster-whisper (large-v3-int8) uses about 1.4GB I believe, so I still have some VRAM left in total. I did get an out of memory error in Whisper under load one time, so it might be better to use the medium-int8 model (uses about 1GB).

The RTX 3060 is also a candidate for my next GPU, since it's not that expensive and has a 12GB VRAM variant. Plus it is of course faster than my GTX 1080.

The Functionary Small v2.4 model is based on the Mistral 7B model. In my opinion, if you use this model for Home Assistant function calling, the accuracy of the Q4 variant is good, but it mostly depends on how you prompt the model and how you describe your functions. I have the entire prompt and all function definitions in English, but provide the model with Dutch sentences, and it still performs well! Let me know what your experience is if you try it out!


I've got this error when asking for the weather. Any advice?

Sorry, I had a problem talking to OpenAI: Error code: 500 - {'error': {'message': '[{'type': 'literal_error', 'loc': ('body', 'messages', 2, 'typed-dict', 'role'), 'msg': "Input should be 'system'", 'input': 'assistant', 'ctx': {'expected': "'system'"}}, {'type': 'literal_error', 'loc': ('body', 'messages', 2, 'typed-dict', 'role'), 'msg': "Input should be 'user'", 'input': 'assistant', 'ctx': {'expected': "'user'"}}, {'type': 'dict_type', 'loc': ('body', 'messages', 2, 'typed-dict', 'function_call'), 'msg': 'Input should be a valid dictionary', 'input': None}, {'type': 'literal_error', 'loc': ('body', 'messages', 2, 'typed-dict', 'role'), 'msg': "Input should be 'tool'", 'input': 'assistant', 'ctx': {'expected': "'tool'"}}, {'type': 'missing', 'loc': ('body', 'messages', 2, 'typed-dict', 'tool_call_id'), 'msg': 'Field required', 'input': {'content': None, 'role': 'assistant', 'function_call': None, 'tool_calls': [{'id': 'call_PISwU8mGaTWpQr6KCSZi9dY8', 'function': {'arguments': '{"entity_id": "weather.forecast_home"}', 'name': 'get_attributes'}, 'type': 'function'}]}}, {'type': 'literal_error', 'loc': ('body', 'messages', 2, 'typed-dict', 'role'), 'msg': "Input should be 'function'", 'input': 'assistant', 'ctx': {'expected': "'function'"}}, {'type': 'missing', 'loc': ('body', 'messages', 2, 'typed-dict', 'name'), 'msg': 'Field required', 'input': {'content': None, 'role': 'assistant', 'function_call': None, 'tool_calls': [{'id': 'call_PISwU8mGaTWpQr6KCSZi9dY8', 'function': {'arguments': '{"entity_id": "weather.forecast_home"}', 'name': 'get_attributes'}, 'type': 'function'}]}}]', 'type': 'internal_server_error', 'param': None, 'code': None}}

Seems like a prompt template issue, since it expects “system” where “assistant” is given as input etc.

Do other commands work?

When I try to turn off music (named biuro), it stops it, but the response is still like this:

How can I assist?

Turn off biuro

Sorry, I had a problem talking to OpenAI: Error code: 500 - {'error': {'message': '[{\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 2, \'typed-dict\', \'role\'), \'msg\': "Input should be \'system\'", \'input\': \'assistant\', \'ctx\': {\'expected\': "\'system\'"}}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 2, \'typed-dict\', \'role\'), \'msg\': "Input should be \'user\'", \'input\': \'assistant\', \'ctx\': {\'expected\': "\'user\'"}}, {\'type\': \'dict_type\', \'loc\': (\'body\', \'messages\', 2, \'typed-dict\', \'function_call\'), \'msg\': \'Input should be a valid dictionary\', \'input\': None}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 2, \'typed-dict\', \'role\'), \'msg\': "Input should be \'tool\'", \'input\': \'assistant\', \'ctx\': {\'expected\': "\'tool\'"}}, {\'type\': \'missing\', \'loc\': (\'body\', \'messages\', 2, \'typed-dict\', \'tool_call_id\'), \'msg\': \'Field required\', \'input\': {\'content\': None, \'role\': \'assistant\', \'function_call\': None, \'tool_calls\': [{\'id\': \'call_3XSYhREfhUzvc9deaiBq057H\', \'function\': {\'arguments\': \'{"list": [{"domain": "media_player", "service": "turn_off", "service_data": {"entity_id": "media_player.biuro"}}]}\', \'name\': \'execute_services\'}, \'type\': \'function\'}]}}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 2, \'typed-dict\', \'role\'), \'msg\': "Input should be \'function\'", \'input\': \'assistant\', \'ctx\': {\'expected\': "\'function\'"}}, {\'type\': \'missing\', \'loc\': (\'body\', \'messages\', 2, \'typed-dict\', \'name\'), \'msg\': \'Field required\', \'input\': {\'content\': None, \'role\': \'assistant\', \'function_call\': None, \'tool_calls\': [{\'id\': \'call_3XSYhREfhUzvc9deaiBq057H\', \'function\': {\'arguments\': \'{"list": [{"domain": "media_player", "service": "turn_off", "service_data": {"entity_id": "media_player.biuro"}}]}\', \'name\': \'execute_services\'}, \'type\': \'function\'}]}}]', 'type': 'internal_server_error', 'param': None, 'code': None}}

My functions in Extended OpenAI are the defaults:

- spec:
    name: execute_services
    description: Use this function to execute service of devices in Home Assistant.
    parameters:
      type: object
      properties:
        list:
          type: array
          items:
            type: object
            properties:
              domain:
                type: string
                description: The domain of the service
              service:
                type: string
                description: The service to be called
              service_data:
                type: object
                description: The service data object to indicate what to control.
                properties:
                  entity_id:
                    type: string
                    description: The entity_id retrieved from available devices. Call the service with this entity_id, you must add the domain (such as light or switch) in front of it, followed by dot character.
                required:
                - entity_id
            required:
            - domain
            - service
            - service_data
  function:
    type: native
    name: execute_service
- spec:
    name: get_attributes
    description: Get attributes of any home assistant entity
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: entity_id
      required:
      - entity_id
  function:
    type: template
    value_template: "{{states[entity_id]}}"

Also, Use Tools is on and Attach Username is on.

It seems that llama got the proper prompt format. Still, the missing torch seems not good - I will try to install torch in the Docker container, but I think this is not the reason.

Also, regarding inference speeds and Whisper: I personally use whisper.cpp with wyoming-whisper-api-client:
https://github.com/ser/wyoming-whisper-api-client
I use the large-v3 model and it has perfect performance; I use an RTX 3090 for this.
All one- or two-sentence speech is transcribed in less than a second.
It is ideal for Home Assistant. I tried many alternatives, like Wyoming Faster Whisper, but I got slower responses.

I think whisper.cpp + (some LLM) + Piper is a great combo for HA.
For satellites I tried many things, and the best I got is onju-voice.
It works quite well connected directly to Home Assistant's wyoming-openwakeword, but wyoming-openwakeword still has some issues with overdetection (I tried my own Python code with openwakeword and it didn't have these issues at all, but integrating that with Home Assistant is beyond my skills - I asked the HA devs to just use the library version of openwakeword in wyoming-openwakeword, but got no response).
I tried a few TTS engines, e.g. XTTS, Bark, etc. These are much better than Piper in terms of quality, but much slower. Piper is good enough but doesn't have a high-quality voice in my language (Polish). In the future I will try to train a voice, but for now the most important missing piece of the puzzle is the LLM.

For LLMs I tried Home-LLM and Extended OpenAI. Actually, in the open-source space there is nothing that fully works locally for now.
Home-LLM works partially; Extended OpenAI works partially, but only with OpenAI's chat models.
I thought the best option would be to have something like CrewAI integrated into Home Assistant, just like Extended OpenAI. There could be two models in it: one for conversation with the user and a second for function calling. The best function-calling model I tried is Functionary. I tried tens of different models for this task and every one failed to some extent (maybe naturalfunction wasn't that bad).
But for my native language I would still like to use a model that speaks it well (the Bielik model), and that one has poor function calling. That's why Functionary would be best for function calling, but CrewAI doesn't work properly with Functionary for now. So your topic is quite interesting, as another small puzzle in this field :slight_smile:

Regarding my issue with using your fork:

I think the problem is with Extended OpenAI:

homeassistant  | 2024-05-01 14:17:25.845 INFO (MainThread) [custom_components.extended_openai_conversation] Prompt for functionary-v2.4: [{'role': 'system', 'content': "I want you to act as smart home manager of Home Assistant.\nI will provide information of smart home along with a question, you will truthfully make correction or answer using information provided in one sentence in everyday language.\n\nCurrent Time: 2024-05-01 14:17:25.842311+02:00\n\nAvailable Devices:\n```csv\nentity_id,name,state,aliases\nsensor.epson_wf_3620_series_black_ink,EPSON WF-3620 Series Black ink,65,\nmedia_player.biuro,Biuro,playing,biuro/głośnik aleksa\nweather.forecast_home,Pogoda,sunny,weather forecast\nswitch.onju_voice_477b00_use_wake_word,Voice1 Use Wake Word,unavailable,\nswitch.zigbee2mqtt_bridge_permit_join,Permit join,unavailable,\n```\n\nThe current state of devices is provided in available devices.\nUse execute_services function only for requested action, not for current states.\nDo not execute service without user's confirmation.\nDo not restate or appreciate what user says, rather make a quick inquiry."}, {'role': 'user', 'content': 'What is the weather?'}]
homeassistant  | 2024-05-01 14:17:28.199 INFO (MainThread) [custom_components.extended_openai_conversation] Response {'id': 'chatcmpl-80cb712f-871f-4216-aaad-a7c459b32141', 'choices': [{'finish_reason': 'tool_calls', 'index': 0, 'message': {'role': 'assistant', 'tool_calls': [{'id': 'call_QKcGpqWv32B4KqtYJZhPNvr4', 'function': {'arguments': '{"entity_id": "weather.forecast_home"}', 'name': 'get_attributes'}, 'type': 'function'}]}}], 'created': 1714565848, 'model': 'functionary-v2.4', 'object': 'chat.completion', 'usage': {'completion_tokens': 21, 'prompt_tokens': 527, 'total_tokens': 528}}

I installed your fork of Extended OpenAI from HACS. How did you manage to overcome this?