Integration with LocalAI

Hey everyone!

I think it would be really awesome to see an integration between Home Assistant and LocalAI.

LocalAI (GitHub - go-skynet/LocalAI: Self-hosted, community-driven, local OpenAI-compatible API. Drop-in replacement for OpenAI running LLMs on consumer-grade hardware. No GPU required. LocalAI is a RESTful API to run ggml-compatible models: llama.cpp, alpaca.cpp, gpt4all.cpp, rwkv.cpp, whisper.cpp, vicuna, koala, gpt4all-j, cerebras and many others!) is an OpenAI drop-in replacement API that lets you run LLMs directly on consumer-grade hardware. No GPU and no internet access are required.

There is already an OpenAI integration for Home Assistant, and since LocalAI follows the OpenAI spec, it should already be possible to integrate it. The only change required would be for the OpenAI integration to also let the user specify a “base URL” in the options, so requests can be pointed at a different endpoint.
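
To make the idea concrete, here is a minimal sketch of what “just change the base URL” means in practice, assuming the official openai Python package (v1 client) and a LocalAI instance listening on localhost:8080; the model name is only an example:

```python
# Minimal sketch: point a standard OpenAI client at a LocalAI endpoint instead of api.openai.com.
# Assumes the `openai` Python package (v1 API) and LocalAI on localhost:8080;
# the model name below is just an example of whatever LocalAI has loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # the "base url" option the integration would expose
    api_key="not-needed-for-localai",      # LocalAI doesn't require a real key by default
)

response = client.chat.completions.create(
    model="ggml-gpt4all-j",
    messages=[{"role": "user", "content": "What can you do for my smart home?"}],
)
print(response.choices[0].message.content)
```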

What do you think?

I would love to see a local AI running in my home.
If it’s only a matter of adding the base URL, it should be a quick-win feature.

4 Likes

Has anyone seen any news about local LLMs being integrated into Home Assistant?

2 Likes

Any news on this? It would indeed be very cool to be able to run this locally with HA and get more non-static voice responses from Assist.

1 Like

This would be amazing!

1 Like

I’d love to have my local Language Model integrated into Home Assistant.

1 Like

So, I had a quick hack on this last week and got LocalAI to work with Home Assistant.

I basically forked the "Open"AI integration and hacked up the code to add the ability to set a custom API endpoint.

I’ve found developing and testing with Home Assistant to be … painful … to say the least, so there are a bunch of problems with it at present; the main branch is currently broken, but I promise you, it did work for a while last week.

It’s very, very early days and I have limited time at the moment, but please feel free to submit PRs / fix bugs, or if you think it’s so dreadful it needs a rewrite (probably the case), please do so and I’ll use yours! :rofl:

4 Likes

Do you think this will work with oobabooga (GitHub - oobabooga/text-generation-webui: A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA)?

There’s an extension which basically mirrors the OpenAI API calls, so it should be a drop-in replacement… theoretically it should work, I think.

Yep! (I also have text-gen-webui running here, I just found LocalAI to be a bit better packaged.)

Once I get some more time to work on it and get it going properly (hopefully this weekend), you’ll be able to provide any API endpoint you want, and as long as it returns an OpenAI-compatible response it should work.

I’m already doing this with a few other apps, just haven’t had time to hack on it any further yet.
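
If you want to sanity-check whether an endpoint speaks the OpenAI chat completions format before pointing Home Assistant at it, a raw request like the following works against any of them (just a sketch; the host, port and model name are assumptions, so adjust them for LocalAI, text-generation-webui’s OpenAI extension, or whatever you are running):

```python
# Sketch: check that an arbitrary endpoint returns an OpenAI-compatible response.
# Host, port and model name are assumptions - adjust for your own setup.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "ggml-gpt4all-j",
        "messages": [{"role": "user", "content": "Say hello"}],
    },
    timeout=120,
)
resp.raise_for_status()
data = resp.json()
# An OpenAI-compatible server puts the reply in choices[0].message.content
print(data["choices"][0]["message"]["content"])
```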

1 Like

Better yet it seems that someone has got a working integration available now - GitHub - drndos/hass-openai-custom-conversation: Conversation support for home assistant using vicuna local llm

Have you tested it out?
I’m about to give it a spin!

EDIT: I can’t seem to get that one working

EDIT 2:
I figured out the issue.
Home Assistant does not let you connect to a non-encrypted endpoint if you’re using HTTPS for the Home Assistant instance.
I put NPM (Nginx Proxy Manager) in front of it as a proxy and it works correctly now.

1 Like

Have you been able to increase the token size when using the custom OpenAI integration? I’ve tried increasing the “Truncate the prompt up to this length” setting within the text-generation-webui interface, but I still get an error stating:
Sorry, I had a problem talking to Custom OpenAI compatible server: This model maximum context length is 2048 tokens.

You set the context length within the configuration of your model.
This is how I do it:

  1. Follow Easy Setup - GPU Docker :: LocalAI documentation.
  2. You will need quantized models using the newer quantization methods, i.e. files ending in _k_s, _k_m, _k or _k_l.
  3. Open Postman and create a POST request to http://localhost:8080/models/apply with a raw JSON body (see the Python sketch after these steps for a scripted alternative):
{
     "id": "huggingface@thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin"
}
  4. Wait for the model to download. Usually a model with more parameters is better; q4 is usually the sweet spot, but 13b q3 should perform better in terms of “quality” than 7b q4/q5. With Windows running, I can fit both thebloke__vicuna-7b-v1.5-ggml__vicuna-7b-v1.5.ggmlv3.q5_k_m.bin (even q8, but there is very little difference in terms of anything) and thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin on my RTX 3060 with 12 GB of VRAM.
  5. Change the model parameters in thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin.yaml:
backend: llama
context_size: 4000
f16: true 
gpu_layers: 43
low_vram: false
mmap: true
mmlock: true
batch: 512
name: thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin
parameters:
  model: vicuna-13b-v1.5.ggmlv3.q3_K_S.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
roles:
  assistant: 'ASSISTANT:'
  system: ''
  user: 'USER:'
template:
  chat: vicuna-chat
  completion: vicuna-completion

You can also fix vicuna-chat.tmpl according to the suggested prompt template at TheBloke/vicuna-13B-v1.5-GGML · Hugging Face.

Pay attention to the context_size parameter, which lets you customize the allowed context size of your model.
Also pay attention to gpu_layers: the more layers you offload to the GPU, the faster it will go, but the more GPU VRAM will be used.

  6. Restart LocalAI.
  7. Configure your Home Assistant integration and set the model name to
thebloke__vicuna-7b-v1.5-ggml__vicuna-7b-v1.5.ggmlv3.q5_k_m.bin
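
For reference, the Postman request from step 3 can also be done with a few lines of Python (just a sketch; the host, port and model id match the steps above, so adjust them to your setup):

```python
# Sketch of the step-3 model gallery request as a script instead of Postman.
# Host, port and the model id match the example above - adjust as needed.
import requests

resp = requests.post(
    "http://localhost:8080/models/apply",
    json={
        "id": "huggingface@thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin"
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # the download runs in the background - watch the LocalAI logs for progress
```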

You can improve the template like this:

This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:
{%- for area in areas() %}
  {%- set area_info = namespace(printed=false) %}
  {%- for device in area_devices(area) -%}
    {%- if not device_attr(device, "disabled_by") and not device_attr(device, "entry_type") and device_attr(device, "name") %}
      {%- if not area_info.printed %}

{{ area_name(area) }}:
        {%- set area_info.printed = true %}
      {%- endif %}
      {%- for entity in device_entities(device) %}
        {%- if not is_state(entity,'unavailable') and not is_state(entity,'unknown') and not is_hidden_entity(entity) %}      
 - {{ state_attr(entity, 'friendly_name') }} is {{ states(entity) }}
        {%- endif %}
      {%- endfor %}
    {%- endif %}
  {%- endfor %}
{%- endfor %}

Answer the user's questions about the world truthfully.

The next step would be configuring functions so the LLM can actually call your Home Assistant services and control your home.
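
To make that concrete, here is a rough, purely hypothetical sketch of how OpenAI-style function calling could be wired to Home Assistant’s REST API. The call_service schema is something made up for illustration, the model (and LocalAI) has to actually support function calling, and you would need a long-lived access token from your HA profile:

```python
# Hypothetical sketch: let the LLM pick a Home Assistant service via OpenAI-style
# function calling, then forward it to HA's REST API. The call_service schema is
# invented for illustration; your model/LocalAI must support function calling.
import json
import requests
from openai import OpenAI

HASS_URL = "http://homeassistant.local:8123"   # assumption: your HA address
HASS_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"    # created under your HA user profile

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "call_service",
        "description": "Call a Home Assistant service on an entity",
        "parameters": {
            "type": "object",
            "properties": {
                "domain": {"type": "string", "description": "e.g. light, switch"},
                "service": {"type": "string", "description": "e.g. turn_on, turn_off"},
                "entity_id": {"type": "string", "description": "e.g. light.kitchen"},
            },
            "required": ["domain", "service", "entity_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin",
    messages=[{"role": "user", "content": "Turn on the kitchen light"}],
    tools=tools,
)

message = resp.choices[0].message
if message.tool_calls:  # the model may answer in plain text instead
    args = json.loads(message.tool_calls[0].function.arguments)
    # Home Assistant's REST API: POST /api/services/<domain>/<service>
    requests.post(
        f"{HASS_URL}/api/services/{args['domain']}/{args['service']}",
        headers={"Authorization": f"Bearer {HASS_TOKEN}"},
        json={"entity_id": args["entity_id"]},
        timeout=10,
    )
```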

2 Likes

Thanks for sharing this, I’ll need to test LocalAI against the text-generation-webui setup that I’ve been using.

If you are using the llama.cpp backend in text-generation-webui, you can set the context size by defining the n_ctx parameter before loading the model.

1 Like

Bear with me as I’m still pretty new to the LLM stuff.
I’m using an AutoGPTQ model (TheBloke/airoboros-33B-GPT4-m2.0-GPTQ · Hugging Face).

I’m assuming it cannot be done with that one, so I’ll try the model you’ve shared

Yeah, some models unfortunately only support a 2048-token context, so even if you increase the setting, it might not work.

So I’m using your model, selected llama.cpp as the loader, increased n_ctx to 6144, and then loaded the model, but I’m still receiving the same error in Home Assistant.

I’ll probably test with localai later

LocalAI could be used as a fallback for the Assist pipelines: Pipeline chaining or fallback intent

But maybe it should be something more generic than LocalAI, so you could use privateGPT or anything else.
Maybe it could just use the already-implemented Wyoming protocol integration.

What do you think?

So I’ve got this set up using LocalAI now and am no longer getting the error about context length.

I’ve set my vicuna-chat.tmpl to this:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: {prompt}
ASSISTANT:

And I’ve set the template within home assistant to this:

This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:
{%- for area in areas() %}
  {%- set area_info = namespace(printed=false) %}
  {%- for device in area_devices(area) -%}
    {%- if not device_attr(device, "disabled_by") and not device_attr(device, "entry_type") and device_attr(device, "name") %}
      {%- if not area_info.printed %}

{{ area_name(area) }}:
        {%- set area_info.printed = true %}
      {%- endif %}
      {%- for entity in device_entities(device) %}
        {%- if not is_state(entity,'unavailable') and not is_state(entity,'unknown') and not is_hidden_entity(entity) %}      
 - {{ state_attr(entity, 'friendly_name') }} is {{ states(entity) }}
        {%- endif %}
      {%- endfor %}
    {%- endif %}
  {%- endfor %}
{%- endfor %}

Answer the user's questions about the world truthfully.

but I’m receiving this response whenever I try to interact with the LLM within Home Assistant:

```vbnet
# Prompt
/imagine prompt: [Your Prompt], [Your Prompt], [Your Prompt], [Your Prompt] --ar 16:9 --v 5
```

Not really sure what’s going on here, but I think the prompt template is messed up (even though I applied the one directly from Hugging Face).