Integration with LocalAI

You set the context length in your model's configuration. This is how I do it:

  1. Follow the Easy Setup - GPU Docker :: LocalAI documentation
  2. You will need models quantized with the newer k-quant methods, i.e. files whose names end in _k_s, _k_m, _k or _k_l
  3. Open Postman and create a POST request to http://localhost:8080/models/apply with a raw JSON body:
{
     "id": "huggingface@thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin"
}
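
If you prefer the command line to Postman, the same request can be sent with curl (endpoint and body exactly as above; swap the id to install a different model):

curl -X POST http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"id": "huggingface@thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin"}'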
  4. Wait for the model to download. A model with more parameters is usually better. q4 is usually the sweet spot, but a 13B q3 should perform better in terms of “quality” than a 7B q4/q5. With Windows running, I can fit on my RTX 3060 with 12 GB VRAM: thebloke__vicuna-7b-v1.5-ggml__vicuna-7b-v1.5.ggmlv3.q5_k_m.bin (even q8, but there is very little difference in practice) and thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin
  5. Change the model parameters in thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin.yaml:
backend: llama
context_size: 4000
f16: true 
gpu_layers: 43
low_vram: false
mmap: true
mmlock: true
batch: 512
name: thebloke__vicuna-13b-v1.5-ggml__vicuna-13b-v1.5.ggmlv3.q3_k_s.bin
parameters:
  model: vicuna-13b-v1.5.ggmlv3.q3_K_S.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
roles:
  assistant: 'ASSISTANT:'
  system: ''
  user: 'USER:'
template:
  chat: vicuna-chat
  completion: vicuna-completion

You can also fix vicuna-chat.tmpl according to the suggested prompt template on TheBloke/vicuna-13B-v1.5-GGML · Hugging Face.
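
As a rough sketch (assuming LocalAI's Go-template convention, where {{.Input}} stands for the rendered prompt), vicuna-chat.tmpl could be a single line following the model card's format:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {{.Input}} ASSISTANT: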

Pay attention to the context_size parameter, which sets the context window the model is allowed to use.
Also pay attention to gpu_layers: the more layers you offload to the GPU, the faster generation will be, but the more VRAM it will consume.

  6. Restart LocalAI
  7. Configure your Home Assistant
    Set the model name to
thebloke__vicuna-7b-v1.5-ggml__vicuna-7b-v1.5.ggmlv3.q5_k_m.bin
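
To check that LocalAI actually serves the model under that name before wiring it into Home Assistant, you can send a quick test request to LocalAI's OpenAI-compatible chat endpoint (same port as above; the prompt is only an example):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "thebloke__vicuna-7b-v1.5-ggml__vicuna-7b-v1.5.ggmlv3.q5_k_m.bin",
        "messages": [{"role": "user", "content": "Which rooms are in this smart home?"}],
        "temperature": 0.2
      }'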

You can improve the prompt template like this:

This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:
{%- for area in areas() %}
  {%- set area_info = namespace(printed=false) %}
  {%- for device in area_devices(area) -%}
    {%- if not device_attr(device, "disabled_by") and not device_attr(device, "entry_type") and device_attr(device, "name") %}
      {%- if not area_info.printed %}

{{ area_name(area) }}:
        {%- set area_info.printed = true %}
      {%- endif %}
      {%- for entity in device_entities(device) %}
        {%- if not is_state(entity,'unavailable') and not is_state(entity,'unknown') and not is_hidden_entity(entity) %}      
 - {{ state_attr(entity, 'friendly_name') }} is {{ states(entity) }}
        {%- endif %}
      {%- endfor %}
    {%- endif %}
  {%- endfor %}
{%- endfor %}

Answer the user's questions about the world truthfully.
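
With a couple of areas and entities, the rendered prompt would look roughly like this (the names are made up for illustration):

This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:

Living Room:
 - Ceiling Light is on
 - Thermostat is heat

Bedroom:
 - Bedside Lamp is off

Answer the user's questions about the world truthfully.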

The next step would be configuring functions so the LLM can actually call your Home Assistant services and control your home.
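
Purely as an illustration (the exact schema depends on the integration you use, so every name and field below is hypothetical), a function that lets the model call a Home Assistant service could be described in the OpenAI function-calling style like this:

{
  "name": "call_ha_service",
  "description": "Call a Home Assistant service on an entity (hypothetical example)",
  "parameters": {
    "type": "object",
    "properties": {
      "domain": { "type": "string", "description": "Service domain, e.g. light or switch" },
      "service": { "type": "string", "description": "Service to call, e.g. turn_on or turn_off" },
      "entity_id": { "type": "string", "description": "Target entity, e.g. light.living_room" }
    },
    "required": ["domain", "service", "entity_id"]
  }
}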
