Yes, this guide is for dummies like me who want to run a local LLM but, with the large number of new technologies/articles/names everywhere, don't know where to start. I must say that development is going very fast and what is written here could be old-school tech tomorrow.
This guide is intended to get your local LLM up and running asap before that happens.
I have tried to get this working on Ubuntu, but after a week of digging through so many tutorials and typing commands in a terminal I gave up and started to look for a Windows solution. 4 hours later I had my local LLM up and running. Please note this has nothing to do with the OS; it says more about me being a dummy.
But here is a quick guide to get you started on Windows/Linux/Mac
What will you need?
Software
- HACS installed
- LM Studio (choose the Windows version; Linux/Mac versions are also available)
- Local LLM conversation - Follow these steps
Hardware
A computer with a GPU that has at least 8 GB of VRAM. The CPU isn't that important.
Information
My setup
*I tested this on a remote system with a Ryzen 7, 32 GB RAM, and an RTX 3060 Ti with 8 GB VRAM.*
My context length is at 5400 tokens due to the many exposed entities (80). This slows things down a bit, but the complete pipeline takes between 2 and 5 seconds to produce an answer.
Where to install?
This can be installed on a separate machine in your network or, if possible, on the same machine that HA runs on.
Installation
Download and install LM Studio
After starting LM Studio you need an LLM model to play with.
Download the suggested model (Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf)
When the download finishes, load the model and you're ready to go. That's all.
Information
Check the small cog next to the model name field and see if all layers are loaded into the GPU's VRAM. Running in RAM is possible, but it will be much, much slower.
At this point this is just a chatbot to play with.
On the right you have a box called System prompt
Here you can describe the personality of the chatbot, how it should respond and what its limits are.
Here is an example prompt
You are the smartest chatbot around and you translate all your responses to Dutch. Give answers in the style of GLaDOS. Don't hold back on irritating answers.
(Have fun with this one)
Information
*To control entities in HA you need a model that can do something called function calling. This creates a response in a form that other applications understand, often JSON following the OpenAI schema. The model mentioned here supports function calling.*
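To give you an idea of what function calling produces, here is a sketch of the kind of tool-call message a model returns over an OpenAI-compatible API. The tool name and arguments below are made up for illustration; the Local LLM Conversation integration defines its own tools, so you never have to write this yourself.

```python
# Sketch of an OpenAI-style tool call, shown as a Python dict.
# The tool name "HassTurnOn" and its arguments are illustrative only;
# the real tool definitions come from the Local LLM Conversation integration.
tool_call_message = {
    "role": "assistant",
    "content": None,  # no plain text: the model answers with a tool call instead
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "HassTurnOn",                      # which function to run
                "arguments": '{"name": "kitchen light"}',  # parameters as a JSON string
            },
        }
    ],
}
```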
LM Studio has a model search function that searches Hugging Face. If you go to that site directly, you have better search options.
Go to LM Studio
Click on the Developer icon on the left
If LM Studio runs on a remote PC, turn 'Serve on Local Network' ON; otherwise leave it OFF to serve on localhost only.
Now click the Start Server button.
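To quickly check that the server responds, you can send a test request to LM Studio's OpenAI-compatible endpoint. A minimal sketch in Python, assuming the default port 1234 and the suggested Llama 3.1 model loaded (the model identifier is an assumption; use the one LM Studio shows, and replace localhost with the server's IP if you serve on the local network):

```python
import requests

# LM Studio exposes an OpenAI-compatible API; port 1234 is the default.
url = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "meta-llama-3.1-8b-instruct",  # model identifier as shown in LM Studio
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one short sentence."},
    ],
    "max_tokens": 50,
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

If this prints a short reply, the server side is working and HA should be able to reach it the same way.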
If you followed the setup instructions you have now also installed Local LLM Conversation in HA and connected Whisper and Piper together in the pipeline. Now you have a working system.
Information
The basic actions such as 'Turn the lights on' and 'What is on in the kitchen' will work. Other commands may or may not work; these can be added via the system prompt. Another way to get things working is via the config options of Local LLM Conversation, by adding an 'additional attribute to expose in the context'. For example, for the todo lists I've added an entry called shopping_list.
Troubleshooting
Debugging the pipeline
Command processing errors
In LM Studio, open the Developer tab.
Turn on all the logging and see what is happening. The most common error is that the prompt contains more tokens than defined in Context Length. One place to change that is via the cog next to the model name; under My Models you can set this per model as a default. The log will show you how many tokens the prompt had.
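If you are experimenting with your own requests, the response from LM Studio's OpenAI-compatible API also reports token usage, so you can see how many tokens a prompt used compared to your Context Length. A short sketch, again assuming the default localhost:1234 and an illustrative model identifier:

```python
import requests

# Send a chat request and read back the token usage LM Studio reports.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "meta-llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "What is on in the kitchen?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
usage = resp.json().get("usage", {})
print("prompt tokens:", usage.get("prompt_tokens"))          # must fit inside Context Length
print("completion tokens:", usage.get("completion_tokens"))
```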
Time out problems
The config options of Local LLM Conversation include an option called 'Remote Request Timeout', which defaults to 90 seconds. So if something goes wrong, you have to wait 90 seconds before you get any response. I have set this to 6 seconds and 'Max tokens to return in response' to 128 tokens.