Ollama + HA and offloading all the work like NetworkChuck -- but it doesn't work. :(

Today, I watched this video by NetworkChuck (apparently it was released today!). He uses a more powerful machine on his network to host Piper, Whisper, and, most importantly, Ollama, so he can use his GPU to crunch the LLM stuff.

I was excited to try this out and have my own local AI running to mimic Alexa and Okay Google. However, when I stood up Ollama and tried some simple questions through the HA Assist (typed input) interface, it responds with seeming randomness and sometimes starts playing music on one of my speakers… what??

For example,
[screenshot]

Or something simple like this:
[screenshot]

WTH?

I must be missing something super simple, but I can’t seem to see where I’m going wrong. Has anyone else gotten Ollama to work properly? I installed it via Docker, and I can see from the logs that it’s receiving the queries…

2024-11-05T03:25:21.100908060Z [GIN] 2024/11/05 - 03:25:21 | 200 |  4.540550683s |     172.27.10.5 | POST     "/api/chat"
2024-11-05T03:25:16.773057973Z INFO [update_slots] input truncated | n_ctx=8192 n_erase=4346 n_keep=4 n_left=8188 n_shift=4094 tid="129803580563456" timestamp=1730777116
2024-11-05T03:25:16.553978229Z [GIN] 2024/11/05 - 03:25:16 | 200 |   4.37069796s |     172.27.10.5 | POST     "/api/chat"
2024-11-05T03:25:12.402321268Z INFO [update_slots] input truncated | n_ctx=8192 n_erase=5498 n_keep=4 n_left=8188 n_shift=4094 tid="129803580563456" timestamp=1730777112
2024-11-05T03:24:20.838960991Z [GIN] 2024/11/05 - 03:24:20 | 200 |  3.595154272s |     172.27.10.5 | POST     "/api/chat"
2024-11-05T03:15:05.449992028Z [GIN] 2024/11/05 - 03:15:05 | 200 |  5.524266349s |     172.27.10.5 | POST     "/api/chat"
2024-11-05T03:15:00.149390194Z INFO [update_slots] input truncated | n_ctx=8192 n_erase=4329 n_keep=4 n_left=8188 n_shift=4094 tid="129803580563456" timestamp=1730776500
2024-11-05T03:14:55.892899971Z INFO [update_slots] input truncated | n_ctx=8192 n_erase=5497 n_keep=4 n_left=8188 n_shift=4094 tid="129803580563456" timestamp=1730776495
2024-11-05T03:14:59.904272605Z [GIN] 2024/11/05 - 03:14:59 | 200 |  4.240327274s |     172.27.10.5 | POST     "/api/chat"
2024-11-05T03:14:24.169667050Z [GIN] 2024/11/05 - 03:14:24 | 200 |  3.916306488s |     172.27.10.5 | POST     "/api/chat"
2024-11-05T03:14:20.458771771Z INFO [update_slots] input truncated | n_ctx=8192 n_erase=4428 n_keep=4 n_left=8188 n_shift=4094 tid="129803580563456" timestamp=1730776460
2024-11-05T03:14:20.241229115Z [GIN] 2024/11/05 - 03:14:20 | 200 | 34.105772211s |     172.27.10.5 | POST     "/api/chat"
2024-11-05T03:14:18.057897696Z time=2024-11-05T03:14:18.057Z level=INFO source=server.go:626 msg="llama runner started in 30.39 seconds"
2024-11-05T03:14:17.813914010Z INFO [main] model loaded | tid="129803580563456" timestamp=1730776457
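
Those `input truncated | n_ctx=8192` lines look relevant (more on the context window below), but to rule out the Ollama install itself, you can hit the `/api/chat` endpoint directly, bypassing HA. A minimal sketch; the host IP and model name here are placeholders for whatever you actually run:

```python
# Minimal sketch: query Ollama's /api/chat directly, bypassing Home Assistant,
# to check whether the model itself answers sensibly.
# OLLAMA_URL is a placeholder -- replace with your Docker host's IP.
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434/api/chat"  # hypothetical host IP, default port

payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "What time is it?"}],
    "stream": False,  # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

print(reply["message"]["content"])
```

If that gives sensible answers, the weirdness is coming from the prompt HA builds rather than from Ollama.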

I’ve tried the llama3.2 model that NetworkChuck uses and also llama3.1, which seemed to be the default before I loaded 3.2 manually. Both of them respond with weirdness or what looks like an error on simple questions.

When I flip the Voice Assistant conversation agent back to the standard Home Assistant one (removing Ollama), I get meaningful responses, provided the context & syntax are on point. :thinking:

BAH!! I figured something out.

I noticed that Ollama worked well when I disabled the “Control Home Assistant” option:
[screenshot]

With that set to Assist, it’s almost like there are too many controls and it overloads the Ollama thing – then the output is gibberish.

With it set to No control, it can answer standard questions quickly and easily but it cannot do things to my HA instance. :face_with_symbols_over_mouth:

I’m guessing this is because my Home Assistant has too many switches and entities and whatnot exposed to the Voice Assistant.

I see under that setting it has a context window size that defaulted to 8192 and it says:

Maximum number of text tokens the model can process. Lower to reduce Ollama RAM, or increase for a large number of exposed entities.

[screenshot]

I think this is what I need to tweak (or remove a bunch of exposed entities). Is there any guidance on how to figure out if I’m close or how many entities I can fit in 8192 or what the limit is?

I tried making it 32768 and the query took a lot longer, but it did give me the time eventually. It feels like it’s going to have to be a balance between query speed and the number of exposed entities, but I have no understanding of how to tune it. :frowning:
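
For a rough sense of scale: a common rule of thumb is about 4 characters per token for English text, so you can ballpark how much of the window the exposed entities eat. The per-entity line length and base prompt size below are pure guesses, not anything HA documents, and the real prompt may also carry conversation history, which could be why the log further up showed truncation even at 8192:

```python
# Crude sizing sketch: estimate how much of the context window the exposed
# entities might consume. The 4-chars-per-token rule of thumb and the average
# characters per entity are rough assumptions, not HA-documented values.

CHARS_PER_TOKEN = 4        # rough rule of thumb for English text
AVG_CHARS_PER_ENTITY = 80  # guess: name, area, state, aliases per exposed entity
BASE_PROMPT_TOKENS = 500   # guess: instructions + tool descriptions

def estimated_prompt_tokens(num_entities: int) -> int:
    entity_chars = num_entities * AVG_CHARS_PER_ENTITY
    return BASE_PROMPT_TOKENS + entity_chars // CHARS_PER_TOKEN

for n in (64, 250):
    print(n, "entities ->", estimated_prompt_tokens(n), "tokens (window: 8192)")
```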

Okay, messing around with it further, I found where my exposed entities list is. I had something like 250+ exposed entities!! I went through those and cut the list back to 64, but the problem still persists.

Even if I bump the context window size up to 16384 and allow Assist, it cannot properly answer questions. However, as soon as I set it to No control, the thing works GREAT but cannot perform any actions on my house.

I think I’ve sort of got it working, after shaving down the number of entities and adjusting my expectations. It’s still very inconsistent and gives unexpected results:

Me: What time is it?
Ollama: Unfortunately, I don’t have real-time access to the current time. However, I can suggest checking your Home Assistant system’s display or a clock app on your device for the current time.
Alternatively, if you want to know what time it was when the Home Assistant system was last updated, that information is not available through this interface.
Me: What is the time?
Ollama: I’m sorry, I made a mistake again. It seems that the tool “HassGetState” is not available to retrieve the current time.
As a workaround, you can check your Home Assistant system’s display or a clock app on your device for the current time. The current date and time are 2024-11-06 07:51:01.

I have also had it return the time correctly for both of these questions on previous tries.

Here’s another example where it interpreted the request and made a suggestion, which is a step in a positive direction, but both attempts still failed:

Me: It’s dark in the office, can you make it a little bit brighter?
Ollama: I’m sorry, I made a mistake. The Living Room Light is not in the Office area. To make it brighter in the Office, you would need to control a different device. Can you please specify which light switch or light you want me to turn up? Alternatively, if you want to brighten up the entire Office area, can you please confirm that there is only one device of type “light” present in the Office area?
Me: the office light
Ollama: I made a mistake again. The Office light is currently turned off, so turning it up to 100% brightness wouldn’t be possible. Can you please specify a different position or percentage value that you would like me to try?

I think it’s just that this kind of AI is too new. It answered all these questions but didn’t actually change any switches.

I have had it start playing music on the Office speakers when I ask it about things that have nothing to do with music (like “what time is it?” or “when did the movie The Matrix come out?”).

Hi.
I just watched the NetworkChuck video and I think he did actually give us a fix for this, very briefly in the video. I am not at all surprised if you didn’t catch it, but you can hear the following dialogue at ca. 23:25:

You also… kind of tell it who it is
actually I like to add cuz it might not answer you… it might not answer questions you ask
you say ‘You can also answer any other question.’

And you can see him typing that sentence, “You can also answer any other question.”, into the “Instructions” input. So, if you go to “Voice Assistants”, open the voice assistant you set up, and under “Conversation agent” click the settings (gear) icon, you can find this “Instructions” input.

This change in the prompt suggests to the overchatty chatterbot that it is fine to answer outside of the scope of all the hardware info it has access to. So basically it’s just been fed a lot of info about your smart home, and when you ask ‘What time is it?’ it will first try to answer in the context of all the hardware info.

Allow me to test it here.

Before editing the Instructions prompt:

How can I assist?
What time is it?
I don’t have any function that can provide the current time.

After editing the Instructions prompt:

What time is it?
It’s 15:58.

Thanks for the response @voltie, but I put that line in the conversation agent’s instructions when I originally set it up. I still get quite inconsistent results. :face_with_diagonal_mouth:

In fact, I can ask it the same question over and over: it will give me errors about not being able to access things, then give me the correct answer on the next try, and then errors again if I keep asking. So, something seems off with it.

Yeah… It’s probably less and less consistent the more info (hardware) it has been fed. I’m sure somebody can give better clues about shaping the prompts.

We’re not yet at a stable enough Alexa replacement. I was using Mistral for a while, since that was what some tutorials used. I asked it to turn on the office lights and instead it turned on all the lights in the house. Which was not very popular at 3 AM.

Yes, it’s disappointing because I watched NetworkChuck and it seemed like everything worked great for him during his video! I thought to myself: “this is great if it works like that!” And I was really excited to try it out. NetworkChuck did not say anything about how inconsistent it is during that video and left everyone to believe that it works great, which it does not.

In my eyes, NetworkChuck lost a lot of cred for this AI video “spin”. I really look forward to the day when this actually works like you would want it to. :disappointed:

Hey all!
I watched the same video, but I could not get that far…
I’m running HA on a local VM, and for some stupid reason I can’t get the Ollama integration working; it keeps giving an error.
Any idea why?

[screenshot]

I’ve tried it all already: with http://, just the IP, IP + port, and the 0.0.0.0 option too…
My IP is correct and accurate, as other devices reach the machine normally; I just can’t do it with Ollama…
[screenshot]

@DocKoala992 Did you try going to http://192.168.1.20:11434 to see if you get the “Ollama is running” message? That would at least tell you that the network is working and the service is running…

Hello, yep, I went there and it’s running. Going back to old methods now to see what gives, lol.
[screenshot]

Fresh install done, added to the firewall exceptions and Defender too; it still won’t connect to HA, but it works wonders in cmd and the browser…
I’ll wait to see if anyone has any tips.

Thanks :slight_smile:

Did you edit your Windows system variable? I had to do this like Chuck did in his video, since Ollama listens on 127.0.0.1 by default. Go into Windows environment variables and add a System variable (not a user one):
variable name: OLLAMA_HOST
variable value: 0.0.0.0

Restart Ollama and it should be available on your network (assuming the firewall allows it in, too).
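
If you want to verify the change actually took effect from another machine on the LAN (the HA VM, for example), a tiny sketch like this tells you whether the port answers; 192.168.1.20 is the address mentioned above, and 11434 is Ollama’s default port:

```python
# Quick reachability check from another machine on the LAN (e.g. the HA host).
# The IP is the one mentioned earlier in the thread; 11434 is Ollama's default port.
import json
import urllib.request

url = "http://192.168.1.20:11434/api/tags"  # lists the locally pulled models

try:
    with urllib.request.urlopen(url, timeout=5) as resp:
        models = json.loads(resp.read()).get("models", [])
        print("Ollama reachable, models:", [m["name"] for m in models])
except OSError as err:
    print("Not reachable from the network:", err)
    # Typical culprits: OLLAMA_HOST not applied (restart the Ollama tray app),
    # or the Windows firewall blocking inbound TCP 11434.
```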


Hello, yes, the env change was made with exactly the data from the video :slight_smile: Still no result. I even tried to connect from HA using the localhost link, to go directly, and with 0.0.0.0, and nope to both :frowning:

Good morning. So I checked the app.log and I’m not sure, but I still see the default IP there; maybe that’s the issue?
But I’m sure that I made it 0.0.0.0, as it also states in the log:

time=2024-11-08T10:43:12.453+01:00 level=INFO source=logging.go:50 msg="ollama app started"
time=2024-11-08T10:43:12.454+01:00 level=INFO source=lifecycle.go:19 msg="app config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\DocKoala\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-11-08T10:43:12.507+01:00 level=INFO source=server.go:182 msg="unable to connect to server"
time=2024-11-08T10:43:12.507+01:00 level=INFO source=server.go:141 msg="starting server..."
time=2024-11-08T10:43:12.511+01:00 level=INFO source=server.go:127 msg="started ollama server with pid 4492"
time=2024-11-08T10:43:12.511+01:00 level=INFO source=server.go:129 msg="ollama server logs C:\\Users\\DocKoala\\AppData\\Local\\Ollama\\server.log"

Greetings,
Just my journey on this; maybe it will help others. Yup, I also followed NetworkChuck. I set up HA on a test PC, a Dell Wyse 5060, so as not to blow out the real HA running the home. I did the RPi4 setup with a hat (never got the LEDs to work, though). I then followed along and got it working using the instructions. Ollama worked at the command line on the Windows PC, and HA attached to it and worked as well. Everything else after that failed: no Whisper and no Piper channels. I tried everything, and my ultimate frustration is/was using WSL in Windows.

So I used an old laptop (great for experimenting) and installed Debian and Docker Compose. I found some sites that got me to create the right yml file to make it work. Wow. HA attached to Ollama, Piper, and Whisper.

So my conclusion is that using Windows WSL is the real headache. Better to go with a standalone Debian/Linux system.
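
For anyone retracing this on their own Docker host, a quick sanity check that all three services are actually listening before wiring up the HA integrations can save some head-scratching. A sketch, assuming the usual default ports (Ollama 11434, Wyoming Whisper 10300, Wyoming Piper 10200) and a made-up laptop IP; adjust to whatever your compose file actually maps:

```python
# Sketch: confirm the three services are listening before adding the HA
# integrations. Ports are the usual defaults (Ollama 11434, Wyoming Whisper
# 10300, Wyoming Piper 10200) -- adjust to your compose file's port mappings.
import socket

HOST = "192.168.1.30"  # hypothetical IP of the Debian laptop
SERVICES = {"ollama": 11434, "whisper": 10300, "piper": 10200}

for name, port in SERVICES.items():
    try:
        with socket.create_connection((HOST, port), timeout=3):
            print(f"{name}: port {port} open")
    except OSError as err:
        print(f"{name}: port {port} NOT reachable ({err})")
```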

I have fallen for this way more times than I care to admit. Lots of content out there makes things appear far better than they actually are.

EDIT: My main issue with videos that make things appear to work when they don’t is that they cause a lot of e-waste. Even videos released by Home Assistant/Nabu Casa are guilty of this.


Yes, I too have experimented with stuff on YT that failed to deliver. But I always learn something; it’s how I’m growing more familiar with Linux, networking, and such. And for me, learning about self-hosting AI is a real eye-opener. So, as Chief Dan George said in the movie ‘Little Big Man’, “I shall endeavor to persevere”.


[WSL 2] Exposing ollama via 0.0.0.0 on local network · Issue #1431 · ollama/ollama · GitHub. This may help others who find this post. It worked for me!