Running Ollama locally gives an error; works fine with Google Gemini

I just got my local Ollama up and running on a Windows 10 PC, or so I think. For a moment it was working fine, and I can still see it running on the machine.


Here's my YAML:

action: llmvision.image_analyzer
data:
  remember: false
  include_filename: false
  target_width: 1920
  max_tokens: 3000
  provider: 01KECVF7CB7FT8VMYCS56C19AY
  message: the needle is pointing at what number?
  image_entity:
    - camera.192_168_1_121_2
  model: qwen3-vl:2b


This is a snapshot of the temperature gauge on the furnace in my basement. Google Gemini has no problem reading the gauge and outputting a number.

So I thought I could run the qwen3-vl:2b model on Ollama locally on my i7 (7th-gen Intel). It worked only once. After that, this error comes up from LLM Vision no matter what I do. Any idea why?
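Before blaming LLM Vision, it may be worth confirming that the model can answer a vision prompt at all when called directly. One quick sanity check from the Ollama CLI (the image path below is a made-up placeholder; Ollama passes file paths it finds in the prompt to multimodal models):

```shell
# Placeholder path: substitute a real snapshot of the gauge.
# If this hangs or errors, the problem is on the Ollama side, not LLM Vision.
ollama run qwen3-vl:2b "What number is the needle pointing at? C:\temp\gauge.jpg"
```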

HA’s log:

Is the Ollama integration set up as Conversation Agent or AI Task?

Also, I think you want llmvision.data_analyzer

This error says the issue is on your Ollama side. What's the log output there? Also, what GPU type and how much VRAM?

With data_analyzer, it seems worse:

I don't know why it wants the gemma3:4b model. Log:

In the LLM Vision config, I set the model to qwen…

How do I know if mine is set up as Conversation Agent or AI Task? I don't see that option anywhere while adding the LLM Vision integration.

@NathanCu, this is some of Ollama's server.log. I'm running on CPU only; no GPU in this ancient HP with 32 GB RAM and an i7 (7th-gen CPU).

time=2026-01-08T21:53:21.736-05:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-01-08T21:53:21.737-05:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=4 efficiency=0 threads=8
time=2026-01-08T21:53:21.842-05:00 level=INFO source=server.go:245 msg="enabling flash attention"
time=2026-01-08T21:53:21.845-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="C:\Users\tung\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model C:\Users\tung\.ollama\models\blobs\sha256-ed12a4674d727a74ac4816c906094ea9d3119fbea46ca93288c3ce4ffbe38c55 --port 64874"
time=2026-01-08T21:53:21.884-05:00 level=INFO source=sched.go:443 msg="system memory" total="31.9 GiB" free="21.8 GiB" free_swap="24.2 GiB"
time=2026-01-08T21:53:21.884-05:00 level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
time=2026-01-08T21:53:21.928-05:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
time=2026-01-08T21:53:21.929-05:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:64874"
time=2026-01-08T21:53:21.940-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath: Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:2048 KvCacheType: NumThreads:4 GPULayers: MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-08T21:53:21.981-05:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3vl file_type=Q4_K_M name="" description="" num_tensors=858 num_key_values=40
load_backend: loaded CPU backend from C:\Users\tung\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2026-01-08T21:53:22.311-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2026-01-08T21:53:22.710-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath: Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:2048 KvCacheType: NumThreads:4 GPULayers: MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-08T21:53:23.165-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath: Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:2048 KvCacheType: NumThreads:4 GPULayers: MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-08T21:53:23.165-05:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-01-08T21:53:23.165-05:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-01-08T21:53:23.165-05:00 level=INFO source=ggml.go:494 msg="offloaded 0/37 layers to GPU"
time=2026-01-08T21:53:23.166-05:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="5.7 GiB"
time=2026-01-08T21:53:23.166-05:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="288.0 MiB"
time=2026-01-08T21:53:23.166-05:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="419.4 MiB"
time=2026-01-08T21:53:23.166-05:00 level=INFO source=device.go:272 msg="total memory" size="6.4 GiB"
time=2026-01-08T21:53:23.166-05:00 level=INFO source=sched.go:517 msg="loaded runners" count=1
time=2026-01-08T21:53:23.166-05:00 level=INFO source=server.go:1338 msg="waiting for llama runner to start responding"
time=2026-01-08T21:53:23.167-05:00 level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server loading model"
time=2026-01-08T21:53:34.932-05:00 level=INFO source=server.go:1376 msg="llama runner started in 13.05 seconds"
[GIN] 2026/01/08 - 21:54:21 | 500 | 1m0s | 192.168.1.229 | POST "/api/chat"
[GIN] 2026/01/08 - 21:55:07 | 500 | 1m0s | 192.168.1.229 | POST "/api/chat"
[GIN] 2026/01/08 - 21:58:23 | 500 | 1m0s | 192.168.1.229 | POST "/api/chat"
[GIN] 2026/01/08 - 22:02:59 | 500 | 1m0s | 192.168.1.229 | POST "/api/chat"
[GIN] 2026/01/08 - 22:22:27 | 200 | 31.4971ms | 192.168.1.229 | GET "/api/tags"
time=2026-01-08T22:23:50.269-05:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-01-08T22:23:50.269-05:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=4 efficiency=0 threads=8
time=2026-01-08T22:23:50.385-05:00 level=INFO source=server.go:245 msg="enabling flash attention"
time=2026-01-08T22:23:50.388-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="C:\Users\tung\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model C:\Users\tung\.ollama\models\blobs\sha256-ed12a4674d727a74ac4816c906094ea9d3119fbea46ca93288c3ce4ffbe38c55 --port 51316"
time=2026-01-08T22:23:50.391-05:00 level=INFO source=sched.go:443 msg="system memory" total="31.9 GiB" free="22.2 GiB" free_swap="24.4 GiB"
time=2026-01-08T22:23:50.391-05:00 level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
time=2026-01-08T22:23:50.444-05:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
time=2026-01-08T22:23:50.445-05:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:51316"
time=2026-01-08T22:23:50.449-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath: Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:2048 KvCacheType: NumThreads:4 GPULayers: MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-08T22:23:50.494-05:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3vl file_type=Q4_K_M name="" description="" num_tensors=858 num_key_values=40
load_backend: loaded CPU backend from C:\Users\tung\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2026-01-08T22:23:50.512-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2026-01-08T22:23:50.930-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath: Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:2048 KvCacheType: NumThreads:4 GPULayers: MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-08T22:23:51.404-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath: Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:2048 KvCacheType: NumThreads:4 GPULayers: MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-01-08T22:23:51.405-05:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-01-08T22:23:51.405-05:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-01-08T22:23:51.405-05:00 level=INFO source=ggml.go:494 msg="offloaded 0/37 layers to GPU"
time=2026-01-08T22:23:51.405-05:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="5.7 GiB"
time=2026-01-08T22:23:51.406-05:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="288.0 MiB"
time=2026-01-08T22:23:51.406-05:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="419.4 MiB"
time=2026-01-08T22:23:51.406-05:00 level=INFO source=device.go:272 msg="total memory" size="6.4 GiB"
time=2026-01-08T22:23:51.406-05:00 level=INFO source=sched.go:517 msg="loaded runners" count=1
time=2026-01-08T22:23:51.407-05:00 level=INFO source=server.go:1338 msg="waiting for llama runner to start responding"
time=2026-01-08T22:23:51.416-05:00 level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server loading model"
time=2026-01-08T22:23:52.687-05:00 level=INFO source=server.go:1376 msg="llama runner started in 2.30 seconds"
[GIN] 2026/01/08 - 22:24:50 | 500 | 1m0s | 192.168.1.229 | POST "/api/chat"
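For what it's worth, every failing /api/chat call above returns 500 after exactly 1m0s, which looks more like a 60-second timeout somewhere in the chain than a model error. One hedged way to rule out slow CPU inference is to send the same style of request yourself with a generous client-side timeout. A minimal sketch (the model name is reused from the log above; the helper names and prompt are illustrative, and the request shape follows Ollama's documented /api/chat format):

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default port


def build_chat_payload(model: str, prompt: str, images=None) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint, non-streaming."""
    message = {"role": "user", "content": prompt}
    if images:
        message["images"] = images  # base64-encoded images, for vision models
    return {"model": model, "messages": [message], "stream": False}


def ask(model: str, prompt: str, timeout_s: float = 600.0) -> str:
    """Send one chat request with a generous client-side timeout and time it."""
    data = json.dumps(build_chat_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        body = json.loads(resp.read())
    print(f"inference took {time.monotonic() - start:.1f}s")
    return body["message"]["content"]


# Example (requires a running Ollama server):
#   print(ask("qwen3-vl:2b", "Say hello in one word."))
```

If this succeeds in, say, 90 seconds, the model itself is fine on CPU and the 500s are just a client giving up at the one-minute mark.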


I believe it's a timeout issue, since my Ollama server is on an i7 (7th gen).
I added LLM Vision through the UI. Claude and ChatGPT suggest I change the timeout in the config YAML, as seen here:

llm:
  - name: "Ollama Vision"
    provider: ollama
    model: "llava:latest"  # or your vision model
    base_url: "http://localhost:11434"
    timeout: 120  # timeout in seconds (default is typically 30)

Can I just add that YAML to my config file even though I already set it up in the UI?
There is absolutely no way to change the timeout value in the UI here.
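One knob that is independent of any LLM Vision timeout: Ollama's documented OLLAMA_KEEP_ALIVE environment variable controls how long a model stays loaded after a request. On a CPU-only box, keeping the model resident avoids paying the load cost again on the next call. A sketch for a Windows install (the 30m value is illustrative):

```shell
REM OLLAMA_KEEP_ALIVE is a documented Ollama environment variable.
REM setx stores it for future sessions; restart the Ollama app afterwards
REM so the new value takes effect.
setx OLLAMA_KEEP_ALIVE "30m"
```

This won't speed up inference itself, but it removes the startup delay visible in the server.log above.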

This is why. Without a GPU you are done. I'm surprised you got anything to load (if you actually did).

Happy to say it is working. Not perfect, but more progress than before!
I installed Ollama on a machine with a 3060 GPU, and this is the response I got with the
qwen3-vl:8b model:

The value should be 90, but for some reason Qwen gives 60.
Is there a better model I should try?

I don't know if it matters, but once I changed these two fields, I got correct answers for all the gauges in the house. WOW.