I’ve been tearing my hair out over this and could really use some advice if possible.
I have Voice Assistant: Preview Edition set up with the Ollama integration. My server only has an iGPU, so I’m using a tiny model (llama3.2:1b). I’ve made sure the model isn’t cold-starting (it stays pre-loaded), that the assistant prefers handling commands locally, and that the context size is the minimum possible (2048, with 0 message history). I’ve also made sure the model isn’t trying to control Assist.
On a fresh reload of wyoming/VA integrations, a simple “What is your name?” command takes 1 second.
I’ll then issue 10 commands, asking it to turn various lights on/off. When a light doesn’t exist and the command falls through to the LLM, I get strange JSON responses back, which are voiced aloud. For example:
{"name": "HassLightSet", "parameters": {"brightness": 50, "name": "Completely fake lights"}}
(Note: this only happens when "prefer handling commands locally" is enabled; otherwise, I get sane responses back.)
After that, I’ll try to ask it “What is your name?” again. This time, it takes 6 seconds.
If I don’t reload my integrations, the delay steadily increases. It has reached 20 seconds before, at which point I felt like breaking something.
Other things I’ve ruled out:
- Ollama being slow (if I copy paste my prompt directly to Ollama, the response is lightning fast)
- Slowness with STT/TTS (Debug runs without the preview edition are lightning fast)
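In case it helps anyone reproduce the first point: here’s roughly how one could time Ollama directly, bypassing Home Assistant, using Ollama’s streaming /api/generate endpoint. This is just a sketch; the URL and model name are assumptions for my setup and will vary on yours.

```python
import json
import time
import urllib.request

def time_to_first_token(prompt, model="llama3.2:1b",
                        url="http://localhost:11434/api/generate"):
    """Return seconds until Ollama streams back its first token.

    Hypothetical helper: url/model are placeholders for my local setup.
    """
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": True}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # each line is one streamed JSON chunk
            chunk = json.loads(line)
            if chunk.get("response"):  # first generated token arrived
                return time.monotonic() - start
    return None
```

Calling this with the exact prompt HA sends consistently gives me sub-second times, which is why I’m confident the bottleneck isn’t Ollama itself.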
Here is a debug log of the slow run:
stage: done
run:
pipeline: 7fP4q0xRkM8Z2nLJ9aBvQeYt
language: en
conversation_id: 8QZpF6k1YJrW0D9mS2HnA5tL
satellite_id: assist_satellite.voice_unit_7cD1A9xQeM
tts_output:
token: R9QwXJ6A1bZcL4TnY8P2Vd.flac
url: /api/tts_proxy/R9QwXJ6A1bZcL4TnY8P2Vd.flac
mime_type: audio/flac
stream_response: true
events:
- type: run-start
data:
pipeline: 7fP4q0xRkM8Z2nLJ9aBvQeYt
language: en
conversation_id: 8QZpF6k1YJrW0D9mS2HnA5tL
satellite_id: assist_satellite.voice_unit_7cD1A9xQeM
tts_output:
token: R9QwXJ6A1bZcL4TnY8P2Vd.flac
url: /api/tts_proxy/R9QwXJ6A1bZcL4TnY8P2Vd.flac
mime_type: audio/flac
stream_response: true
timestamp: "2026-02-12T18:42:00.092965+00:00"
- type: stt-start
data:
engine: stt.faster_whisper
metadata:
language: en
format: wav
codec: pcm
bit_rate: 16
sample_rate: 16000
channel: 1
timestamp: "2026-02-12T18:42:00.093165+00:00"
- type: stt-vad-start
data:
timestamp: 1050
timestamp: "2026-02-12T18:42:01.159069+00:00"
- type: stt-vad-end
data:
timestamp: 2690
timestamp: "2026-02-12T18:42:02.782167+00:00"
- type: stt-end
data:
stt_output:
text: " What is your name?"
timestamp: "2026-02-12T18:42:02.963807+00:00"
- type: intent-start
data:
engine: conversation.ollama_conversation
language: en
intent_input: " What is your name?"
conversation_id: 8QZpF6k1YJrW0D9mS2HnA5tL
device_id: D4A9n8k2WZxPq7m0FJ6L1S5C
satellite_id: assist_satellite.voice_unit_7cD1A9xQeM
prefer_local_intents: true
timestamp: "2026-02-12T18:42:02.964065+00:00"
- type: intent-progress
data:
chat_log_delta:
role: assistant
content: I
timestamp: "2026-02-12T18:42:08.388485+00:00"
- type: intent-progress
data:
chat_log_delta:
content: " am"
timestamp: "2026-02-12T18:42:08.421001+00:00"
- type: intent-progress
data:
chat_log_delta:
content: " Homer"
timestamp: "2026-02-12T18:42:08.449793+00:00"
- type: intent-progress
data:
chat_log_delta:
content: ","
timestamp: "2026-02-12T18:42:08.483069+00:00"
- type: intent-progress
data:
chat_log_delta:
content: " a"
timestamp: "2026-02-12T18:42:08.512579+00:00"
- type: intent-progress
data:
chat_log_delta:
content: " Home"
timestamp: "2026-02-12T18:42:08.542226+00:00"
- type: intent-progress
data:
chat_log_delta:
content: " Assistant"
timestamp: "2026-02-12T18:42:08.573931+00:00"
- type: intent-progress
data:
chat_log_delta:
content: "."
timestamp: "2026-02-12T18:42:08.602148+00:00"
- type: intent-progress
data:
chat_log_delta:
content: ""
timestamp: "2026-02-12T18:42:08.631625+00:00"
- type: intent-end
data:
processed_locally: false
intent_output:
response:
speech:
plain:
speech: I am Homer, a Home Assistant.
extra_data: null
card: {}
language: en
response_type: action_done
data:
targets: []
success: []
failed: []
conversation_id: 8QZpF6k1YJrW0D9mS2HnA5tL
continue_conversation: false
timestamp: "2026-02-12T18:42:08.632073+00:00"
- type: tts-start
data:
engine: tts.piper
language: en_GB
voice: en_GB-jenny_dioco-medium
tts_input: I am Homer, a Home Assistant.
acknowledge_override: false
timestamp: "2026-02-12T18:42:08.632243+00:00"
- type: tts-end
data:
tts_output:
media_id: media-source://tts/-stream-/R9QwXJ6A1bZcL4TnY8P2Vd.flac
token: R9QwXJ6A1bZcL4TnY8P2Vd.flac
url: /api/tts_proxy/R9QwXJ6A1bZcL4TnY8P2Vd.flac
mime_type: audio/flac
timestamp: "2026-02-12T18:42:08.632690+00:00"
- type: run-end
data: null
timestamp: "2026-02-12T18:42:08.632908+00:00"
started: 2026-02-12T18:42:00.092Z
finished: 2026-02-12T18:42:08.632Z
The big delay is in receiving the first token back from Ollama. But since I’ve already shown that Ollama itself is fast, and that the first command after a reload is also fast, I suspect HA is filling in additional context and/or a history of messages, even though I’ve told it not to.
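To put a number on it, the gap between intent-start and the first chat_log_delta can be computed straight from the timestamps in the log above:

```python
from datetime import datetime

# Timestamps copied from the debug log above.
intent_start = datetime.fromisoformat("2026-02-12T18:42:02.964065+00:00")
first_token = datetime.fromisoformat("2026-02-12T18:42:08.388485+00:00")

delay = (first_token - intent_start).total_seconds()
print(f"time to first token: {delay:.2f} s")  # ~5.42 s
```

So roughly 5.4 of the 6 seconds are spent waiting before Ollama emits anything, while the tokens themselves stream back in about a quarter of a second.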
Any help would truly be appreciated. For now, I’m giving up on this altogether.
