Voice Assist using an LLM but with RAG

It’s been great using Voice Assist with the official Ollama integration. I have everything running locally and a used RTX 3060 + llama3.1:8b works well for me.

I’ve been doing a lot of reading to try to find a way to utilize RAG (Retrieval-Augmented Generation) on the back end, and I thought I found one. Instead of using the Ollama integration, I could use Home-LLM (HACS) and connect it to OpenWebUI. It appears OpenWebUI exposes its LLMs via the Ollama protocol if you append /ollama to the URI.
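For anyone who wants to try the same thing, this is roughly how I understand the proxy works: Open WebUI forwards Ollama-protocol requests under the /ollama path, authenticated with an Open WebUI API key. The host, port, and key below are placeholders for your own setup:

```shell
# Placeholder host/port and API key -- substitute your own values.
# This asks Open WebUI's Ollama-compatible proxy for a completion
# from a model served by the underlying Ollama instance.
curl -s \
  -H "Authorization: Bearer $OPENWEBUI_API_KEY" \
  http://openwebui.local:3000/ollama/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}'
```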

My idea: I’d have some basic RAG stuff in OpenWebUI that would wrap around Ollama/llama. Well, part of this works. The Home-LLM integration connects to OpenWebUI and can access “real” Ollama models, but it really doesn’t want to use a “workspace model.”

I made a workspace and have some text documents for augmentation. When I point Home-LLM at my workspace model “house” I get this error on first query:

Sorry, there was a problem talking to the backend: HomeAssistantError('Failed to communicate with the API! {"detail":"Model 'house:latest' was not found"} (status code: 400)')
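One way to narrow this down (hedging here, since I haven’t confirmed how Open WebUI routes workspace models) is to compare what each API surface actually lists. If “house” appears in Open WebUI’s own model list but not in the /ollama proxy’s list, that would explain the 400 — the workspace model simply isn’t exposed over the Ollama protocol that Home-LLM speaks. Host, port, and key below are placeholders:

```shell
# Placeholder host/port and API key -- substitute your own values.
HOST=http://openwebui.local:3000
KEY=$OPENWEBUI_API_KEY

# Models visible through the Ollama-compatible proxy (what Home-LLM sees):
curl -s -H "Authorization: Bearer $KEY" "$HOST/ollama/api/tags"

# Models visible through Open WebUI's own API
# (workspace models like "house" may only appear here):
curl -s -H "Authorization: Bearer $KEY" "$HOST/api/models"
```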

I may try something with AnythingLLM.

My question – does anyone have this sort of thing working? If so, how did you configure it?

Thanks,
Scott