Conversation agents, HA vs Deepseek (or others)

In the voice pipeline model, the conversation agent is labelled as:

“The conversation agent is the brains of your assistant and will process the incoming text commands.”

Up until now, I’ve kept this set to the default ‘Home Assistant’.

I also run the faster-whisper addon for the STT element.

Recently, I was playing around with Ollama and got Deepseek up and running, and used this as the conversation agent instead.

Given that faster-whisper seems to be doing the heavy lifting on STT, what does the default HA agent, or Ollama (in this case DeepSeek), actually do?

At least on my relatively modest tiny pc, most of the processing (and delay) seems to be in the STT part, especially as you trial larger models.

And one last question: the docs for the Ollama integration say it does not support template sentences, but I’ve tried it and it appears to work fine with custom sentences I’ve used to start automations.


There are a few things going on here.

First, the STT (Whisper in this case).
This disassembles the audio and returns a best approximation of what was said as text. It’s pretty lightweight and can easily be run locally. This step is skipped entirely (as is TTS) if you are typing in a chat window like in the mobile client.
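Not the Wyoming plumbing HA actually uses, but a minimal sketch of the underlying faster-whisper library, just to show why this stage is where the heavy lifting (and the delay with larger models) lives. The model size and audio file below are placeholders:

```python
# A rough local-transcription sketch using the faster-whisper Python package.
# Model size and audio path are placeholders; bigger models are slower but more accurate.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("command.wav", language="en")
text = " ".join(segment.text.strip() for segment in segments)

print(f"Detected language: {info.language}")
print(f"Transcript: {text}")
```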

That text is passed to whatever you’re using for recognition. This is where things can turn. You are either doing this ‘locally’ in HA terms, which just means it’s handled by the HA speech intent recognition, OR not locally, by the LLM.

For local recognition to happen, the sentence is passed to the HA recognizer, and if it matches an intent it’s funnelled off to that intent. Done. HA does its thing and returns a response (see TTS later). Note that at this stage the HA parser is formulaic: if you don’t specify what it expects in your sentences, or if you didn’t catch an idiosyncratic form, it won’t match. It doesn’t figure stuff out.
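You can see that formulaic matching for yourself by poking the built-in agent over HA’s REST API. A minimal sketch; the URL and the long-lived access token are placeholders for your own instance:

```python
# Send a text command straight to Home Assistant's built-in conversation agent
# via the REST API. URL and token are placeholders for your own instance.
import requests

HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"  # created under Profile -> Security

resp = requests.post(
    f"{HA_URL}/api/conversation/process",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"text": "turn on the kitchen light", "language": "en"},
    timeout=10,
)
resp.raise_for_status()

# If the sentence matched an intent, this prints the speech HA would read back;
# if it didn't match, you get the "Sorry, I couldn't understand that" style reply.
print(resp.json()["response"]["speech"]["plain"]["speech"])
```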

You can also set up failover, or fallback: if the HA recognizer doesn’t match an intent, it falls back to whatever LLM is set up for that pipeline. The request is chucked at the LLM.

If you either chose to skip local or you fell back, your request is now with the LLM. This is where the LLM has a distinct advantage. It does pattern recognition on what was fed in and tries to find a ‘tool’ (those same scripts and intents local sees) and match up a call. (This is why, if you choose to use an LLM, you should choose one optimized for ‘tool use’; see also why I don’t understand any LLMs attached to HA that can’t ‘operate HA’. I also recommend a straight LLM without chain of thought or reasoning at this moment [GPT-4o mini, not o1, etc.], but that’s another post.) It does NOT need to be an exact string match; the LLM can figure out what you meant. LLMs are EXCEEDINGLY good at this part. So you often get WAY better recognition of phrases and ‘what I meant’, IF you do a good job of describing the conditions and rules to the AI. More on that in another post.
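To make ‘tool use’ concrete, this is roughly what it looks like on the wire with an OpenAI-style chat completion. A sketch only: the tool name and parameters here are made up for illustration and are not HA’s actual intent schema.

```python
# OpenAI-style tool calling: the model is handed a tool definition and is free to map
# a loosely worded request onto it. The tool name and parameters are purely illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any tool-capable model behaves similarly

tools = [{
    "type": "function",
    "function": {
        "name": "turn_on_light",  # hypothetical tool, not HA's real intent schema
        "description": "Turn on a light in a named area",
        "parameters": {
            "type": "object",
            "properties": {"area": {"type": "string", "description": "Room or area name"}},
            "required": ["area"],
        },
    },
}]

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "it's a bit dark in the kitchen"}],
    tools=tools,
)

# A tool-capable model returns a structured call instead of needing an exact string match.
for call in completion.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)  # e.g. turn_on_light {"area": "kitchen"}
```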

Built-in - pattern matching only; it’s lightweight and runs on anything you can run HA on. Read: gets the job done but is very, VERY unforgiving.

LLMs - WAY better pattern matching because they can infer intent, but with heavier requirements: either a service (ChatGPT etc.) or setting up your own local LLM infrastructure (Ollama + Open WebUI + OpenRouter is common). Read: time, energy, and money for better results.

If the LLM finds a match, it assembles the correct JSON to set off the intent and throws it at the intent handler as a tool call. Essentially you’re back in the same place in the pipeline you would have been with a local-only call, but it did a better job at matching. HA does its thing and then sends back a response.
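If you were wiring that up by hand, ‘assembles the JSON and throws it at HA’ boils down to something like this. A sketch assuming HA’s REST service-call API; the tool name carries over from the hypothetical example above, and in the real Assist/LLM integrations this dispatch happens internally.

```python
# Hand-rolled version of "assemble the JSON and throw it at HA": take the LLM's tool
# call and turn it into a Home Assistant service call over the REST API. In the real
# Assist/LLM integrations this dispatch happens internally.
import json
import requests

HA_URL = "http://homeassistant.local:8123"   # placeholder
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"       # placeholder

def dispatch(tool_name: str, arguments: str) -> dict:
    args = json.loads(arguments)             # the LLM returns arguments as a JSON string
    if tool_name == "turn_on_light":         # hypothetical tool from the sketch above
        entity_id = f"light.{args['area']}"  # naive area -> entity mapping, for illustration only
        resp = requests.post(
            f"{HA_URL}/api/services/light/turn_on",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"entity_id": entity_id},
            timeout=10,
        )
        resp.raise_for_status()
        return {"status": "done", "entity_id": entity_id}
    return {"status": "unknown_tool", "tool": tool_name}

# The dict returned here is the raw result an LLM would then summarize into the spoken reply.
print(dispatch("turn_on_light", '{"area": "kitchen"}'))
```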

In LLM land that response is interpreted and sent to TTS. It will be a summary of the response (and where things get creative), whereas with no LLM you get the verbatim response. Your LLM’s impact is felt here the most, because it has control of what’s ultimately fed to the voice output or your readout. It’s also able to use any data in the conversation pipeline (think continuation) or the initial prompt.

That response is fed to your TTS engine (probably Piper or a cloud TTS engine) and back to you.

Flow is basically:
STT/Text > recognizer (local/llm) > tool > response > (llm option) > TTS/Text
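In pseudocode, that flow (including the fallback path) comes out roughly like this; every object below is a hypothetical stand-in for a pipeline stage, not Home Assistant’s internal API:

```python
# Rough sketch of the pipeline flow above. Every object here is a hypothetical
# stand-in for a pipeline stage, not Home Assistant's internal API.
def handle_command(audio_or_text, stt, local_agent, llm_agent, tts, typed=False):
    # STT (skipped entirely when you type into a chat window)
    text = audio_or_text if typed else stt.transcribe(audio_or_text)

    # Recognizer: formulaic local match first, fall back to the LLM if one is configured
    result = local_agent.match_intent(text)
    if result is None and llm_agent is not None:
        result = llm_agent.tool_call(text)        # LLM infers intent and assembles the tool call
    if result is None:
        return "Sorry, I couldn't understand that"

    # Tool/intent execution: HA does its thing and returns a response
    response = result.execute()
    if llm_agent is not None:
        response = llm_agent.summarize(response)  # LLM rewrites the verbatim response

    # TTS (or plain text back to the chat window)
    return response if typed else tts.speak(response)
```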


Hey @NathanCu, I have read your explanation and it is perfect and makes a lot of sense to me. I just have one question: is there any way I can use fallback so that when the query is not processed by, let’s say, the OpenAI integration, it passes it to the Extended one? Each seems to do something better than the other after I experimented a lot with both (using cloud).
I found a fallback GitHub repository, GitHub - m50/ha-fallback-conversation: HomeAssistant Assist Fallback Conversation Agent, but it was archived this past Dec., so I have no idea if I can achieve something like that. If it is possible, do you have an idea?

Let me read how they’re doing their thing before I answer.

IMHO my holy grail is using an Ollama integration with a user-defined endpoint pointed at a local Ollama (probably a Qwen2.5 or Phi-4 distill of some breed), accessed through an Open WebUI instance that can use tools and multi-agent calls. Have that lashed to OpenRouter, which ultimately has access to ChatGPT and Hugging Face. Then let the local installation answer most calls and push out to a paid LLM for horsepower. At that point the local/failover question becomes moot: push everything and let the local LLM sort it out.
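For the local half of that setup, talking to an Ollama endpoint is just an HTTP call. A minimal sketch assuming Ollama’s /api/chat endpoint on the default port; the model name is whatever you’ve pulled locally:

```python
# Chat with a local Ollama instance over its REST API.
# Host, port, and model name are placeholders for your own setup.
import requests

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port
MODEL = "qwen2.5:7b"                   # whatever model you've pulled locally

resp = requests.post(
    f"{OLLAMA_URL}/api/chat",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Turn off the living room lights"}],
        "stream": False,               # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```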


OK, my answer
2025.2 Beta: Iterating on backups - Home Assistant

Specifically
Shared history between the default conversation agent and its LLM-based fallback

and

Model Context Protocol

Model Context Protocol is here. Users can now integrate Home Assistant into their AI tools that support MCP, and integrate MCP servers as tools in Home Assistant.

Together, these make the question a moot point.

Basically, what these mean in combination is that you’ll be able to attempt local handling first, and if there’s no match, cast out to the LLM without losing context, as they describe (one of the last major deficiencies they had there, IMHO). THEN, on top of that, we’re now enabling MCP tools in and out (TO/FROM) Home Assistant.

Still working out the complexities of what lives where in this scenario, but your takeaway: more flexibility. Even more options. Less lock-in.
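As a rough illustration of what an MCP ‘tool’ looks like on the server side, here’s a sketch using the MCP Python SDK’s FastMCP helper. The tool itself is made up, and this is not Home Assistant’s own MCP implementation; the new HA MCP integrations wire this sort of thing up for you.

```python
# Rough sketch of exposing a custom tool over MCP using the MCP Python SDK's FastMCP
# helper (pip install mcp). The tool itself is hypothetical; Home Assistant's own MCP
# server/client integrations do this wiring for you.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-home-tools")

@mcp.tool()
def vacation_mode(enabled: bool) -> str:
    """Toggle a (hypothetical) vacation-mode flag that an MCP-capable assistant could call."""
    return f"Vacation mode {'enabled' if enabled else 'disabled'}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an MCP client can list and call the tools
```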


Well, actually that is exactly what I need. Still, I would prefer to switch between the two OpenAI integrations, as each one of them complements some parts the other can not, but I will wait and see if that is possible. Still, that idea is nearly what I need indeed.

I’d research the Ollama integration (or any Ollama-compatible one), then look at pointing it at an Ollama-compatible API endpoint.

Then you build one of these:
Ollama
Front ended by:
Open WebUI
And load ’er up with tools. Honestly, you’ll forget about whether it’s local or not, because Open WebUI can then do that brokering. (I just have not been successful in using the built-in integration against its OpenAI API endpoint, because the endpoint is hardcoded and I haven’t had the time to CLONE the built-in integration to make the ONE EDIT it would take to let me do that…)
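For reference, pointing an OpenAI-style client at a local OpenAI-compatible endpoint is just a base-URL change (Open WebUI and Ollama both expose one). The URL, key, and model below are placeholders:

```python
# Point the standard OpenAI client at a local OpenAI-compatible endpoint
# (Open WebUI and Ollama both expose one). URL, key, and model are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint; Open WebUI exposes its own
    api_key="not-needed-locally",          # Ollama ignores the key; Open WebUI issues its own keys
)

completion = client.chat.completions.create(
    model="qwen2.5:7b",                    # whatever model the local backend serves
    messages=[{"role": "user", "content": "Is anyone home right now?"}],
)
print(completion.choices[0].message.content)
```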

But I think you see how that all starts to fit together, and being local, you wouldn’t care whether it was handled by HA first or not. It should also be very doable on a mini PC with a Battlemage card, or a mini PC with a GPU.


A valid point and idea… Changing local to Ollama will provide me with a decent OpenAI-like assistant instead of the HA one, while keeping one of the OpenAI integrations as the other is my best choice. All I need now is to find a mini PC that can host both HA and one good Ollama integration with tools functionality and, as you mentioned, a GPU, instead of my Raspberry Pi. And maybe I can use my Raspberry Pi as a Wyoming satellite to improve my Alexa-like voice assistant at home. Thanks, Nathan
