I have Home Assistant running on an HP EliteDesk 800 G2 (i5-6500, 16 GB RAM). I have a VPE using GPT-4.1 mini in the cloud. Performance is not very good at all. I am hoping the community can help me get this running a bit quicker without building an AI server. I do plan on building an AI server, but at the moment the funding is not available as I am laid off. I included some screenshots of my assist pipeline. I am no tech god, but I am not a complete noob either.
If I’m reading that right, you’re using faster-whisper and Piper locally for STT and TTS, then GPT-4.1 mini for the LLM. Your prompt, well, it needs work, but that won’t have anything to do with speed.
Lemme guess: 6-12 seconds?
That’s as fast as you’re gonna get without starting to invest in some better gear, sorry. You can try different cloud providers; Thyraz says his gets turned around in a few seconds. But with a cloud LLM and that voice setup you’re already pretty close to as fast as you’re going to get.
You don’t show if you have local processing first enabled; that will make basic intent handling faster for lights and things. But for an LLM response that’s not local, assume 6+ seconds to round trip. Anything better is gravy.
Yeah, at the higher end of that.
OpenAI, like many other LLM companies, isn’t optimized for low latency and fast time-to-first-token responses.
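For intuition, here’s a small sketch (plain Python, with simulated delays; the numbers are illustrative, not real provider timings) showing why time-to-first-token matters separately from total response time in a voice pipeline:

```python
import time

def stream_tokens(tokens, first_delay, per_token):
    """Simulate an LLM token stream: a wait before the first
    token (the time-to-first-token), then steady output."""
    time.sleep(first_delay)
    for tok in tokens:
        yield tok
        time.sleep(per_token)

def measure(stream):
    """Return (time_to_first_token, total_time) for a token stream."""
    start = time.monotonic()
    ttft = None
    for _ in stream:
        if ttft is None:
            ttft = time.monotonic() - start
    return ttft, time.monotonic() - start

# A provider with a slow first token feels laggy even if it streams fast:
ttft, total = measure(stream_tokens(["Turning", "on", "the", "lights."], 0.5, 0.05))
print(f"TTFT: {ttft:.2f}s, total: {total:.2f}s")
```

Since TTS can start speaking as soon as the first tokens arrive, a provider with a low TTFT feels much snappier even when total generation time is similar.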
Here’s what I use and what Nathan referred to.
Take a look here for some comparison response times to gpt-4.1-mini:
The best combination I could find of a really fast provider and a good tool-calling model that has a large context size AND is affordable is:
Model: gpt-oss-120b
Provider/LLM-hoster: http://groq.com
It provides an OpenAI-compatible API endpoint and can therefore be used with this integration:
Easy to set up, as the configuration of this LLM integration is based on the official HA OpenAI integration, just with a few additional fields.
In Groq you need to register an account, add a payment method, and create an API key.
It’s billed by usage afterwards (and you can set limits), so you don’t need to pre-pay a larger amount just to test.
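Once the account and key exist, it can help to sanity-check the endpoint outside HA first. A minimal sketch (Python; the key is a placeholder, and `openai/gpt-oss-120b` is an assumed Groq model id, so check Groq’s model list for the exact name) that builds an OpenAI-compatible chat request:

```python
import json

# Groq exposes an OpenAI-compatible API under this base URL.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(api_key: str, prompt: str,
                       model: str = "openai/gpt-oss-120b"):
    """Return (url, headers, body) for a chat-completions call.
    Send it with any HTTP client, e.g. urllib or requests."""
    url = f"{GROQ_BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # your Groq API key
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request("gsk_PLACEHOLDER", "Say hello in five words.")
print(url)
```

If a plain request like this works with your key but the HA integration doesn’t, the problem is on the integration side rather than with the Groq account.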
Thank you I will look into that when I get home.
So I have the LLM integration installed and configured properly, I think. I have my Groq account set up with my API key and payment method. Something somewhere is not configured correctly, and I am hoping you can help me out, please. It can tell me the time, but much more than that and it fails and the VPE flashes red. I have a whole new assist pipeline set up. I am just not sure where I went wrong.
Ok, so let’s take a look at what might be going wrong.
First, on which HA version are you?
I’m still on 2025.12.3, as there have been problems with LLMs regarding tool calls.
Your problem could be related to that in case you’re running a newer version.
In that case, I think I’ve seen that there’s hope the upcoming 2026.1.2 will fix this.
Besides that, I noticed there have been a few updates to the LLM integration due to problems as well, maybe related to the HA 2026.1.x versions.
Here I’m currently on version 1.2.3.
In HACS it’s possible to click the 3-dot menu once you’ve selected/opened an integration and choose to download it again.
There’s the option to choose which version.
Might be worth a try to use the same one as I do, in case the newest versions are currently problematic due to the attempts to fix things for the latest HA version.
This is the config of the LLM integration itself:
And these are the settings of the conversation agent created under the LLM integration:
I use a template for the prompt, so I can edit it in one place for all the LLMs I tested.
But you can also enter your prompt directly here.
I am not sure what you mean by making your prompt with a template, either.
An LLM prompt allows you to use templates to insert dynamic content into it.
I use just a single large template text as the prompt, where all my text comes from.
This doesn’t make a difference here; it just allows me to have a single place to edit the prompt for all the voice assistants (e.g. OpenAI ChatGPT, Groq gpt-oss-120b, OpenRouter, … and whatever else I tried in parallel for testing).
So I don’t have to edit it in each one when I make changes.
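For reference, one way to sketch this (I believe via HA’s custom_templates feature, which lets you import macros from `config/custom_templates/*.jinja` into any template; the file and macro names below are made up):

```jinja
{# config/custom_templates/assist_prompt.jinja (hypothetical file name) #}
{% macro base_prompt() %}
You are a voice assistant for Home Assistant.
Answer briefly, and call tools when a device should change state.
{% endmacro %}

{# Each conversation agent's prompt field then just imports and renders it:
   {% from 'assist_prompt.jinja' import base_prompt %}
   {{ base_prompt() }}
#}
```

Editing the `.jinja` file then updates the prompt for every agent at once.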
The only difference I see in your screenshot is parallel tool calling, which I haven’t enabled.
I also updated to version 1.2.6 of the LLM integration just now to see if this might be the cause.
But it’s still working on my end.
Besides that, you’re on 2026.1.1 and I’m not.
As the family is all around the house at the moment I can’t update HA.
Will do a backup of the HA VM tomorrow and also update to 2026.1.1 to see if that might be causing an issue.
Thank you for the help. I really appreciate it. Thanks for the knowledge as well. I will try disabling parallel tool calling.