I’m using a Satellite 1 with a custom wakeword and OpenAI gpt-5-mini for my LLM. It took over 10 seconds before I heard a reply here. What am I doing wrong? Or… what additional info should I be sharing?
Voice responses are sent as url to file
The ESP logs will show the link; you can copy/paste the file link into a browser on your PC to verify that the weblink is accessible and the file plays.
But it is working. It just takes a long time. Maybe that’s to be expected?
8-10 seconds is not unheard of with cloud models…
What are you using for STT and TTS?
fast_whisper and piper, but there’s almost no delay for them.
Then you are probably about as fast as you’re going to get. Look at the voice debug and count how long each step takes… Yes, you will see a multi-second lag from cloud services.
How many entities do you have exposed? If you are using openAI API, either your system/network is slow or you are sending a lot of data.
One approach is to use two different wake words. One path goes to a bigger LLM that doesn’t control HA. The other controls HA, and for it you choose an LLM that is fast and smart enough.
I prefer smart, and my LLM response time is usually in the 4-6 second range for both local and cloud LLM.
I assume that paid cloud LLM is more reliably fast compared to free.
You would assume incorrectly - this is purely a physics/math problem. You pay for privacy, not speed.
I ONLY use paid cloud models when using cloud. 5-8 seconds is normal for a standard HA Assist inference of any decent size.
If you want faster, you start optimizing things like switching to local inference, local voice, etc. The voice debug will tell you what to optimize.
Use voice debug. Make a list. Then fill in the blanks.
Time from the call to STT output
Time from STT output to start of inference
Time from end of inference to voice output
Add ’em up. (The longest will be inference, and it will likely be more than four seconds, probably closer to 8.)
Attack the biggest number.
Until you do that we’re guessing.
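The tally above can be sketched as a quick script. The stage names and numbers below are made-up placeholders for what you’d read off the voice debug, not values from the OP’s setup:

```python
# Hypothetical per-stage timings (seconds) read off the Assist voice debug;
# replace these with your own numbers from the debug view.
timings = {
    "wake_to_stt": 0.4,    # wake word trigger to STT start
    "stt": 0.8,            # speech-to-text
    "llm_inference": 7.5,  # cloud LLM response
    "tts": 0.6,            # speech synthesis
}

total = sum(timings.values())
slowest = max(timings, key=timings.get)
print(f"total: {total:.1f}s, slowest stage: {slowest} ({timings[slowest]:.1f}s)")
```

With numbers like these the conclusion is immediate: the LLM round trip dominates, so that’s the stage worth attacking first.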
When a cloud provider nears capacity they are going to throttle the unpaid users first.
That’s possible sure. But I can tell you I’ve not seen it as a matter of course except Gemini.
My strong recommendation is not to count on paying for speed; you pay for privacy. Because that’s how the cloud vendors see it.
They’d dump free entirely if they could… Trust me.
But because a free LLM is nothing more than a training-data vacuum for the vendors, I strongly recommend people don’t use free. I value my privacy.
As usual you change the topic. The OP wrote nothing about privacy. I gave the OP reasons a cloud LLM provider’s response time might be slow. Obviously the OP sees each step, as he posted the debug.
Not changing the topic. See the previous steps to understand EXACTLY WHICH STEP is the problem.
When the OP does that, they can come back and we can explore which it is.
Until THEN we are guessing, and I’m not going to throw spaghetti at the wall.
A cloud LLM, EVEN PAID, takes 5-8 seconds with an average context size. It scales with the amount of data put IN.
The OP is asking why they’re slow.
My point is that free vs. paid DOES NOT MAKE ENOUGH DIFFERENCE for it to matter here in almost all cases, so don’t bother chasing it. You can have slow paid too…
So the first step is to find what part of the pipeline is slow. THEN ACT. Don’t guess. It’s a math problem. The info is in the voice debug. Count seconds, attack the longest run. Period.
First, look at the information in the Raw block and analyze the timings (or paste the log into an LLM for analysis). You’ll immediately know where the problem lies.
Visual blocks display very specific values:
stt - the time between detecting the end of speech and receiving the text result. This may not reflect problems at the audio capture stage.
nlp - the time to process the intent/receive all tokens from LLM
tts - not a meaningful timing here; it simply indicates that the text was handed off to synthesis immediately. Normally, for long responses, speech synthesis should begin during the nlp stage.
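If you’d rather script the analysis, the Raw block’s events can be reduced to per-stage durations. The event names and timestamps below are illustrative guesses at the shape of that data, not the exact Assist schema:

```python
from datetime import datetime

# Illustrative raw pipeline events, shaped like the debug "Raw" block;
# event names and timestamps here are invented for the example.
events = [
    {"type": "stt-start",    "timestamp": "2024-01-01T12:00:00.000"},
    {"type": "stt-end",      "timestamp": "2024-01-01T12:00:00.900"},
    {"type": "intent-start", "timestamp": "2024-01-01T12:00:00.950"},
    {"type": "intent-end",   "timestamp": "2024-01-01T12:00:08.100"},
    {"type": "tts-start",    "timestamp": "2024-01-01T12:00:08.150"},
    {"type": "tts-end",      "timestamp": "2024-01-01T12:00:08.700"},
]

# Index events by type, then compute start-to-end duration per stage.
ts = {e["type"]: datetime.fromisoformat(e["timestamp"]) for e in events}

for stage in ("stt", "intent", "tts"):
    dur = (ts[f"{stage}-end"] - ts[f"{stage}-start"]).total_seconds()
    print(f"{stage}: {dur:.2f}s")
```

With this sample data the intent (LLM) stage accounts for over 7 of the roughly 8.7 total seconds, which matches the pattern described earlier in the thread.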
I posted the debug pic. Can you not see it or am I missing something?
edit: And fwiw, I of course pay for OpenAI. I don’t even think they offer a free option.
And you’re saying that’s normal? I mean, it’s fine if it is. Not ideal, but I’ll live. I just want to make sure I’m not doing something wrong.
It’s normal…

