I am running Voice PE with OpenAI, with the "prefer handling commands locally" option enabled. I have about 70 entities exposed.
Things like turning a plug or light on or off work really fast.
Calling an intent is “OK”-ish.
Starting a timer, or adding something to the shopping list, is a pain. It takes an eternity to get a result.
I have tried switching to Home Assistant Cloud (instead of OpenAI). I didn't notice much of a speed improvement, and adding to my shopping list stopped working altogether.
HA is currently running in a Hyper-V VM; moving it (temporarily) to my powerful desktop machine didn't change any speeds. I'm not even sure that moving it off the HA Green changed a lot.
Is there any tool (other than the debug logger, where I fail to find my data among everything else that's going on) to measure a single command? Where is it taking so long? Understanding the voice? Interpreting the command? Giving feedback?
Any help would be greatly appreciated. Home Acceptance Factor is very low at the moment.
Sorry you didn’t get any reply to this - I find it tends to happen when you ask questions about Assist, which is a bit of a black box.
It’s very hard to find much information about what kind of response speeds to expect for any voice assistant, even for Alexa and Google Assistant. Reviews tend to focus on accuracy and feature comparisons.
This is understandable because there are a number of steps between wake word and response, any one of which could affect latency:
- Wake word detection
- End of speech detection
- Speech to text
- Natural language processing
- Action
- Text to speech response
TTS in particular can introduce varying delays because typically the whole response has to be compiled before the voice assistant starts speaking.
As you say, the only number that Home Assistant provides is the Natural Language Processing figure in the Assist debug screen.
But this only represents time taken by Assist to process the text provided by speech-to-text and interpret its meaning. It doesn’t include the time taken by STT itself or by any TTS response. It is not available programmatically, so its use as a debug tool is limited.
My experience has been that you can get some improvement by tuning each stage of the pipeline. You might experiment with TTS from ElevenLabs, for example, which is supposed to be significantly faster than most. A different LLM might make a difference too.
When it comes to timers, my understanding is that these are handled by the Voice PE itself, which may be underpowered. You could experiment with custom intents and move the timer to HA to see if that speeds things up (as an added bonus, you can show progress on a dashboard). The same might apply to shopping lists.
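To give an idea of what that could look like for the shopping list, here is a rough sketch of a custom sentence plus an `intent_script` handler. The intent name, slot name, and sentence wording are my own invention, not anything official, so treat this as a starting point rather than a recipe:

```yaml
# config/custom_sentences/en/shopping.yaml
# (intent name "AddItemToList", slot "item" and the sentence are illustrative)
language: "en"
intents:
  AddItemToList:
    data:
      - sentences:
          - "add {item} to [my] shopping list"
lists:
  item:
    wildcard: true   # match any free-form item name

# --- configuration.yaml ---
intent_script:
  AddItemToList:
    action:
      - service: shopping_list.add_item
        data:
          name: "{{ item }}"
    speech:
      text: "Added {{ item }} to your shopping list"
```

Because the sentence is matched by the local Assist conversation agent, a phrase like this should never need the round trip to the LLM at all, which is where I'd expect the speed-up to come from.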
I don’t have a Voice PE, but I believe there are some configuration options:
My screenshot fits this topic best …
From time to time I get a VEEERY LOONG response time … usually it’s faster, around ~2–5 seconds for NLP, but … this was the record: