You don’t say where you’re running those, but making some educated guesses…
While I don’t know where you’re running Whisper, that 2-second STT time tells me it’s probably running locally on CPU. With hardware acceleration that can drop to sub-second.
OpenAI… cloud. ElevenLabs… cloud. I’m not sure the way you’ve put them together supports streaming. As it stands you have two round trips to the cloud. And if you have to pick one to go local… while ElevenLabs voices sound great, it slows my voice pipeline to a crawl.
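To see why streaming matters so much here, a tiny sketch (the TTS function is hypothetical, just a stand-in that yields audio in chunks): with streaming, playback can start at the first chunk instead of waiting for the whole clip, so perceived latency is the time-to-first-chunk, not the total synthesis time.

```python
import time

def tts_streaming(text, chunk_delay=0.05, chunks=10):
    """Hypothetical streaming TTS: yields audio chunks as they're synthesized."""
    for i in range(chunks):
        time.sleep(chunk_delay)  # per-chunk synthesis cost
        yield f"chunk-{i}".encode()

start = time.perf_counter()
first_audio = None
for chunk in tts_streaming("turning on the lights"):
    if first_audio is None:
        # A streaming pipeline could hand this chunk to the speaker right now.
        first_audio = time.perf_counter() - start

total = time.perf_counter() - start
print(f"time to first audio: {first_audio:.2f}s")
print(f"time to full clip:   {total:.2f}s")
```

A non-streaming pipeline makes you wait the full `total` before you hear anything; with two cloud hops (LLM then TTS) those waits stack.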
Probably a combination of model thinking time, cloud lag, and not streaming the responses, but you’ll have to experiment to figure it out.
Go to Settings > Voice assistants, pick your assistant, hit the three-dot menu to the right, and pick Debug to get details for every step.
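If you want to experiment outside the built-in debug view as well, a rough wall-clock harness like the sketch below can show which stage eats the seconds. The three stage functions here are stand-ins (not real STT/LLM/TTS calls); swap in your actual Whisper, OpenAI, and ElevenLabs requests.

```python
import time

def time_stage(name, fn, *args):
    """Run one pipeline stage and report its wall-clock time."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.2f}s")
    return result, elapsed

# Hypothetical stand-ins; replace with your real STT / LLM / TTS calls.
def fake_stt(audio):
    time.sleep(0.1)
    return "turn on the lights"

def fake_llm(text):
    time.sleep(0.2)
    return "Turning on the lights."

def fake_tts(text):
    time.sleep(0.1)
    return b"audio-bytes"

text, t1 = time_stage("stt", fake_stt, b"...")
reply, t2 = time_stage("llm", fake_llm, text)
audio, t3 = time_stage("tts", fake_tts, reply)
print(f"total: {t1 + t2 + t3:.2f}s")
```

If the sum of the stages comes out well under what your stopwatch says, the missing time is outside these steps (network setup, playback buffering, etc.).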
Hi Nathan, yes that makes sense, but I need the quality; otherwise it’s useless and I might as well pull out my iPhone and open ChatGPT. My question is: why do the logs show one number for the seconds, while my stopwatch shows roughly 2x that? It makes it hard to debug, since I’m losing ~7s somewhere invisible to the logs. Thx, David
That responsiveness costs money. To eliminate lag:
local
a beefy NPU/GPU on the gear doing the work.
There’s no way around it.
Also, when I run Friday with a cloud agent I’ve never had less than a 5-second turnaround (using OpenAI speech models that I know stream). To break that I’ll need to go 100% local.
Makes sense! I thought that lag would appear in the logs. GPT told me there’s a reported lag on the Voice PE where it waits to start playing the TTS, but I guess that’s not it.
I have the HA Green, so that’s all I’ve got in terms of hardware. I’ll play around with it though, thanks.