Hey everyone. This may be expected, but it seems odd to me and I think I may have done something wrong. I recently set up a Whisper/Piper audio pipeline using the Wyoming protocol integration and found that the latency is very noticeable when talking to the assistant. At first I assumed this was just underpowered hardware, until I opened the voice assistant debug panel and saw that it records only about 0.1 seconds on average for speech-to-text. After more testing, I found that the latency scales with the length of the audio clip. If I send a short burst of silence, the response is almost instant, but a 2-second clip has much more noticeable latency (5 to 8 seconds). My guess is that this has something to do with the audio being buffered before it’s sent. Is there any way to improve this part of the pipeline?
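One way to separate Whisper’s own processing time from the rest of the pipeline is to time a transcription request against the Whisper container directly, bypassing HA entirely. The rough Python sketch below does that using only the standard library by hand-rolling the Wyoming event framing; it is a test harness under stated assumptions, not how HA itself talks to Wyoming. Assumptions not taken from my setup above: the server follows the JSON-lines framing described in the rhasspy/wyoming README (a header line, then optional JSON data, then an optional binary payload), it listens on port 10300 (which I believe is the usual default for the Whisper server), and the clip is raw 16 kHz, 16-bit, mono PCM. The helper names `encode_event`, `read_event` and `time_transcription` are hypothetical.

```python
"""Time a Wyoming speech-to-text round trip outside Home Assistant.

Assumptions: JSON-lines Wyoming framing, port 10300, and raw
16 kHz / 16-bit / mono PCM input. All helper names are hypothetical.
"""
from __future__ import annotations

import asyncio
import json
import sys
import time

RATE, WIDTH, CHANNELS = 16000, 2, 1  # assumed PCM format
CHUNK_BYTES = 1024 * WIDTH * CHANNELS  # 1024 samples per audio-chunk event


def encode_event(event_type: str, data: dict | None = None, payload: bytes = b"") -> bytes:
    """Header line, then optional JSON data bytes, then optional binary payload."""
    header: dict = {"type": event_type}
    data_bytes = json.dumps(data).encode() if data else b""
    if data_bytes:
        header["data_length"] = len(data_bytes)
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode() + b"\n" + data_bytes + payload


async def read_event(reader: asyncio.StreamReader) -> tuple[str, dict, bytes]:
    """Read one event; data may be inline in the header or sent after it."""
    line = await reader.readline()
    if not line:
        raise ConnectionError("server closed the connection")
    header = json.loads(line)
    data = header.get("data") or {}
    if header.get("data_length"):
        data.update(json.loads(await reader.readexactly(header["data_length"])))
    payload = b""
    if header.get("payload_length"):
        payload = await reader.readexactly(header["payload_length"])
    return header["type"], data, payload


async def time_transcription(host: str, port: int, pcm: bytes) -> tuple[str, float]:
    """Send the whole clip at once, then time how long the transcript takes."""
    reader, writer = await asyncio.open_connection(host, port)
    fmt = {"rate": RATE, "width": WIDTH, "channels": CHANNELS}
    try:
        writer.write(encode_event("transcribe", {"language": "en"}))
        writer.write(encode_event("audio-start", fmt))
        for i in range(0, len(pcm), CHUNK_BYTES):
            writer.write(encode_event("audio-chunk", fmt, pcm[i : i + CHUNK_BYTES]))
        writer.write(encode_event("audio-stop"))
        await writer.drain()
        start = time.monotonic()  # clock starts once the clip is flushed to the socket
        while True:
            event_type, data, _ = await read_event(reader)
            if event_type == "transcript":
                return data.get("text", ""), time.monotonic() - start
    finally:
        writer.close()
        await writer.wait_closed()


if __name__ == "__main__":
    # Usage: python wyoming_timing.py <whisper-host> <clip.raw>
    host, path = sys.argv[1], sys.argv[2]
    with open(path, "rb") as f:
        text, elapsed = asyncio.run(time_transcription(host, 10300, f.read()))
    print(f"{elapsed:.2f}s after audio-stop: {text!r}")
```

If the time after `audio-stop` stays small for a 2-second clip, that would suggest the delay comes from how the audio reaches Whisper rather than from the transcription itself.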
For an overview of my setup: I have two NixOS hosts running as Proxmox VMs. One runs HA in Podman, with nginx as the SSL proxy. The second NixOS host runs my ‘AI’ stuff and has a GTX 1080 passed through; that’s where Whisper runs, also in Podman, using the NVIDIA Container Toolkit. Everything is connected over Tailscale. I’ve tested bypassing nginx and Tailscale by connecting via the Tailscale DNS name directly and via my local IP address directly, and the latency is there either way. Another reason I don’t think the transcription itself is the issue: when I tail the Whisper logs, the ‘processing’ and ‘done’ log lines appear almost immediately one after the other.
Happy to answer any follow-up questions or share config examples. I don’t really modify HA much beyond its defaults, except that I use a Postgres instance for the recorder integration. Pretty much everything else in configuration.yaml is untouched.