HASSOS is in a VM hosted on Proxmox, with only an Intel iGPU that might be able to be passed through.
I moved Whisper and Ollama off to a machine with an eGPU (a 2080 Ti) attached. This is running a medium-int8 Whisper model with beam size 5 (I'd like to try large, but the linuxserver.io image doesn't support it).
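If you want to see what the beam size and model size actually cost on the 2080 Ti, faster-whisper (the library those images wrap) can be driven directly from Python. A rough sketch, assuming `pip install faster-whisper`, a visible CUDA device, and a test clip `audio.wav` (the clip name is a placeholder):

```python
# Sketch: time faster-whisper directly, outside the Docker image.
import time
from faster_whisper import WhisperModel

# "large-v3" should also load here if VRAM allows.
model = WhisperModel("medium", device="cuda", compute_type="int8")

for beam_size in (1, 5):
    start = time.perf_counter()
    segments, info = model.transcribe("audio.wav", beam_size=beam_size)
    text = " ".join(s.text for s in segments)  # segments is lazy; joining runs it
    print(f"beam_size={beam_size}: {time.perf_counter() - start:.2f}s -> {text!r}")
```

Dropping beam size from 5 to 1 is usually the cheapest speedup, at some accuracy cost.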
Is this a reasonable speed? (It's certainly much faster than running it all on CPU in the VM.)
It seems I'm at the point of diminishing returns on the speech-to-text, but it would be cool to get it faster if possible.
Any suggestions on how to get the NLP with Ollama down to under a second? (This is currently with 55 exposed entities.)
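One thing worth ruling out is model load time: if Ollama unloads the model between requests (the default keep-alive is only a few minutes), the first request after an idle period pays the whole load cost. A sketch of a warm-up request, assuming Ollama's default API on localhost:11434 (the model name is just an example):

```python
# Sketch: ask Ollama to keep the model resident so requests skip the cold-start load.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",  # example name; use whatever HA points at
        "prompt": "warm-up",
        "stream": False,
        "keep_alive": -1,  # negative = keep the model loaded indefinitely
    },
).json()

# Durations come back in nanoseconds; this should be ~0 once the model is warm.
print(f"load_duration: {resp['load_duration'] / 1e9:.2f}s")
```

Setting `OLLAMA_KEEP_ALIVE=-1` in the server's environment should do the same thing globally.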
Same idea here: GPT-4o mini takes about two seconds, and my local GPU (a 3060) also takes about two seconds. Normally OpenAI's servers are rockets while I'm a snail, but here there's no difference in time.
Yes, because OpenAI has countless Tesla GPUs, and their model parameters run into the billions; that's something our RTX cards can't compare to. Yet both take about 2 seconds, and I don't know where the bottleneck is.
Does anyone know if it's possible to pass the --verbose argument to the runner when it runs the model? This would provide additional inference statistics…
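If the runner in question is Ollama, the CLI does accept `ollama run <model> --verbose`, which prints timing stats after each reply, and the HTTP API returns the same counters in the final response. A sketch that pulls them out of `/api/generate` (the model name and prompt are placeholders):

```python
# Sketch: read Ollama's inference stats from the API instead of the CLI flag.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Turn on the kitchen light.", "stream": False},
).json()

# All *_duration fields are reported in nanoseconds.
for key in ("total_duration", "load_duration", "prompt_eval_duration", "eval_duration"):
    print(f"{key}: {resp.get(key, 0) / 1e9:.2f}s")
print("eval_count (tokens generated):", resp.get("eval_count"))
```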