No, but the good news is that a GTX 1660 Ti works and goes for about $100 CAD used. It won’t do LLMs, but it is good enough for this.
Yes, but the bad news is that my server is an Intel NUC, so adding a GPU is not an option.
I was hoping that external M.2 TPU accelerators like the Hailo-8 or similar boards would become popular enough.
You can always run an x1 PCIe lane to an external enclosure. Depends how badly you want it.
I don’t know yet. I also care a lot about power consumption. My server sips ~6–9 W when mostly idle (which is 98% of the time for a home server). I can imagine adding an Nvidia GPU would easily 5x that number.
Again, depends how badly you want it.
Sometimes you have to accept a less-than-optimal setup to run bleeding-edge tech.
If you want to wait for power-efficient edge processors with full support for whatever stack you want to use, that’s fine.
You asked specifically whether it is possible, today, without a GPU. I simply gave you the information you requested. The Hailo-8 doesn’t seem supported for this use case, but I could be wrong (memory will be the biggest issue).
And I appreciate it. It is a shame that Intel Xe iGPUs are not supported; they are fairly decent, actually. Maybe there will be developments in the future. I’ve seen some info about a PyTorch extension with HW acceleration for Intel Xe graphics.
Actually, it won’t increase energy consumption much, because the GPU is idle most of the time and only works when you’re talking to your assistant. GPU standby power is approximately 6–9 W. If you choose a GPU with 12 GB of VRAM, you can run high-quality STT and an LLM locally. The 3060 is the best choice because it is very affordable and has plenty of VRAM.
So - I’m about to start my journey into running Whisper with a GPU and experience the speed you guys refer to.
First:
I’ve got a server with an Intel i9, 64 GB of memory, and a 2 TB M.2 drive.
It runs Ubuntu 22.04 LTS and already hosts a couple of Docker containers.
I purchased a second-hand Nvidia A2000 that should be here in a couple of days.
@baudneo @Fraddles @alienatedsec
What is the latest route to get up and running? I need to get the Nvidia drivers in (@alienatedsec I guess your URL should work) and then the correct Whisper container.
I hope my hardware combined with Ubuntu 22.04 will be OK?
Finally, there is a Whisper model on Huggingface I want to add, as it is a pretty solid model for Norwegian supporting a wide range of dialects (don’t even try to learn Norwegian): NbAiLab/nb-whisper-large · Hugging Face
Can this be used?
The Docker setup is simple as long as you have the host drivers and the Nvidia Container Toolkit installed.
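In case it helps, the host-side setup on Ubuntu 22.04 looks roughly like this (a minimal sketch, assuming the Nvidia apt repository is already configured; the CUDA image tag in the last line is just an example):

```bash
# Install the NVIDIA Container Toolkit so Docker can pass the GPU through
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: the container should print the same GPU table as the host
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```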
That model, I don’t think, will work. I may be wrong, but when I was setting my stuff up, I could only use the models that rhasspy supplied. I think the best at the time was a medium int8 model.
I don’t know if there is the ability to load in whatever models; you’ll need to experiment. I did implement the ability to load arbitrary models, but there was an issue. I can’t recall what, but it wasn’t worth my time at that point, as the supplied model was, and still is, performing well for English.
Your hardware should be more than enough for the models supplied by rhasspy. If you can load other models, you’ll need to experiment.
Thanks @baudneo. Is this image recommended (image: lscr.io/linuxserver/faster-whisper:gpu)? And are there now any limitations on which Nvidia version I install, as I see they now support Ubuntu 22.04? (You wrote earlier not to use > v11.x.)
I personally use a modified version of @edurenye’s repo.
IIRC, the comment about CUDA had something to do with building Piper and onnxruntime-gpu. You can just try it out and see if you get any errors. If you get weird errors, switch container tags to a newer or older version of CUDA/cuDNN.
There shouldn’t be issues, though; it’s fairly straightforward with these containers, as they’re built for this purpose and to be user friendly.
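For reference, a compose file for that linuxserver GPU image looks roughly like this (a sketch from memory; the WHISPER_MODEL value, volume path, and timezone are assumptions, so check the image’s README for the exact environment variables):

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Oslo
      - WHISPER_MODEL=medium-int8   # assumed value; see the image README for valid names
    volumes:
      - ./faster-whisper/config:/config
    ports:
      - "10300:10300"               # Wyoming protocol port used by the HA integration
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
```

Home Assistant’s Wyoming integration then just points at the host on port 10300.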
@baudneo Did you ever get to test using a local model for the Whisper add-on? I can’t find any information about this anywhere.
I did, but there were errors. I didn’t spend much time on it, as my needs were already met. I’ll take a look and point you in the right direction; I based my work off a closed pull request in one of the repos. Might have been faster-whisper, but the rhasspy version?
I successfully converted the model I referred to on Huggingface and loaded it into the Whisper add-on for testing within HA, and it works like a charm! I expect to receive my GPU today or tomorrow and will then load the large model onto my Ubuntu server.
Still, I am a bit unsure which Docker image to use for my local faster-whisper…
@alienatedsec @baudneo It is running really smoothly! Thanks to you both! I use an Nvidia RTX A4000, and even have a spare I can add if needed for future projects.
Question: Have any of you tried training voices (Piper) using the GPU setup?
No training, as I haven’t had a use case for it, so I can’t help much there at all.
What steps did you take to convert an HF model for use with faster-whisper?
Congrats on getting things to work with your native language!
I ran a conversion script.
You wrote it yourself? Or did you use a guide? Any information on it?
Here it is:
You will need Python 3.
Install the dependencies: pip install ctranslate2 transformers>=4.35.2 (the ct2-transformers-converter tool is provided by the ctranslate2 package).
Then run:
ct2-transformers-converter --model XYZ --output_dir <YOUR DIR> --copy_files tokenizer.json preprocessor_config.json --quantization float16
If the model you want to convert is on Huggingface, simply give its path, like this:
--model openai/whisper-large-v3
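To sanity-check the converted model before wiring it into the add-on, you can load the output directory with the faster-whisper Python API directly (a quick sketch; "<YOUR DIR>" and "test.wav" are placeholders for your converted model directory and a short test clip):

```python
from faster_whisper import WhisperModel

# Point at the directory produced by ct2-transformers-converter
model = WhisperModel("<YOUR DIR>", device="cuda", compute_type="float16")

# Transcribe a short test clip and print the recognized segments
segments, info = model.transcribe("test.wav", language="no")
print(f"Detected language: {info.language} ({info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```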