No, but the good news is that a GTX 1660 Ti works and goes for about $100 CAD used. It won’t do LLMs, but it is good enough for this.
Yes, but the bad news is that my server is an Intel NUC, so adding a GPU is not an option.
I was hoping that external M.2 TPU accelerators like the Hailo-8 or similar boards would become popular enough.
You can always run an x1 PCIe lane to an external enclosure. Depends how badly you want it.
I don’t know yet. I also care a lot about power consumption. My server sips ~6–9 W when mostly idle (which is 98% of the time for a home server). I can imagine adding an Nvidia GPU would easily 5x that number.
Again, depends how badly you want it.
Sometimes you have to accept a less-than-optimal setup to run bleeding-edge tech.
If you want to wait for power-efficient edge processors with full support for whatever stack you want to use, that’s fine.
You asked specifically whether it is possible, today, without a GPU. I simply gave you the information you requested. The Hailo-8 doesn’t seem supported for this use case, but I could be wrong (memory will be the biggest issue).
And I appreciate it. It is a shame that Intel Xe iGPUs are not supported; they are fairly decent, actually. Maybe there will be developments in the future. I’ve seen some info about a PyTorch extension with HW acceleration for Intel Xe graphics.
Actually, it won’t increase energy consumption much, because the GPU is idle most of the time and only works when you’re talking to your assistant. GPU standby power is approximately 6–9 W. If you choose a GPU with 12 GB of VRAM, you can run high-quality STT and an LLM locally. The 3060 is the best choice because it is very affordable and has plenty of VRAM.
So - I’m about to start my journey into running Whisper with a GPU and experience the speed you guys refer to.
First:
I’ve got a server with an Intel i9, 64 GB of memory, and a 2 TB M.2 drive.
It runs Ubuntu 22.04 LTS and already hosts a couple of Docker containers.
I purchased a second-hand Nvidia A2000 that should be here in a couple of days.
@baudneo @Fraddles @alienatedsec
What is the latest route to get up and running? I need to get the Nvidia drivers in (@alienatedsec I guess your URL should work) and then the correct Whisper container.
I hope my hardware combined with Ubuntu 22.04 will be OK?
Finally, there is a Whisper model on Huggingface I want to add, as it is a pretty solid model for Norwegian supporting a wide range of dialects (don’t even try to learn Norwegian): NbAiLab/nb-whisper-large · Hugging Face
Can this be used?
The Docker setup is simple as long as you have the host drivers and the Nvidia Container Toolkit installed.
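In case it helps, the host-side setup on Ubuntu 22.04 looks roughly like this (a minimal sketch, assuming the Nvidia apt repository is already configured; the CUDA image tag in the last line is just an example):

```bash
# Install the NVIDIA Container Toolkit so Docker can pass the GPU through
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: the container should print the same GPU table as the host
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```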
That model, I don’t think, will work. I may be wrong, but when I was setting my stuff up, I could only use the models that rhasspy supplied. I think the best at the time was a medium int8 model.
I don’t know if there is the ability to load in whatever models; you’ll need to experiment. I did implement the ability to load arbitrary models, but there was an issue. I can’t recall what, but it wasn’t worth my time at that point, as the supplied model was, and still is, performing well for English.
Your hardware should be more than enough for the models supplied by rhasspy. If you can load other models, you’ll need to experiment.
Thanks @baudneo. Is this image recommended (image: lscr.io/linuxserver/faster-whisper:gpu)? And are there now any limitations on which Nvidia version I install, as I see they now support Ubuntu 22.04? (You wrote earlier not to use > v11.x.)
I personally use a modified version of @edurenye’s repo.
IIRC, the comment about CUDA had something to do with building Piper and onnxruntime-gpu. You can just try it out and see if you get any errors. If you get weird errors, switch container tags to a newer or older version of CUDA/cuDNN.
There shouldn’t be issues, though; it’s fairly straightforward with these containers, as they’re built for this purpose and to be user friendly.
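For reference, a compose file for that linuxserver GPU image looks roughly like this (a sketch from memory; the WHISPER_MODEL value, volume path, and timezone are assumptions, so check the image’s README for the exact environment variables):

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Oslo
      - WHISPER_MODEL=medium-int8   # assumed value; see the image README for valid names
    volumes:
      - ./faster-whisper/config:/config
    ports:
      - "10300:10300"               # Wyoming protocol port used by the HA integration
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
```

Home Assistant’s Wyoming integration then just points at the host on port 10300.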
@baudneo Did you ever get to test using a local model for the Whisper add-on? I can’t find any information about this anywhere.
I did, but there were errors. I didn’t spend much time on it, as my needs were already met. I’ll take a look and point you in the right direction; I based my work off a closed pull request in one of the repos. Might have been faster-whisper, but the rhasspy version?
I successfully converted the model I referred to on Huggingface and loaded it into the Whisper add-on for testing within HA, and it works like a charm! I expect to receive my GPU today or tomorrow and will then load the large model onto my Ubuntu server.
Still, I am a bit unsure which Docker image to use for my local faster-whisper…
@alienatedsec @baudneo It is running really smoothly! Thanks to you both! I use an Nvidia RTX A4000, and even have a spare I can add if needed for future projects.
Question: Have any of you tried training voices (Piper) using the GPU setup?
No training, as I haven’t had a use case for it, so I can’t help much there at all.
What steps did you take to convert an HF model for use with faster-whisper?
Congrats on getting things to work with your native language!
I ran a conversion script.
You wrote it yourself? Or did you use a guide? Any information on it?
Here it is:
You will need Python 3.
Install the dependencies: pip install ctranslate2 transformers>=4.35.2 (the ct2-transformers-converter tool is provided by the ctranslate2 package).
Then run:
ct2-transformers-converter --model XYZ --output_dir <YOUR DIR> --copy_files tokenizer.json preprocessor_config.json --quantization float16
If the model you want to convert is on Huggingface, simply give its path, like this:
--model openai/whisper-large-v3
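To sanity-check the converted model before wiring it into the add-on, you can load the output directory with the faster-whisper Python API directly (a quick sketch; "<YOUR DIR>" and "test.wav" are placeholders for your converted model directory and a short test clip):

```python
from faster_whisper import WhisperModel

# Point at the directory produced by ct2-transformers-converter
model = WhisperModel("<YOUR DIR>", device="cuda", compute_type="float16")

# Transcribe a short test clip and print the recognized segments
segments, info = model.transcribe("test.wav", language="no")
print(f"Detected language: {info.language} ({info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```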