Run whisper on external server

Thanks so much again for the help, after a bit of a ballache (mostly because I’m so inexperienced, and had to download and install curl plus other packages to get it working), I got it all working!!

Now to play and try it out on my satellites. Thanks again :slight_smile:

1 Like

Hi again, I just wanted to check something with you, because I am actually still having some small issues (maybe).

When I wrote my message yesterday, the whisper lxc I set up with your guide was initially working, but shortly after it stopped, and I couldn’t get it running again. So I created a new one, and have carefully watched the messages in the console as I configure the lxc. I’ve noticed that when I install the following (I’ve been playing with models, and base int8 seems to give a very fast response with decent accuracy):

curl -L -s https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-base-int8.tar.gz | tar -zxvf - -C /data

I get the following error in the console:

tar: /data: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now

Is this a problem? When Whisper is running, it seems to work fine, in fact I’m getting great response times and accuracy, as can be seen below :slight_smile:

And the last thing: if I reboot the lxc, or the proxmox host, do I need to run the script/run command again? Whisper doesn’t seem to run unless I go back into /wyoming-faster-whisper and rerun script/run (this is probably my very basic knowledge of docker that’s ‘causing the issue’ :rofl:)

That error is saying you do not have a directory called ‘/data’. Extract the file somewhere else, might as well make it the same directory you are working from. It’s important not to just copy+paste commands but instead use them as a guide and modify for your own use case.
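That fix can be sketched as below: tar will not create the -C target directory for you, so make it first with mkdir -p. This is demonstrated with a throwaway tarball in a temp directory; for the real command it would be mkdir -p /data (or whichever directory you pick) followed by the curl | tar line from the guide.

```shell
# tar refuses to extract into a -C directory that does not exist.
# Demonstration with a throwaway tarball; the real fix is just
# 'mkdir -p /data' before the curl | tar command from the guide.
workdir=$(mktemp -d)
cd "$workdir"
echo "hello" > model.txt
tar -czf model.tar.gz model.txt

# Without the target directory, extraction fails:
tar -zxf model.tar.gz -C "$workdir/data" 2>/dev/null \
  && echo "extract ok" \
  || echo "extract failed: target directory is missing"

# Create the directory first, then extraction succeeds:
mkdir -p "$workdir/data"
tar -zxf model.tar.gz -C "$workdir/data"
cat "$workdir/data/model.txt"   # prints: hello
```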

For the other issue, if you followed the instructions to create the systemd .service file, then you can set whisper to autostart with sudo systemctl enable --now name-of-service.service - that command will set it to autostart and also start the service in the background.

1 Like

Thanks again for the help. I decided to (after finding out how) make a folder called /data, and now I have no problem downloading the models.

But I’m still a little unsure about the full whisper.service file. I have created it, but the only parts I’ve modified are my username and the model type (I’m playing with base int8 at the moment). Does that look correct to you?

Sorry if it’s a rookie question, it’s all still very new to me :slightly_smiling_face:

Once I’m sure I have the service file configured correctly, I will rerun the last two commands you mentioned (which I have already done, but so far, whisper doesn’t start on container reboot).

Can you run the command systemctl status whisper.service (or whatever the name of the service file is for you) and post the results? Enabling the service with the command I posted before should work.

You can also run journalctl -u whisper.service -f to get a sense of what is happening when the system tries to start whisper.

Thanks for your response. When I run the systemctl status whisper.service command, this is what I get:

root@Whisper-base-int8:~/wyoming-faster-whisper# cd --
root@Whisper-base-int8:~# systemctl status whisper.service
x whisper.service - Faster Whisper
     Loaded: loaded (/etc/systemd/system/whisper.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2024-01-10 07:36:02 UTC; 1 day 11h ago
    Process: 286 ExecStart=/home/celodnb/wyoming-faster-whisper/script/run --model base-int8 --compute-type int8 --beam>
   Main PID: 286 (code=exited, status=203/EXEC)
        CPU: 532us

Jan 10 07:36:02 Whisper-base-int8 systemd[1]: whisper.service: Scheduled restart job, restart counter is at 5.
Jan 10 07:36:02 Whisper-base-int8 systemd[1]: Stopped Faster Whisper.
Jan 10 07:36:02 Whisper-base-int8 systemd[1]: whisper.service: Start request repeated too quickly.
Jan 10 07:36:02 Whisper-base-int8 systemd[1]: whisper.service: Failed with result 'exit-code'.
Jan 10 07:36:02 Whisper-base-int8 systemd[1]: Failed to start Faster Whisper.
lines 1-12/12 (END)

This is likely another rookie question, but at what level am I supposed to have created the sudo systemctl edit --force --full whisper.service file? Should it be at top level (root@Whisper-base-int8:~#), or lower down (root@Whisper-base-int8:~/wyoming-faster-whisper#)? Or maybe it doesn’t matter?

If I run the journalctl -u whisper.service -f command, then I get this:

root@Whisper-base-int8:~# journalctl -u whisper.service -f
Jan 11 18:42:05 Whisper-base-int8 systemd[286]: whisper.service: Failed to locate executable /home/celodnb/wyoming-faster-whisper/script/run: No such file or directory
Jan 11 18:42:05 Whisper-base-int8 systemd[1]: Started Faster Whisper.
Jan 11 18:42:05 Whisper-base-int8 systemd[286]: whisper.service: Failed at step EXEC spawning /home/celodnb/wyoming-faster-whisper/script/run: No such file or directory
Jan 11 18:42:05 Whisper-base-int8 systemd[1]: whisper.service: Main process exited, code=exited, status=203/EXEC
Jan 11 18:42:05 Whisper-base-int8 systemd[1]: whisper.service: Failed with result 'exit-code'.
Jan 11 18:42:07 Whisper-base-int8 systemd[1]: whisper.service: Scheduled restart job, restart counter is at 5.
Jan 11 18:42:07 Whisper-base-int8 systemd[1]: Stopped Faster Whisper.
Jan 11 18:42:07 Whisper-base-int8 systemd[1]: whisper.service: Start request repeated too quickly.
Jan 11 18:42:07 Whisper-base-int8 systemd[1]: whisper.service: Failed with result 'exit-code'.
Jan 11 18:42:07 Whisper-base-int8 systemd[1]: Failed to start Faster Whisper.

From my basic understanding, it sounds like it can’t find anything at /home/celodnb/wyoming-faster-whisper/script/run, which is one of the lines from the service file. My service file looks like this:

[Unit]
Description=Faster Whisper
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/home/celodnb/wyoming-faster-whisper/script/run --model base-int8 --compute-type int8 --beam-size 2 --languag>
WorkingDirectory=/home/celodnb/wyoming-faster-whisper
Restart=always
RestartSec=1

[Install]
WantedBy=default.target

Well, is that where the run script is located? Move into the faster-whisper dir:

cd /home/celodnb/wyoming-faster-whisper

Then do ls -l and see what is there. You may have missed a step.
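A quick way to check is to test the exact path the unit file points at (the path below is taken from the ExecStart line in this thread; substitute your own). If the test fails, either the checkout lives somewhere else or a step was missed:

```shell
# Sanity-check the ExecStart path from the unit file. A systemd failure with
# status=203/EXEC almost always means this check fails.
run_path=/home/celodnb/wyoming-faster-whisper/script/run
if [ -x "$run_path" ]; then
  echo "run script found and executable"
else
  echo "run script missing or not executable: fix ExecStart or move the checkout"
fi
```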

I’m assuming I’ve done something very simple wrong, or I just lack an understanding of how docker/lxc containers work or are structured.

If I try and run cd /home/celodnb/wyoming-faster-whisper, then it says that no directory exists. At root level, if I run the ls -l command, then I have a few files. I’m not 100% sure how to read it, but I assume it’s saying that I don’t have any directories, or folders? Anyway, here’s what I see.

root@Whisper-small-int8:~/wyoming-faster-whisper# cd /home/celodnb/wyoming-faster-whisper
-bash: cd: /home/celodnb/wyoming-faster-whisper: No such file or directory
root@Whisper-small-int8:~/wyoming-faster-whisper# cd --
root@Whisper-small-int8:~# cd /home/celodnb/wyoming-faster-whisper
-bash: cd: /home/celodnb/wyoming-faster-whisper: No such file or directory
root@Whisper-small-int8:~# ls -l
total 333752
-rwxr-xr-x 1 root root 341737575 Dec  4 07:53 NVIDIA-Linux-x86_64-535.146.02.run
-rw-r--r-- 1 root root      4332 Apr 20  2023 cuda-keyring_1.1-1_all.deb
drwxr-xr-x 3 root root      4096 Jan 10 07:25 models
drwxr-xr-x 6 root root      4096 Jan 10 07:21 wyoming-faster-whisper

Yeah you have it, but it is in root’s home directory. It doesn’t need to run under the root user, but that is a whole other discussion about security and least privilege which is best learned elsewhere. I just hope you are not exposing this LXC or your Proxmox host to the internet.

Copy the entire ‘wyoming-faster-whisper’ folder to the celodnb home dir, then look up how to change ownership of files and folders in Linux. Reboot and try again.
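A sketch of that move, assuming the checkout currently sits under /root and the target user is celodnb (names from this thread): the real commands, run as root, are a recursive copy plus a recursive ownership change. The same pattern is demonstrated below on throwaway directories.

```shell
# On the LXC (as root) the move would be roughly:
#   cp -r /root/wyoming-faster-whisper /home/celodnb/
#   chown -R celodnb:celodnb /home/celodnb/wyoming-faster-whisper
# Same pattern on throwaway directories:
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/wyoming-faster-whisper/script"
touch "$src/wyoming-faster-whisper/script/run"
cp -r "$src/wyoming-faster-whisper" "$dst/"        # copy the whole tree
chown -R "$(id -un)" "$dst/wyoming-faster-whisper" # ownership change (a no-op here)
ls "$dst/wyoming-faster-whisper/script"            # prints: run
```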

Thanks again for your patience, I’ve created a brand new container from scratch. I think the part I was struggling with was understanding what level I was at (and making sure I was logged in as my user on the container), and I also had to modify the service file a bit more (I now understand the paths, and how to modify them for my system).

And now it works! After rebooting the container, it continues to work!

Thanks again for your help :slight_smile:

1 Like

Just fired this up on an old PC I scored…

  • Dell mini tower with i5-3550, 16GB RAM
  • Shoehorned in a spare GTX-1070 and 650W PSU I had laying around

Without a doubt GPU is the way to go!

I need to do some more testing, but here are some initial figures using the small-int8 model, values in seconds as shown in the HA debug output:

Sentence                       i7-3610QM   i5-3550   GTX 1070
What is the weather            3.74        3.24      0.2
How much is a loaf of bread    3.85        3.33      0.24

HA is running on the i7 machine (a laptop); the i5 and the 1070 are together in a separate box.
Switching the GPU to the small model increases processing time to ~0.27 seconds, and switching to medium-int8 increases it to ~0.48 seconds. Still massively faster than CPU!

On all models accuracy was pretty good, using HA companion app on an Android tablet as the microphone.

More details on the setup here (still a WIP): https://github.com/Fraddles/Home-Automation/tree/main/Voice-Assistant

Cheers.

2 Likes

Could you specify the type of tower you used? It’s interesting to know the complete hardware list, especially when it comes to fitting a graphics card of this type.

Luckily this particular Dell (Vostro 470) takes mostly standard components, and from new there was a dedicated GPU option (GTX 1660 Super…mine did not have this), so it has room in the case, although very limited cooling capacity…

The Coolermaster 650W PSU slotted straight in, even working with the quick release in the case. All power cables connected without adapters, although I seem to have an overabundance of power cables in this little case… As a bonus the replacement PSU has a much larger fan than the original.

A few minutes with the cordless drill to remove a handful of rivets (eight to be precise), and the lower drive cage came out to make room for the 1070.

That is about it other than ripping out other unnecessary bits (optical drive, card reader, front panel audio…)

It’s not quite finished yet… Right now the SSD is just sitting in the bottom of the case… :stuck_out_tongue:
Before


After

Cheers.

Any idea how much power you will be drawing, both instantaneous during processing and averaged over time across idle and processing?

I’ve been trying to size the power change (and the noise issues, for the significant other) of ‘bringing voice and AI in house’. Most people, myself included initially, do not realize that a little Amazon Echo or Google puck is just the tip of the iceberg on the way to having a Jarvis in our homes…

I think there is a market coming for a ‘home friendly’ AI-capable, robust and relatively low-power box of some type. Right now, I will just cook hot dogs on the box and wear ear plugs in the lab :wink: .

No idea on the final power figures yet, but the Nvidia driver tells me the 1070 is drawing ~7W at ‘idle’ (Whisper and Xfce GUI running but not doing anything).

At this point it has just been a PoC, as all the hardware was ‘free’… Have now spent ~$100 on a few bits and bobs to flesh it out… Waiting on shipping…

Noise level seems pretty good… cannot hear it over the NAS it is sitting next to… :stuck_out_tongue:

I am hoping that this thing can replace both the laptop currently running HA, and the (old, power hungry) NAS, so the power figures might come out ahead in my case :slight_smile:

My aim is for a (relatively) compact, quiet, box with enough storage and compute to ‘do it all’ (I might need to find some Unicorn farts for cooling…)

Cheers.

1 Like

Any chance you could share your docker-compose for this setup please? I’m considering getting a GPU for my NAS so I can try this out as that seems pretty speedy. Currently have it running on my NAS with CPU but found it too slow and wake words don’t work all the time.

I did a little more testing today with CPU vs GPU…

It was only AFTER my testing that I remembered the i7 is using the small-int8 model while the i5/GPU is using small. Spoken sentence ‘What is the weather?’, average STT processing time in seconds over four runs:

i7-3610QM   i5-3550   GTX 1070
3.7         3.6       0.26

Then I started throwing some random sentences at it…

Sentence                                                           Time
Can you tell me what time it is?                                   0.29s
Is there under street parking at the city?                         0.3s
Does Harry have to go to school tomorrow?                          0.3s
Is there anything that you do understand?                          0.34s
Are there any sentences that will make you blow up the world?      0.35s
Have you ever seen the movie Superman?                             0.29s
What time is it in Bangalore?                                      0.3s
Is there a space station overhead right now?                       0.31s
How many Terminator movies are there and which one was the best?   0.34s
Ah, stop being silly.                                              0.28s

All sentences were ‘heard’ by HA correctly… the above is copy/pasta from the debug screen, including punctuation.

The biggest problem I had was maybe it being too responsive… the slightest pause in speech and it would take off and run STT on half your sentence…

GPU is using 1322 MiB of VRAM running the small model.

No clue on power consumption as yet, but things barely get warm…

Cheers.

3 Likes

Hi,

I am currently trying to offload everything to my CUDA Host (Tesla P4). Your repo helped a lot with that. Thanks for that :slight_smile:

I just have an issue with the self compiled version of Piper. For some reason it shows an error for every request. Is this also happening for you?

INFO:__main__:Ready
DEBUG:wyoming_piper.handler:Synthesize(text='Dies ist ein Test!', voice=SynthesizeVoice(name='de_DE-thorsten-high', language=None, speaker=None))
DEBUG:wyoming_piper.handler:synthesize: raw_text=Dies ist ein Test!, text='Dies ist ein Test!'
DEBUG:wyoming_piper.process:Starting process for: de_DE-thorsten-high (1/1)
DEBUG:wyoming_piper.download:Checking /data/de_DE-thorsten-high.onnx
DEBUG:wyoming_piper.download:Checking /data/de_DE-thorsten-high.onnx.json
DEBUG:wyoming_piper.process:Starting piper process: /usr/share/piper/piper args=['--model', '/data/de_DE-thorsten-high.onnx', '--config', '/data/de_DE-thorsten-high.onnx.json', '--output_dir', '/tmp/tmp4uhkjw_k', '--json-input', '--use-cuda', '--debug']
DEBUG:wyoming_piper.handler:input: {'text': 'Dies ist ein Test!'}
DEBUG:wyoming_piper.handler:
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-6' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.10/dist-packages/wyoming/server.py:28> exception=FileNotFoundError(2, 'No such file or directory')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.10/dist-packages/wyoming_piper/handler.py", line 98, in handle_event
    wav_file: wave.Wave_read = wave.open(output_path, "rb")
  File "/usr/lib/python3.10/wave.py", line 509, in open
    return Wave_read(f)
  File "/usr/lib/python3.10/wave.py", line 159, in __init__
    f = builtins.open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: ''

Additional Infos:

root@c5eefeadb680:/# echo 'Welcome to the world of speech synthesis!' |   /usr/share/piper/piper --model /data/de_DE-thorsten-high.onnx --config /data/de_DE-thorsten-high.onnx.json --output_file welcome.wav --use-cuda
terminate called after throwing an instance of 'Ort::Exception'
  what():  /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1055 void onnxruntime::ProviderSharedLibrary::Ensure() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_shared.so with error: libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory

Aborted (core dumped)

Hi, sorry, I should have updated.

There are a few issues with trying to build Piper with CUDA acceleration. There is also an issue where CUDA-accelerated Piper takes a long time to process short phrases.

To begin, onnxruntime-gpu is only built against CUDA 11, so no CUDA 12 and no compute capability for 40-series cards. To remedy this, you can build onnxruntime-gpu using CUDA 12, but it takes quite a while and is error prone.

The second issue is that building Piper using CUDA 11 in a Docker image has some shared-lib problems. I have Piper built with CUDA, but I get a seg fault when I run piper with the --use-cuda flag. I have been trying to get a Docker image built with CUDA-accelerated Piper but have not succeeded yet.
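When chasing that kind of ‘cannot open shared object file’ failure, a first check is whether the library exists anywhere and whether the dynamic loader can see it. A sketch (the /usr search root and ‘/path/to/libdir’ are placeholders, not paths from this setup):

```shell
# 'cannot open shared object file' means the dynamic loader cannot find the
# library. First, see whether it exists at all (search root is an example):
find /usr -name 'libonnxruntime_providers_shared.so' 2>/dev/null || true
# If it does, put its directory on the loader path before starting piper
# ('/path/to/libdir' is a placeholder for wherever find located it):
export LD_LIBRARY_PATH="/path/to/libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```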

The issue with CUDA-accelerated Piper and short phrases seems to be a deal breaker. In my personal testing (non-Docker Piper build), there is a negligible performance improvement using CUDA Piper: non-CUDA = 0.4 s versus CUDA = 0.3 s.

Realistically, CPU Piper seems fast enough for the time being. I will still try to get a CUDA-accelerated Piper build working, but it is not a priority.

The main thing to CUDA-accelerate is faster-whisper, which is accomplished. I haven’t used wake word yet, so I can’t speak to whether CUDA makes a performance difference there.

1 Like

Hey, thanks for the input! For me the CPU version was not fast enough for my taste; that’s why I was looking to accelerate it with CUDA.

If you want to test things with my setup feel free to ask :slight_smile:

Proxmox VE host with a Tesla P4 (CUDA 12.2), running a virtualized docker host.