Rough steps on getting Whisper/Wyoming working on a Framework Desktop (AMD Strix)

I’ve been wanting a faster Whisper solution but haven’t had a lot of options with my old setup (which runs Piper and Whisper off my NAS). I picked up a Framework Desktop so I could experiment with local LLM stuff and wanted to try to get Whisper running on it.

I was surprised at how fast it is. It’s like twice as fast as Alexa (when we were using it, which we haven’t been for several years now). I haven’t (yet) looked at hooking up an LLM for more natural language requests to do things around the house (“Hey Nabu, I’m hot” as an example) but I want to play with this too.

Oh yeah, this is using the Home Assistant Voice puck thing, but I assume it would work for other Voice implementations. I assume if folks are here, they might already know how to set up Wyoming stuff in HA, so I didn’t document that here (sorry).

Full disclosure: I got a lot of help from Claude, so I won’t claim this is elegant, and I had to enlist “the enemy” as it were. Specifically, I wanted to avoid Docker since I may use my Desktop for k3s (k8s) stuff and didn’t want things to get super messy. Notably, there are a few Docker Compose solutions available.

ROCm needs to be installed. I didn’t cover that here since the process is pretty straightforward (with steps available on frame.work’s forums). Using a Framework Desktop just for Whisper doesn’t generally make financial sense, so I’m assuming folks will already be doing LLM or other ML tasks. With that assumption, here’s hopefully enough to get other people going who want to get this rolling:
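Before building anything, it’s worth a quick sanity check that the ROCm runtime actually sees the iGPU and reports the gfx1151 target (exact output varies by ROCm version):

```shell
# Should list an agent with a gfx1151 name among the detected devices
rocminfo | grep -i gfx

# Should show the GPU with utilization/VRAM stats (confirms the driver side works)
rocm-smi
```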

build.sh:

#!/bin/bash

# Build Whisper.cpp itself
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
mkdir build && cd build

cmake .. \
  -DGPU_TARGETS="gfx1151" \
  -DGGML_HIP=ON \
  -DCMAKE_C_COMPILER=/opt/rocm/bin/amdclang \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ \
  -DCMAKE_PREFIX_PATH="/opt/rocm" \
  -DGGML_ROCM=1

cmake --build . --config Release -j$(nproc)

# Build Wyoming, so HA can talk to it
cd ~
git clone https://github.com/debackerl/wyoming-whisper.cpp.git
cd wyoming-whisper.cpp
script/setup

# This part is ugly. I needed to build the latest pywhispercpp
# else it wasn't seeing the GPU
cd ~/wyoming-whisper.cpp
source .venv/bin/activate
pip uninstall pywhispercpp -y
git clone --recursive https://github.com/abdeladim-s/pywhispercpp.git
cd pywhispercpp
git submodule update --init --recursive
CMAKE_ARGS="-DGGML_HIP=ON -DGPU_TARGETS=gfx1151 -DCMAKE_C_COMPILER=/opt/rocm/bin/amdclang -DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ -DCMAKE_PREFIX_PATH=/opt/rocm" \
pip install -e . --no-build-isolation

# This might not be needed here but a model does need to get installed
cd ~/whisper.cpp
bash models/download-ggml-model.sh large-v3-q5_0
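After the build, a quick way to confirm the GPU path actually works is transcribing the bundled sample clip (paths assume the default whisper.cpp layout; recent builds name the binary `whisper-cli`, older ones called it `main`):

```shell
cd ~/whisper.cpp
# The startup log should mention a ROCm/HIP device rather than falling back to CPU
./build/bin/whisper-cli -m models/ggml-large-v3-q5_0.bin -f samples/jfk.wav
```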

run.sh:

#!/bin/bash

IP="0.0.0.0" # Probably want to put your host's private IP here

export PATH=~/whisper.cpp/build/bin:$PATH
export ROCR_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.5.1
export LD_LIBRARY_PATH=/opt/rocm/lib/llvm/lib:$LD_LIBRARY_PATH

cd wyoming-whisper.cpp
script/run \
  --model large-v3-q5_0 \
  --language en \
  --uri "tcp://${IP}:10300" \
  --data-dir ./data \
  --download-dir ./data
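Once run.sh is up, you can sanity-check the endpoint without involving HA at all. The Wyoming protocol is JSON lines over TCP, so sending a `describe` event should get an `info` event back (assuming `nc` is installed and the IP/port match your run.sh):

```shell
# Expect a JSON "info" event describing the ASR service in response
printf '{"type": "describe"}\n' | nc -w 2 127.0.0.1 10300
```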

Service:

[Unit]
Description=Wyoming Whisper (whisper.cpp + ROCm)
After=network.target

[Service]
Type=simple
User=whisper
WorkingDirectory=/opt/whisper
Environment="PATH=/opt/whisper/whisper.cpp/build/bin:/opt/whisper/wyoming-whisper.cpp/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="LD_LIBRARY_PATH=/opt/rocm/lib/llvm/lib"
Environment="ROCR_VISIBLE_DEVICES=0"
ExecStart=/opt/whisper/run.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
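To wire the unit up (assuming you save it as `/etc/systemd/system/wyoming-whisper.service`, and that the `whisper` user and `/opt/whisper` paths above exist on your box):

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now wyoming-whisper.service

# Follow the logs to confirm the model loads and the GPU gets picked up
journalctl -u wyoming-whisper.service -f
```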

There is no point in using Whisper at all, since there is Parakeet.

This model (via the sherpa library) is currently used as the default in the Wyoming Whisper addon. Yes, it sounds strange :upside_down_face:.
You can also use the onnx-asr library.
Both options only require installing onnxruntime for AMD.


The above was about my 7th attempt if you count trying to get this to work with various GPUs I had lying around before I got the Framework. It took me 3-4 tries of various approaches and GitHub repos to get this working yesterday. It’s one of those “I got it working, so I’m going to slowly back away now” things :slight_smile: It’s easy to get it working with the CPU, but GPU, AMD specifically, took some effort.

I did poke around Parakeet to see if I could find info on how it works on AMD Strix, but I didn’t find much. Do you happen to have some sources on hand? That’s kind of one of the problems with trying to set this up: there are sooooo many forks, repos, and projects to sift through to find one that works, along with some AI agentic code sprawl in places.

Once set up, though, it’s been really awesome and snappy. I pointed HA at my ollama instance to run a small LLM model for more free-flow commands, and that’s also working rather well (and fast!).

I understand your feelings; I felt the same way a year ago. But if you take some time to solidify the knowledge, it all becomes less scary. Moreover, working with ready-made modern libraries (and ready-made projects) is quite simple.

The logic is roughly as follows:
server code → engine library → runtime → driver

The first three components usually make up a single product, which we deploy in a venv or Docker.

I played around with trying to get that working. There were a couple of missing steps, which I was able to slog through, but I wasn’t able to get GPU accel working (based on the linked post, GPU accel might not even be needed though). I had to stop for a while. It’s worth picking up at a later date, but I have other stuff I wanna work on now that I have a setup that works well. Might be a project to tackle maybe this summer.