Improve Whisper performance on Intel hardware

I’ve been dipping my toes in the voice control waters to check if it’s possible to replace Alexa with Assist for home control. I have an Onju home board that I added to a Google Nest Mini, and I have it working.

However, the speech-to-text recognition is pretty bad in Spanish. Even speaking very clearly in an environment with no noise, accuracy ranges from absolutely useless with the tiny and base models to hit or miss with the small_int8 model. And I promise you I’m actually a clear speaker without any strong accent.

It’s clear that non-English speakers need to use models bigger than small, but even on my fairly decent home server (12th-gen i3 with 10 cores and 32 GB of RAM, running Proxmox with HA and a few other apps), small_int8 is really the biggest model one can use, as medium takes 6–7 seconds to respond to a command.
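For reference, this is roughly how I’m picking the model in the Whisper add-on (shown as YAML; the option names are what I see on the add-on’s configuration page, so double-check them against your version):

# Whisper add-on configuration (sketch, verify option names on your install)
model: small-int8     # tiny-int8 / base-int8 / small-int8 / medium-int8 ... bigger = slower but more accurate
language: es
beam_size: 1          # raising this can improve accuracy at the cost of extra latency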

Has anyone succeeded in running whisper, whisper.cpp, faster_whisper or any Whisper mod leveraging the integrated GPU in modern Intel hardware?

The integrated Xe GPU in 12th/13th-gen Intel processors and above belongs to the same Xe architecture family as the dedicated Intel Arc GPUs, and I’ve seen that Intel has published libraries to accelerate inference on their Arc cards, so I’m inclined to think it should be possible, but I don’t know enough about PyTorch and AI to even begin to investigate.
These Iris Xe iGPUs are moderately capable too, on par with the Radeon Vega 8 in AMD 4000-series APUs or older mid-tier discrete graphics cards like the GTX 860M.

Running Home Assistant on Intel NUCs or other repurposed hardware with Intel CPUs that have integrated graphics is fairly common, so if this were possible, a lot of people would benefit from it.

Even more so once we attempt to also generate responses using a small LLM.

UPDATE:
Just to be sure, I started saving my voice recordings using these lines in configuration.yaml:

assist_pipeline:
  # Store audio recordings for debugging/training purposes
  debug_recording_dir: /config/www/assist_pipeline/

I wanted to be sure the results weren’t bad because of audio quality issues, but that’s not the reason: the audio samples I get are pretty decent, with a clear voice and negligible background noise.


Hello

Try the Vosk add-on; it’s very fast and accurate for the supported non-English languages.

I did, and it’s indeed very fast. But I found accuracy to be… weird. Specifically, for smart-home-related sentences it’s very good, but nowhere near as good for other sentences like “How much time is left for the washing machine?”.

Also, I found that sometimes it’s too eager to respond to commands. So much so that it doesn’t wait until I’ve finished talking: when listening to a sentence like “Turn on the lights in the kitchen”, as soon as it hears “Turn on the lights” it stops listening and misses the “in the kitchen” part.

But I agree that speed-wise it’s amazing. It’s so fast that sometimes the lights turn on before I’ve even closed my mouth. Like 1/10th of a second.

If you manage to use iGPU acceleration in whisper.cpp, then pay attention to this project: GitHub - ser/wyoming-whisper-api-client: Wyoming protocol server for the Whisper API speech to text system
Perhaps there are some other recognition implementations with a suitable API that can be connected to the Wyoming protocol.

And with a larger model?

VOSK Models (alphacephei.com)

Actually, I have not. I’ll give it a go. The difference in size is tremendous (39 MB -> 1.4 GB).

I tried the bigger model and accuracy is indeed better. Nevertheless, I think that being able to run Whisper with GPU acceleration is a good thing. We don’t know how things are going to evolve, and maybe Whisper keeps improving while Vosk stagnates.

It’s good to try to leverage the hardware we already have.

I know this does not answer your question, but maybe it’s worth thinking about.

Or make use of good old used hardware that can be bought cheaply.
I’ve thrown in a GTX 1050 Ti using a “PCI-E 1X USB 3.0 riser card” since my server case is way too small (Amazon.de).

The response time is quite similar to Echo devices when using the German medium-int8 model.

I thought installing CUDA and everything would be a hard task, but it’s really not a big problem. I already had the whole Wyoming stack running on another machine (I run HA on an underpowered used thin client) using Docker Compose. Instructions on how to do that should be easy to find…
The only thing I had to do was install the GPU drivers and the NVIDIA Container Toolkit on the Docker host and change the Whisper image to a CUDA-capable one.

If you consider trying this, I recommend wyoming-addons/whisper/docker-compose.example.yml at 16d3cb41d0ed6be608118e7b1587194aabbf1967 · pierrewessman/wyoming-addons · GitHub
I adapted the meat of it into my existing setup, so it works differently from how it’s described in the repo’s readme; the relevant bit looks roughly like the sketch below.
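This is only a minimal sketch of the Whisper service: the image tag, the --device flag and the model name are assumptions on my side, so compare it with the linked example and the image’s documentation before copying.

services:
  whisper:
    image: rhasspy/wyoming-whisper   # swap for the CUDA-enabled image from the linked repo
    command: --model medium-int8 --language de --device cuda
    volumes:
      - ./whisper-data:/data
    ports:
      - "10300:10300"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

The deploy.resources.reservations block is what hands the GPU to the container once the NVIDIA Container Toolkit is installed on the host.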

I’m running it on Ubuntu 20.04 Server and installed the nvidia-driver-535-server package. You would also need Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.14.5 documentation, and then you should be ready to go.

Another benefit is that I’m now able to play around with other LLMs using Ollama and Open WebUI on the same GPU, which is also surprisingly easy to install with Docker. My own private AI.

Just in case you’re interested in that also: GitHub - open-webui/open-webui: User-friendly WebUI for LLMs (Formerly Ollama WebUI)
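If it helps, this is roughly the compose file I use for that part (again just a sketch; image names, ports and the OLLAMA_BASE_URL variable follow the projects’ READMEs, so verify against the current docs):

services:
  ollama:
    image: ollama/ollama
    volumes:
      - ./ollama:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # point the UI at the Ollama container
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    restart: unless-stopped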

Sure, but I can’t put a GPU in an Intel NUC, and it’s quite a popular device, and relatively capable too, so I was wondering if someone had managed to enable GPU acceleration on it.

I’d rather not have to buy another computer.

NUC… I overlooked that. Sorry

There are also adapters available for M.2 slots :thinking:

Hi there!
Same situation here. I wonder if you got any further with this?
Thanks @cibernox

I did not. It seems that, generally speaking, AIs are getting better over time at running on CPUs, and that’s something, but nothing I’ve found suggests much effort is going into optimizing them for integrated GPUs.