I’ve been dipping my toes in the voice control waters to check if it’s possible to replace Alexa with Assist for home control. I have an Onju Home board that I added to a Google Nest Mini, and I have it working.
However, the speech-to-text recognition is pretty bad in Spanish. Even speaking very clearly in a quiet room, accuracy ranges from absolutely useless with the tiny and base models to hit or miss with the small_int8 model. And I promise you I’m actually a clear speaker without any strong accent.
It’s clear that non-English speakers need to use bigger models than small, but even on my fairly decent home server (12th gen i3 with 10 cores and 32 GB of RAM, running Proxmox with HA and a few other apps), small_int8 is really the biggest model one can use, as medium takes 6-7 seconds to respond to a command.
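For anyone who wants to compare models on their own hardware, a rough timing helper along these lines might be useful. It is only a sketch: `measure_latency` and `fake_transcribe` are made-up names, and `fake_transcribe` is a placeholder that would need to be swapped for a real call into whatever speech-to-text backend is being tested.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(transcribe: Callable[[bytes], str], audio: bytes, runs: int = 5) -> dict:
    """Call `transcribe` several times and summarize wall-clock latency in seconds."""
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio)
        timings.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median_s": statistics.median(timings),
        "worst_s": max(timings),
    }


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for a real speech-to-text call; it only sleeps.
    time.sleep(0.01)
    return "enciende la luz"


if __name__ == "__main__":
    # One second of silence as 16-bit, 16 kHz mono PCM (an assumed input format).
    print(measure_latency(fake_transcribe, b"\x00" * 32000))
```

Reporting the median alongside the worst case helps separate a slow model from a one-off slow first run, such as the model being loaded into memory.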
Has anyone succeeded in running Whisper, whisper.cpp, faster-whisper or any Whisper variant leveraging the integrated GPU in modern Intel hardware?
The integrated Xe GPU in 12th/13th gen Intel processors and above belongs to the same Xe architecture family as the dedicated Intel Arc GPUs, and I’ve seen that Intel has published some libraries to accelerate inference on their Arc cards, so I’m inclined to think that it should be possible, but I don’t know enough about PyTorch and AI to even begin to investigate.
These Iris Xe iGPUs are moderately capable too, on par with the Radeon Vega 8 in AMD 4000 APUs or older mid-tier discrete graphics cards like the GTX 860M.
Running Home Assistant on Intel NUCs or other repurposed hardware with Intel CPUs that have integrated graphics is fairly common, so if this were possible, a lot of people would benefit from it.
Even more so once we attempt to also generate responses using a small LLM.
UPDATE:
Just to be sure, I started saving my voice recordings using these lines in configuration.yaml:
assist_pipeline:
  # Store audio recordings for debugging/training purposes
  debug_recording_dir: /config/www/assist_pipeline/
I wanted to be sure the results weren’t bad because of audio quality issues, but that’s not the reason: the audio samples I get are pretty decent, with a clear voice and insignificant background noise.
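To put numbers on "pretty decent" rather than judging by ear, something like the following could be run over the saved files. It is a minimal sketch that assumes the recordings are 16-bit PCM WAV files somewhere under the debug_recording_dir configured above; `describe_wav` is a name I made up, and only the Python standard library is used.

```python
import array
import math
import sys
import wave
from pathlib import Path


def describe_wav(path):
    """Return sample rate, channels, duration and RMS/peak levels for a 16-bit PCM WAV."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
        samples = array.array("h", wav.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        # Levels relative to 16-bit full scale; digital silence reports -inf.
        "rms_dbfs": 20 * math.log10(rms / 32768) if rms > 0 else float("-inf"),
        "peak_dbfs": 20 * math.log10(peak / 32768) if peak > 0 else float("-inf"),
    }


if __name__ == "__main__":
    # Assumed location, taken from debug_recording_dir in the config above.
    for wav_path in sorted(Path("/config/www/assist_pipeline").rglob("*.wav")):
        print(wav_path.name, describe_wav(wav_path))
```

A very low RMS level or a peak sitting at 0 dBFS (clipping) would point to a capture problem, whereas healthy levels suggest the model is the weak link.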