I am running Whisper and Piper so that Assist can listen and respond to voice commands. All of this runs on a Raspberry Pi 4. However, my mother tongue is German, and Whisper is not really working for that. I do not speak any dialect, just plain old German.
Also: Whisper is of course set to German as well.
Examples:
“Flur” (hallway) is always recognized as “Flua” or “Flue”, because the “r” is usually silent when spoken.
“Schalte ein” (switch on) is nearly always recognized as “Schaltet ein”. The extra “t” does not make any sense and leads to “Das konnte ich leider nicht verstehen” (I did not understand that).
These issues with important commands are making Whisper unusable for me.
Note: I also made a pipeline using Nabu Casa Cloud and this works perfectly. So my voice inputs should be fine. Just the local processing does not understand me.
Are any of you having similar problems? Am I missing some important configurations?
Thanks!
But my experience dates from before the update to 2023.8.x (the day before yesterday), so I can’t say whether the Whisper update brought anything that makes it better.
For now I’m staying with Nabu Casa (NC), but I know I’ll get pissed off when the Telekom connection goes dark for hours again (idiotic construction workers next to the Telekom “Kasten”, the street cabinet) and I can’t reach my voice assistant…
EDIT:
An example from my “wrong understandings”:
“Betti gehen” is the command I use to tell the dog, as well as the voice assistant, that we are going to bed now. For the life of me I can’t figure out how to pronounce “Betti” so that it doesn’t get transcribed as “Betty”. For now I’m working with custom sentences where I just set “Bett(i|y) gehen”, but in the end some polishing of Whisper for German would be really nice.
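To make the “Bett(i|y) gehen” workaround a bit more concrete, here is a minimal Python sketch of what such an alternatives pattern expands to. This is not Home Assistant’s actual sentence parser; `expand_alternatives` is a hypothetical helper I made up for illustration, and it only handles simple, non-nested `(a|b)` groups.

```python
import itertools
import re


def expand_alternatives(template: str) -> list[str]:
    """Expand every non-nested (a|b|...) group in a template into plain sentences.

    Hypothetical helper for illustration only, not part of Home Assistant.
    """
    # Split into literal text and "(...)" groups; the capturing group in the
    # pattern keeps the groups themselves in the result list.
    parts = re.split(r"(\([^()]*\))", template)

    choices = []
    for part in parts:
        if part.startswith("(") and part.endswith(")"):
            # "(i|y)" -> ["i", "y"]
            choices.append(part[1:-1].split("|"))
        else:
            # Literal text has exactly one "choice".
            choices.append([part])

    # Combine one choice from every position, in order.
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for sentence in expand_alternatives("Bett(i|y) gehen"):
        print(sentence)
```

So a single template covers both spellings that the speech-to-text engine may produce. The same trick would work for “(Schalte|Schaltet) ein”.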
Maybe we should ask Mike “the voice” if we can help in any way? Would you be interested, and would you have some free time if help were needed? No harm in saying no!
Using Whisper with tiny-int8 and even small-int8 for German is more or less useless. I switched to medium-int8 and it works so much better. The downside is that a request takes around 60 seconds on a Raspberry Pi 4. I hope these models will improve over time or some sort of caching will be implemented.
Same here - I’m actually on okay-ish hardware (Core i7 7th gen with 16 GB of RAM), but speech-to-text is either slow or doesn’t understand a word. Not usable at all in my experience.
Can someone confirm that Vosk works better for German?
I tested faster-whisper today: tiny is really bad, and small works a bit better but not well enough to reach the WAF.
Medium works in my tests, but it needs 30 seconds on my PC (i5-1240P) - I have no NVIDIA graphics card available to speed it up.
Has anyone tested whether whisper.cpp performs better in this case (it can use the Intel graphics card via OpenVINO)?
I tried it today - “Esstischlampe” (dining table lamp) is also a problem, but with the sentence correction it’s easy to fix. It’s really fast and works perfectly in my first tests! I think it’s above the WAF now.
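For anyone wondering what the sentence correction mentioned above amounts to in principle: I don’t know how the add-on implements it internally, so the following is only a rough sketch of the general idea, using Python’s standard `difflib`. The sentence list, the `correct` function, and the 0.8 cutoff are all my own assumptions. The raw transcript is snapped to the closest sentence from a list of known commands, or rejected if nothing is close enough.

```python
import difflib

# Assumed list of commands the assistant should understand (examples from this thread).
KNOWN_SENTENCES = [
    "Schalte Flur ein",
    "Schalte Esstischlampe ein",
    "Betti gehen",
]


def correct(transcript, known=KNOWN_SENTENCES, cutoff=0.8):
    """Return the known sentence closest to the transcript, or None.

    cutoff is a similarity ratio between 0 and 1; 0.8 is an arbitrary
    starting point and would need tuning against real transcripts.
    """
    matches = difflib.get_close_matches(transcript, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Typical misrecognitions reported in this thread.
    for heard in ["Schaltet Flua ein", "Betty gehen", "Wie wird das Wetter morgen"]:
        print(repr(heard), "->", repr(correct(heard)))
```

The trade-off is that anything not on the list gets rejected, so this only makes sense for a fixed set of commands and not for free-form questions.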