Whisper is REALLY bad at understanding German. What can I do about that?

I am running Whisper and Piper to have Assist listen and respond to voice commands. All of this runs on a Raspberry Pi 4. However, my mother tongue is German, and Whisper does not really work for that. I do not speak any dialect, just plain standard German.
Also: Whisper is of course set to German as well.

Examples:
“Flur” (hallway) is always recognized as “Flua” or “Flue”, because the “r” is usually silent when spoken.

“Schalte ein” (switch on) is nearly always recognized as “Schaltet ein”. The extra “t” does not make any sense and leads to “Das konnte ich leider nicht verstehen” (I did not understand that).

These issues with important commands make Whisper unusable for me.

Note: I also made a pipeline using Nabu Casa Cloud and this works perfectly. So my voice inputs should be fine. Just the local processing does not understand me.

Are any of you having similar problems? Am I missing some important configuration?
Thanks!

+1 here :wink: Exactly the same!

But my experience is from before the update to 2023.8.x (the day before yesterday), so I can’t say whether the Whisper update brought anything to make it better. :slight_smile:

For now I’m staying with NC, but I know I’ll get annoyed when Telekom goes dark for hours again (idiotic construction workers next to the Telekom “Kasten” :rofl:) and I can’t reach my voice assistant… :wink:

EDIT:
An example from my “wrong understandings”:
“Betti gehen” is my command for the dog, as well as for the voice assistant, that we are going to bed now. For the life of me I can’t figure out how to pronounce “Betti” so that it isn’t understood as “Betty”. For now I’m working with custom sentences where I just set “Bett(i|y) gehen”, but in the end some polishing of Whisper for German would be really nice.
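To illustrate what an alternative such as “Bett(i|y) gehen” buys you, here is a minimal Python sketch that expands a template of this kind into the plain sentences it accepts. It is only an illustration, not how Home Assistant parses sentences internally: `expand_template` and `matches` are hypothetical helper names, and only the `(a|b)` syntax is handled.

```python
import itertools
import re
from typing import List


def expand_template(template: str) -> List[str]:
    """Expand every "(a|b|...)" group in a template into plain sentences."""
    # re.split with a capturing group keeps the group contents at odd indices.
    parts = re.split(r"\(([^()]*)\)", template)
    choices = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


def matches(template: str, transcript: str) -> bool:
    """Case-insensitive check whether a transcript is one of the expansions."""
    wanted = {sentence.casefold() for sentence in expand_template(template)}
    return transcript.strip().casefold() in wanted


if __name__ == "__main__":
    print(expand_template("Bett(i|y) gehen"))
    print(matches("Bett(i|y) gehen", "betty gehen"))
```

In this simplified syntax, the “Schalte ein” / “Schaltet ein” mix-up could be covered the same way with `"Schalte(|t) ein"`, which expands to both spellings.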

Maybe we should ask Mike “the voice” if we can help in any way? Would you be interested and do you have some free time, if help would be necessary? No harm in saying no! :wink: :slight_smile:

Hello fellow sufferers (“Leidensgenossen”), same for me :slight_smile:

Using Whisper with tiny-int8 or even small-int8 for German is more or less useless. I switched to medium-int8 and it works so much better. The downside is that a request takes around 60 seconds on a Raspberry Pi 4. I hope these models will improve over time or that some sort of caching will be implemented.

Same here. I’m actually on okay-ish hardware (7th-gen Core i7 with 16 GB of RAM), but speech-to-text is either slow or doesn’t understand a word. Not usable at all in my experience.

What about Nabu Casa? Is it better?

Same problem here with 64GB RAM and a powerful i9 processor. It makes Assist pretty unusable right now.

Hello

Try the Vosk add-on (from synesthesiam :wink:):
hassio-addons/vosk at master · rhasspy/hassio-addons (github.com)
Very fast and accurate (less than 1 s on an RPi 4) with French, Spanish and many other languages.

Can someone confirm that Vosk works better for German?
I tested faster-whisper today: tiny is really bad, and small works a bit better but is not good enough to reach the WAF. :wink:
Medium works in my tests, but it needs 30 seconds on my PC (i5-1240P), and I have no NVIDIA graphics card available to speed it up.

Has anyone tested whether whisper.cpp performs better in this case (it can use Intel graphics via OpenVINO)?

Yes, I can confirm that Vosk is way better than Whisper for German. I get almost instant responses, and the recognition is also much better.

I tried it today. “Esstischlampe” is also a problem, but with the sentence correction it’s easy to adjust. It’s really fast and works perfectly in my first tests! I think it’s above the WAF now. :smiley:

Which parameters do you use to get to acceptable results?

Hi, I tried this Whisper model:
devirex/whisper-faster-small-german

In general it works well and fast (3-4 seconds on an Intel N100), but it is quite creative with the text it generates:
“Bayern-1-Webradio starten”
“Bayern 1 Web Radio starten”
“Bayen eins WebRadio starten”
“Bayern neins Webradio starten”

My problem is that the Home Assistant “Conversation Agent” very seldom understands what it is supposed to do.

Does anybody have a solution to this stupidity of the “Conversation Agent”?

Do Vosk / Rhasspy-speech / LocalAI / HomeLLM give better results with German?
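One idea for the purely cosmetic variants above (hyphens, “Web Radio” vs. “Webradio”, “eins” vs. “1”) is to normalize the transcript before it is compared with known commands. The following is a minimal Python sketch of that idea, not a Home Assistant feature: `normalize`, `lookup`, `NUMBER_WORDS` and `COMMANDS` are hypothetical names, and where such a step would be hooked into a pipeline is left open.

```python
import re
from typing import List, Optional

# Hypothetical mapping of spoken number words to digits; extend as needed.
NUMBER_WORDS = {"eins": "1", "zwei": "2", "drei": "3"}

# Hypothetical list of commands the assistant is expected to understand.
COMMANDS = ["Bayern 1 Webradio starten"]


def normalize(text: str) -> str:
    """Reduce a sentence to a key that ignores case, hyphens and spacing."""
    text = text.casefold().replace("-", " ")
    text = re.sub(r"[^\w\s]", "", text)  # drop remaining punctuation
    words = [NUMBER_WORDS.get(word, word) for word in text.split()]
    return "".join(words)  # joining without spaces merges "web radio"


def lookup(transcript: str, commands: List[str]) -> Optional[str]:
    """Return the known command whose normalized key equals the transcript's."""
    by_key = {normalize(command): command for command in commands}
    return by_key.get(normalize(transcript))


if __name__ == "__main__":
    for heard in ["Bayern-1-Webradio starten", "Bayern 1 Web Radio starten"]:
        print(heard, "->", lookup(heard, COMMANDS))
```

Normalization only removes spelling differences. Genuinely misheard variants such as “Bayen eins” or “Bayern neins” still produce a different key, so they would need extra sentence alternatives or some form of fuzzy matching.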

I used the language model “vosk-model-de-0.21” for vosk. And I added a de.yaml to the sentences folder with this content:

sentences:
  - Schalt {devices} {switchValues}
lists:
  switchValues:
    values:
      - ein
      - an
      - aus
  devices:
    values:
      - "die Esstischlampe"
      - in: "die Spielekonsole"
        out: "die Nintendo Wii"

In my tests it worked well and fast. One problem is that I set CORRECT_SENTENCES = 0 as an environment variable, so Vosk now tries to match every input to this YAML file. That leads to some silly results. For example, “Koch mir einen Kaffee” is detected as “Schalte die Esstischlampe ein”.

After that, I stopped my tests for now, because the on-device wake word detection on the ESP32 is also not really good and produces many false detections. :-S
If I have some time, I will try to improve my setup. Maybe HA Voice is a good alternative as a small voice input device?
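The “Koch mir einen Kaffee” effect is what you would expect from any matcher that always returns the closest template. Here is a minimal Python sketch of the principle, using `difflib` from the standard library. It is an assumption-laden illustration and not the add-on's actual scoring: `TEMPLATES`, `correct` and `min_score` are hypothetical names, and the similarity scale here is unrelated to the values CORRECT_SENTENCES accepts.

```python
import difflib
from typing import Optional

# Hypothetical expansion of the template sentences from the YAML file.
TEMPLATES = [
    "Schalt die Esstischlampe ein",
    "Schalt die Esstischlampe aus",
    "Schalt die Spielekonsole ein",
    "Schalt die Spielekonsole aus",
]


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0, ignoring case."""
    return difflib.SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def correct(transcript: str, min_score: float) -> Optional[str]:
    """Return the closest template, or None if it scores below min_score."""
    best = max(TEMPLATES, key=lambda template: similarity(transcript, template))
    return best if similarity(transcript, best) >= min_score else None


if __name__ == "__main__":
    # Without a minimum score, even an unrelated sentence is forced onto a template.
    print(correct("Koch mir einen Kaffee", min_score=0.0))
    # With a minimum score, the unrelated sentence is rejected instead.
    print(correct("Koch mir einen Kaffee", min_score=0.7))
    # A slightly misrecognized command is still corrected.
    print(correct("Schalt die Estischlampe ein", min_score=0.7))
```

The trade-off is the usual one: a stricter minimum rejects more unrelated sentences but also more badly misrecognized real commands, so the value has to be found by testing.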