Matching intent on sentence that contains words

jjdenhertog · March 19, 2024, 7:45am

Hi

I’m switching from a Rhasspy voice setup to the new Home Assistant voice setup and find it challenging to get consistent matches via Whisper. One of the challenges I’m facing is that sometimes extra words are detected and that messes up the intent matching.

Currently I’m working around this with wildcards that ignore the extra words. Ideally, though, intents would be matched on the presence of specific words: if a sentence contains “lights” and “on”, it would use that intent, no matter what the other words in the sentence are.
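To make the idea concrete, here is a minimal Python sketch of "match on the presence of specific words". It is only an illustration of the behaviour described, not a Home Assistant or Whisper feature; the intent names and the `INTENT_KEYWORDS` table are hypothetical.

```python
import re
from typing import Optional, Set

# Hypothetical table: intent name -> words that must all appear in the sentence.
INTENT_KEYWORDS = {
    "lights_on": {"lights", "on"},
    "lights_off": {"lights", "off"},
}


def contains_all_keywords(sentence: str, keywords: Set[str]) -> bool:
    """Return True if every keyword occurs as a whole word in the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return keywords <= words


def match_intent(sentence: str) -> Optional[str]:
    """Return the first intent whose keywords are all present, else None."""
    for intent, keywords in INTENT_KEYWORDS.items():
        if contains_all_keywords(sentence, keywords):
            return intent
    return None
```

With this table, `match_intent("please turn the lights on now")` picks `lights_on` regardless of the extra words, while a sentence with none of the keyword sets yields `None`.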

Is anyone facing similar issues? And is something like this (matching on some words only) possible?

Cheers!

mchk · March 23, 2024, 1:32pm

Try using VOSK. You can set a fixed dictionary of acceptable phrases in it.

jjdenhertog · March 24, 2024, 2:33pm

Awesome, this was just what I was looking for!

jjdenhertog · March 26, 2024, 2:30pm

@mchk Do you by any chance also know how I can support a numbered range in VOSK? I can’t find it anywhere. The configuration below doesn’t seem to work with VOSK:

  volume:
    range:
      from: 0
      to: 50

mchk · March 26, 2024, 5:03pm

You need to use a list for that. See the attached example.

github.com

rhasspy/wyoming-vosk/blob/c78f71bef3e4811b3f9c5ece765cfd33ae4721c2/examples/en.yaml

sentences:
  - turn (on|off) [the] ({device}|{light})
  - turn (on|off) [the] ({area} lights | lights in {area})
  - turn [the] ({device}|{light}) (on|off)
  - (set|change) [the] ({light}|{area} lights) [to] {color}
  - (set|change) [the] ({light}|{area} lights) (brightness [to ]<brightness>|[to ]<brightness> [brightness])
  - in: red alert
    out: set living room light red
  - in: all clear
    out: set living room light white
expansion_rules:
  brightness: ({brightness_names}|{brightness_numbers}[ percent])
lists:
  color:
    values:
      - red
      - green
      - blue
      - orange
      - purple

This file has been truncated.
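Since a `range:` block is not accepted, a range such as 0 to 50 has to be written out as list values. A small Python sketch that generates such a list is shown below. It assumes that list values may use the same `in:`/`out:` form that the example shows for sentences, and that VOSK returns numbers as spoken English words; both are assumptions to check against the full example file. The list name `volume` comes from the question above, and the helper names are hypothetical.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {20: "twenty", 30: "thirty", 40: "forty", 50: "fifty"}


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 59 in English words."""
    if not 0 <= n <= 59:
        raise ValueError("only 0..59 is supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    word = TENS[tens * 10]
    return word if ones == 0 else f"{word} {ONES[ones]}"


def volume_list_yaml(start: int = 0, stop: int = 50) -> str:
    """Build a YAML 'lists' fragment with one in/out pair per number."""
    lines = ["lists:", "  volume:", "    values:"]
    for n in range(start, stop + 1):
        # ASSUMPTION: list values accept in/out pairs (spoken form -> number).
        lines.append(f"      - in: {number_to_words(n)}")
        lines.append(f"        out: {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(volume_list_yaml())
```

Running the script prints a fragment that can be pasted into the sentences file and referenced as `{volume}`.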

jjdenhertog · March 27, 2024, 6:35am

@mchk I got it all up and running, and it works very well! The only remaining challenge is false triggers: it tries to match sentences that do not exist.

In the debug logs I notice a score (example below). Do you know if it’s possible to filter by score, or to pass the score data to the intent? I notice that when the score is above roughly 6 to 8, the correction is actually wrong.

DEBUG:root:Transcript for client 736074379299864: bed is disco
DEBUG:root:score=12/0.0, original=bed is disco, final=switch
DEBUG:root:Corrected transcript: switch

mchk · March 27, 2024, 3:40pm

I don’t know. You would have to explore what the project offers.
It may be worth opening a request in the project’s issues.
In the meantime, I added several stub phrases (turn on, turn off, set…) to the list. Now, if I break off a voice command, VOSK is more likely to select one of them, and no further action is taken.
Commands that match a pattern are executed fairly accurately.

jjdenhertog · March 27, 2024, 7:32pm

Thanks for pointing me in the right direction. Looking into the source code, I found that the “correct sentences” value you can set in the configuration is compared against the score shown in the debug log.

If the value is 0, it will accept any correction it gets back.
If the value is higher, it will only accept a correction whose score is equal to or less than that value.

So if you see a false correction with a debug score of 5/0.0, you could set “correct sentences” to 4. Any correction with a score higher than 4 is then rejected as an answer.
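The rule described here can be sketched in a few lines of Python. This mirrors the description in this post only; it is not copied from the wyoming-vosk source, and the function and parameter names are hypothetical.

```python
def accept_correction(score: int, limit: int) -> bool:
    """Decide whether a corrected transcript is kept.

    score: the first number of the debug "score=" value (lower is closer).
    limit: the configured "correct sentences" value.
    """
    if limit <= 0:
        # A value of 0 means: accept whatever correction comes back.
        return True
    # Otherwise keep the correction only if its score does not exceed the limit.
    return score <= limit
```

With a limit of 4, the `score=12` correction from the log above (`bed is disco` corrected to `switch`) would be rejected, while a score of 4 or lower would still be accepted.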

mchk · March 28, 2024, 10:08am

I had not turned on debug mode, so I could not work out the scale of this parameter’s values. Now it is clear what the parameter relates to.
What remains is to figure out the syntax of “No Correct Patterns”.
As far as I understand, this option allows you to create slots in templates that can be filled with any word. But I haven’t figured out how to use it yet.

IBRI12 · May 3, 2024, 12:31pm

I figured it out. Example:

sentences:
  - Licht an
  - hallo
  - suche

no_correct_patterns:
  - (suche [A-Z,a-z]+)

This is just standard regex. Tested here https://regexr.com/. Now I can say “suche {Word}”

mchk · May 6, 2024, 6:36pm

It’s simpler than that. If the specified phrase is present in the spoken sentence (usually at the beginning), then the rest of the sentence is recognized without correction.

My example for interacting with Wikipedia:

no_correct_patterns:
  - what is [a|the]

As a result, I can ask “what is the sun”. Then, using custom sentences, I pick up the last word as a variable and make a request through the Wikipedia API.
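Putting the two posts above together, the behaviour can be sketched in Python as follows. This assumes each `no_correct_patterns` entry is a regular expression tested against the start of the transcript; that detail is an assumption rather than something confirmed in this thread, and the function names are hypothetical.

```python
import re
from typing import Iterable


def skips_correction(transcript: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern matches at the start of the transcript.

    ASSUMPTION: patterns are regular expressions anchored at the beginning,
    so a matching transcript is passed through without sentence correction.
    """
    return any(re.match(pattern, transcript) for pattern in patterns)


def last_word(transcript: str) -> str:
    """Pick the final word, e.g. as the search term for a lookup."""
    return transcript.split()[-1]


patterns = ["what is [a|the]"]
question = "what is the sun"
if skips_correction(question, patterns):
    search_term = last_word(question)
```

Here `question` matches the pattern, so it would be left uncorrected, and `search_term` ends up holding the last word of the question.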