This is perfect, thank you! I’ve just pushed a new version of Rhasspy with the French Kaldi profile. Big thanks to @fastjacksprt and pguyot.
What do you think of this for slots:
https://kaldi-asr.org/doc/grammar.html
Looks a lot like what Snips is doing for their language model entities.
I’ve tested the new Kaldi French model, and so far it is performing really well.
I’ll add more intents to see how it behaves with a more complex LM, and I’ll test it from a distance with a far-field mic.
The recognition speed is slow compared to Pocketsphinx, though. I’m sure this can be improved, for example by keeping a single Kaldi instance loaded via py-kaldi-asr instead of executing a CLI that has to reload the AM+LM on every request (maybe behind a remote HTTP server?).
Pushing chunks of WAV audio to the ASR (like Snips does/did) will also speed up the process and avoid POSTing a complete WAV file to the remote server.
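Something like this could work for the single-instance idea (a rough sketch based on the py-kaldi-asr README example; the model path is just a placeholder, not an actual Rhasspy path):

from kaldiasr.nnet3 import KaldiNNet3OnlineModel, KaldiNNet3OnlineDecoder

# load the acoustic model + decoding graph once, at startup
model = KaldiNNet3OnlineModel("/profiles/fr/kaldi/model")  # placeholder path
decoder = KaldiNNet3OnlineDecoder(model)

# each request then only pays for decoding, not for reloading the AM+LM
if decoder.decode_wav_file("utterance.wav"):
    text, likelihood = decoder.get_decoded_string()
    print(text)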
For example:
If the Kaldi GrammarFst approach mentioned above works, it’ll allow for an even lighter language model for Kaldi, reducing the memory footprint for large slots (artists, songs, cities, etc.)…
Allowing the use of rule-based slots should also help for special slots like date/time, temperature, number, ordinal, etc.
Rhasspy is beginning to look really sexy as a Snips alternative
Glad to hear the model is working! I’ve already got a start on making the Kaldi recognition faster, actually based on the Zamia library you linked. This is currently implemented in Rhasspy’s “sister” project named voice2json (warning: very much in beta). Recognition with the nnet3 Python extension is about 3x faster than the shell script.
I’m very interested in the GrammarFst approach. I’m thinking it would be best to try and fold this into rhasspy-nlu. Perhaps we could include pre-generated FSTs for different slot types in the language-specific profile downloads?
The GrammarFst seems to be used during ASR decoding too, so maybe not only in the NLU?
For Snips the ASR and the NLU share lots of common parts (like the entities and the LM) because they are very closely related. You can read more about their ASR/NLU system here:
That’s what I was thinking of to provide builtin slots like datetime, number, etc. The GrammarFst seems to avoid duplicating “common” parts in the LM (which is perfect for slots).
These “grammar” builtin slots can easily be provided by rules that get compiled as FSTs for each language. I’ve got basic rules for <number>, <datetime>, <duration>, <percent> and <temperature> already in my sentences.ini (in French for now, but it should be easy to provide them in lots of other languages too with help from the community).
It may be possible to use them like so:
wake me up (<$.datetime>){when}
increase the temperature by (<$.temperature>){offset}
Different notations can be considered:
(<datetime>){name}
(<$.datetime>){name}
(@datetime){name}
($datetime){name}
($builtin-datetime){name}
I personally like the “rule” version (<$.datetime>) with a $ sign to avoid confusion with local or intent rules.
Plus they can easily match Snips builtin slots if their NLU is used as an option in the future.
What do you think?
I also agree with what has been said above regarding the Discord/Discourse channel (no preference) for Rhasspy.
It’ll be easier and more productive to follow multiple topics.
+1
Having forum categories for hardware, apps, language-specific stuff, developer stuff, announcements and so on would be much more welcoming to newcomers and at the same time help the regulars.
Could also use the new voice assistant category and make multiple threads.
https://community.home-assistant.io/c/configuration/voice-assistant
I’ve made a subcategory request in the voice assistant forum.
If all of you like the request, it may show the admins how much interest there is in Rhasspy development and convince them to create a dedicated subforum?
I had a question for you.
I wanted to make an intent that allows for a variable number. For example, “Set a timer for X minutes” (this variable would be passed to a Node-Red automation by its tag). What would I use for the X? Originally I saw that pipes allow for multiple options, but listing all the numbers would make a big list. I see there is a new “slots” feature, and from what I read in the docs this might work, but I think it would still require creating the list.
Is there another way, or would one of these be the best option?
Thanks!
DeadEnd
Damn @synesthesiam, I just got round to installing it and was up and running in like 10 mins.
Docker server (NUC)
Docker client (pi 3B with Seeed Mic Array)
(Using Node-Red)
Honestly I spent too much time reading before I jumped in! Awesome job
Write up here
Couple of questions for the seasoned users:
- How do you break out the split of responsibility? (Wake word, voice detection, and mic on the Pi; speech-to-text & intent on the server?) I figured it may be easier to process the speech-to-text locally and just send the result?
- How do people handle switches that are lights?
- I have the same issue as here based on “Jarvis”
- Anyone fancy sharing some of their sentences for inspiration?
Regarding 2:
Depending on your setup you can call the homeassistant service, which takes whole entity ids like switch.light_outside.
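For example, something like this should work against Home Assistant’s REST API (host, token and entity id here are placeholders):

import requests

# homeassistant.turn_on accepts entity ids from any domain (switch, light, ...)
requests.post(
    "http://hass.local:8123/api/services/homeassistant/turn_on",
    headers={"Authorization": "Bearer YOUR_LONG_LIVED_TOKEN"},
    json={"entity_id": "switch.light_outside"},
)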
I have a basic setup I hope to expand in the future.
Currently everything runs on a mini-ATX server (it is on the opposite side of a wall from the main living area, so I ran a USB cable through the wall for a speakerphone).
I disabled MQTT and Home Assistant intent handling since I use Node-Red for all automation. I thought I would share since you said you are using Node-Red as well.
I set up a websocket listener and then a switch node that filters based on the intent. From this you can use the tagged variables that are passed along - I use a switch node to change names if necessary. Then you can call Home Assistant services, or whatever else you need.
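For anyone not using Node-Red, roughly the same flow in Python would look something like this (assuming the intent websocket is exposed at /api/events/intent on the web UI port; the exact path and message fields may differ by version, so check the docs):

import asyncio, json
import websockets

async def listen():
    # connect once and receive one JSON message per recognized intent
    async with websockets.connect("ws://localhost:12101/api/events/intent") as ws:
        async for message in ws:
            event = json.loads(message)
            # event contains the intent name and the tagged slot values;
            # branch on them here, like the switch node does
            print(event)

asyncio.run(listen())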
Further up in this thread I have my first working example explained in short detail.
Cheers!
DeadEnd
Until Rhasspy supports builtin grammar-based entities (like datetime, number, etc.) you can look at the timer recipe from voice2json for inspiration.
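The basic idea there is to build numbers out of small word rules instead of listing every full sentence; a simplified sketch of that style (not the actual recipe, and English-only):

[SetTimer]
one_to_nine = (one:1 | two:2 | three:3 | four:4 | five:5 | six:6 | seven:7 | eight:8 | nine:9)
ten_to_nineteen = (ten:10 | eleven:11 | twelve:12 | thirteen:13 | fourteen:14 | fifteen:15 | sixteen:16 | seventeen:17 | eighteen:18 | nineteen:19)
tens = (twenty:20 | thirty:30 | forty:40 | fifty:50)
one_to_fifty_nine = (<one_to_nine> | <ten_to_nineteen> | <tens> [<one_to_nine>])
set [a] timer for (<one_to_fifty_nine>){minutes} minutes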
Cheers
Awesome!
Thanks for the link!
Quick question… can someone explain why they broke out one to fifty nine vs two to fifty nine?
In the minute/second expressions, couldn’t they have just dropped the ((one:1){minutes}) (or seconds) part and used the one to fifty nine rule instead?
… lastly, for anyone who uses this timer design… how long did training take? 8 million possibilities is a lot - but my system is taking FOREVER to do the training. It is a mini-ATX, not a Pi… didn’t expect it to take more than a few minutes.
Thanks, I had a feeling it was likely best to have “light” in Rhasspy and convert that to “switch” based on the entity being called through NR.
Alexa has a pretty graceful way of doing it by surfacing the entity config (essentially allows the user to surface a switch as a light).
Are you using the fsticuffs intent recognizer or fuzzywuzzy? The fsticuffs training shouldn’t take very long, since it doesn’t generate all possible sentences up front (like all the others have to).
Good Catch.
I was using fuzzywuzzy - I’ll change to fsticuffs.
I think I was using fuzzywuzzy because fsticuffs was having issues and getting the wrong intent.
I’ll see how it behaves with fsticuffs and this larger sentence pool.
… Rhasspy seemed to be locking up, so I restarted the container… after that, fsticuffs trained in under a second. I tried going back to fuzzywuzzy and it still wasn’t able to do it - but since the docs say it is intended for 12-100 sentences, I think this is acceptable… it just isn’t designed for this.
Thanks!
I’m working on improving fsticuffs in the very near term, so it will (hopefully) do a better job at getting the right intent. Ideally, it will be possible for other people to add some better matching algorithms.
fuzzywuzzy is very flexible, but doesn’t scale at all since it basically stores every sentence in a big JSON file and has to go through each one every time it processes a sentence.
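To make that concrete, fuzzywuzzy-style matching boils down to something like this (a schematic sketch, not Rhasspy’s actual code; the sentences and intent names are made up):

from fuzzywuzzy import process

# every training sentence has to be kept around and scored on every query
training_sentences = {
    "turn on the kitchen light": "ChangeLightState",
    "set a timer for ten minutes": "SetTimer",
    # ... many thousands more with a large sentences.ini
}

best_match, score = process.extractOne(
    "turn the kitchen light on", list(training_sentences.keys())
)
print(training_sentences[best_match], score)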
Thanks for keeping with it!
Hi All,
I’ve just tried to install Rhasspy and so far I’ve had no luck; I wonder if anyone could cast an eye over this and suggest a way forward. I’m using a headless (ssh only) Pi Zero W with a Respeaker 2-Mics Pi Hat, and this is the full command history on this Pi (after PiBakery did the wifi/hostname etc.):
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get update
git clone https://github.com/respeaker/seeed-voicecard.git
cd seeed-voicecard
sudo ./install.sh
sudo reboot
# arecord -l successfully shows the hat as a microphone
curl -sSL https://get.docker.com | sh
sudo usermod -a -G docker $USER
sudo reboot
docker run -d -p 12101:12101 \
--restart unless-stopped \
-v "$HOME/.config/rhasspy/profiles:/profiles" \
--device /dev/snd:/dev/snd \
synesthesiam/rhasspy-server:latest \
--user-profiles /profiles \
--profile en
Effectively I was following the instructions for the Respeaker and then for Rhasspy. My problem is that none of the following commands return anything at all:
pi@mic:~ $ docker run -it -p 12101:12101 --restart unless-stopped -v "$HOME/.config/rhasspy/profiles:/profiles" -- device /dev/snd:/dev/snd synesthesiam/rhasspy-server:latest --user-profiles /profiles --profile en
pi@mic:~ $ cat < /dev/tcp/127.0.0.1/12101
pi@mic:~ $ ls .config/rhasspy/profiles/
pi@mic:~ $
The docker run seems to silently succeed, but there’s nothing running on the port and nothing created in profiles. Trying from a web browser on another machine, I get connection refused.
Any ideas what I’ve missed? Or what I can run to get more debug info?