Any thoughts on using the Microsoft Speech-to-Text Docker containers? They seem to be in preview, and the only thing they query in the cloud is the number of characters?
Anyone tried numbers?
Trying out setting an alarm:
[SetAlarm]
hours = ( 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12) {hours}
minutes = ( 5 | 10 | 20 | 30 | quarter | half) {minutes}
ampm = (A.M. | P.M.) {ampm}
math = ( past | to ) {math}
set alarm for <hours> <minutes>
set alarm for <minutes> <math> <hours>
No matter what I say, Rhasspy hears “set alarm for 5 5”.
Ah, excellent, thanks. Information overload! Couldn’t see it right in front of me! Thanks for the assist!
For the time being, you’ll need to use the number words in your sentences.ini:
[SetAlarm]
hours = ( one:1 | two:2 | three:3 | four:4 | five:5 | six:6 | seven:7 | eight:8 | nine:9 | ten:10 | eleven:11 | twelve:12) {hours}
minutes = ( five:5 | ten:10 | twenty:20 | thirty:30 | quarter | half) {minutes}
ampm = (A.M. | P.M.) {ampm}
math = ( past | to ) {math}
set alarm for <hours> <minutes>
set alarm for <minutes> <math> <hours>
I have a fix for this using the Python num2words package. You’ll find that you can already type something like “set alarm for 12 10” into the web interface and it will work. Just need to hook it up elsewhere…
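To illustrate what such a digit-to-words pre-processing step does: the actual fix uses the num2words package, but the small lookup table and `expand_digits()` helper below are hypothetical stand-ins, just to show the idea against the alarm intent above.

```python
# Illustration of a digit-to-words pre-processing step. The real fix
# uses the Python num2words package; this lookup table and the
# expand_digits() helper are hypothetical stand-ins for it.
WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six",
    7: "seven", 8: "eight", 9: "nine", 10: "ten", 11: "eleven",
    12: "twelve", 20: "twenty", 30: "thirty",
}

def expand_digits(text):
    """Replace standalone digit tokens with their word form."""
    tokens = []
    for token in text.split():
        if token.isdigit() and int(token) in WORDS:
            tokens.append(WORDS[int(token)])
        else:
            tokens.append(token)
    return " ".join(tokens)

print(expand_digits("set alarm for 12 10"))  # -> set alarm for twelve ten
```

After a step like this, the transcription matches the word-based rules in the [SetAlarm] section above.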
Rhasspy Announcements
I’ve added a Rhasspy announcements thread where I’ll post information about new versions and features.
You’ll run into the same issue, since all of the C++ libraries/binaries are pre-compiled. Can you open a Github issue for ARMv6 support, please? Thanks!
That works nicely. The only issue is that it’s parsed as a string, not a number (not really an issue, since it can be cleaned up in Node-RED).
Hi @sepia-assistant! I do remember seeing SEPIA a while back. You may have actually been my link over to Zamia Speech!
I’d be happy to collaborate or help in any way I can. One possible way to collaborate may be via a shared protocol, like Hermes or Hermod. You may already have something like this in SEPIA’s websocket server.
For #1, there’s going to be a new intent Home Assistant component in the next release. This has an HTTP endpoint that accepts intent JSON objects directly. I plan to have this as an option in Rhasspy, and it may be an easy path for SEPIA’s Home Assistant Integration.
Rhasspy supports using an external HTTP server for speech-to-text and intent recognition. Maybe I could add a websocket connection out to SEPIA too?
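For anyone curious what an external speech-to-text HTTP server might look like, here is a minimal sketch. The assumptions are mine, not Rhasspy’s or SEPIA’s documented API: that Rhasspy POSTs raw WAV audio and expects a plain-text transcription back, and the `transcribe()` body is only a placeholder where a real STT engine would go.

```python
# Sketch of an external speech-to-text HTTP server for Rhasspy.
# Assumptions (not a documented API): Rhasspy POSTs raw WAV audio and
# expects plain text back. transcribe() is a placeholder; a real server
# would call an actual STT engine (e.g. SEPIA's) here.
import io
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer

def transcribe(wav_bytes):
    """Placeholder STT: report the audio duration instead of real text."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        seconds = wav_file.getnframes() / wav_file.getframerate()
    return "heard %.1f seconds of audio" % seconds

class SttHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = transcribe(self.rfile.read(length))
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Port 12102 is arbitrary for this sketch
    HTTPServer(("127.0.0.1", 12102), SttHandler).serve_forever()
```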
Definitely agree on #2. The rhasspy-nlu library will generate custom language models (with some installed tooling). For grapheme-to-phoneme conversion, I use phonetisaurus. Look for the pre-generated g2p.fst files in these profiles. The English and German Kaldi profiles are based on Zamia’s IPA dictionary, so they should be compatible with SEPIA out of the box.
Thanks for the suggestions, and I’d like to stay in contact (maybe via e-mail, so we don’t flood this mega thread even more)
- Mike
Here is what I have done in a function node to set a msg.delay value when creating a timer:
// Read the slot values from the Rhasspy intent; default to 0 if missing
var hours = Number(msg.slots.hours) || 0;
var minutes = Number(msg.slots.minutes) || 0;
var seconds = Number(msg.slots.seconds) || 0;
// The delay node's override expects milliseconds
msg.delay = (hours * 3600 + minutes * 60 + seconds) * 1000;
return msg;
The next node is a delay node set to allow the override. With this I have created a working timer in Node-Red using Rhasspy voice to “set a timer for XX hours XX minutes XX seconds”.
So far I have only tested it for a few minutes, but it seems to be working.
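The delay arithmetic above can be sanity-checked quickly. Mirroring the function node in Python (the `timer_delay_ms` name is mine, just for this check): one hour and thirty minutes should come out as 5,400,000 ms.

```python
# Mirrors the Node-RED function node above: slot values -> milliseconds
def timer_delay_ms(hours=0, minutes=0, seconds=0):
    return (hours * 3600 + minutes * 60 + seconds) * 1000

print(timer_delay_ms(hours=1, minutes=30))  # -> 5400000
```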
Cheers!
DeadEnd
I put together my custom timer a while back; it’s based on @synesthesiam’s example but with a bit more functionality:
I have some other Node-RED/HA examples I want to put up on the repository but lack the time at the moment. I hope I’ll get a bit more time for this over the holidays.
@synesthesiam
Maybe you could ask gido about web frontend development. He commented that he is a “professional frontend web developer”:
https://forum.snips.ai/t/important-message-regarding-the-snips-console/4145/37
Hey Mike,
One possible way to collaborate may be via a shared protocol, like Hermes or Hermod. You may already have something like this in SEPIA’s websocket server
I’ve not used the MQTT protocol yet, but the question has come up a few times recently (in connection with Node-RED, for example). I’ll check out the linked docs to get a better understanding of Hermes and Hermod.
For #1, there’s going to be a new intent Home Assistant component in the next release. This has an HTTP endpoint that accepts intent JSON objects directly. I plan to have this as an option in Rhasspy, and it may be an easy path for SEPIA’s Home Assistant Integration.
That sounds very interesting! I should definitely check it out when it’s ready
Rhasspy supports using an external HTTP server for speech-to-text and intent recognition. Maybe I could add a websocket connection out to SEPIA too?
The websocket connection to the SEPIA STT server is based on an old Microsoft cloud-speech demo. I should write some documentation about how to connect to it. Intents can be obtained from SEPIA’s interpret endpoint (a kind of REST API) as well, though I have to admit the results lack a bit of “uniformity”. Maybe a chance to clean up this part a bit ^^
[…] For grapheme-to-phoneme conversion, I use phonetisaurus. Look for the pre-generated
g2p.fst
files in these profiles. The English and German Kaldi profiles are based on Zamia’s IPA dictionary, so they should be compatible with SEPIA out of the box
That is awesome!
Thanks for the suggestions, and I’d like to stay in contact (maybe via e-mail, so we don’t flood this mega thread even more )
Yes, I’ll take some time to check out your links and then come back to you via email.
cu,
Florian
I’d be interested in talking to him. Do you already have an account over on that forum? Maybe he could join the discussion here?
I put the switches in a group, that way you can still call them lights.
Hello, everybody,
I’ve been following the forum here since the beginning of this month and changed from Snips to Rhasspy due to the friendly and active community and the good documentation. Thank you for the effort you put into this project.
I especially like the fact that you can adjust the phonemes for speech recognition. In order for Snips to understand the name of our vacuum cleaner robot (James, English pronunciation), I had to enter the word in the Snips console in a rather strange way (tschaims) so that the German language model would recognize it. Even then it only worked rather hit-or-miss.
The hotword recognition with Porcupine is also very reliable - much better than Snips with the same hardware (Jabra 510).
But there’s one point where I can’t get any further: very rarely (about 1 time in 30), the speech recognition terminates immediately without waiting for audio. The wake WAV is played and immediately afterwards the recorded WAV.
I run Rhasspy in a docker container on a pi4.
Here is an excerpt of the log, and my profile, from when this error occurs. I can only speculate here, but it seems to me that the webrtcvad timeout comes too early:
AssertionError: No intent recognized
[DEBUG:1619793] DialogueManager: decoding -> recognizing
[DEBUG:1619791] DialogueManager: (confidence=0)
[DEBUG:1619787] PocketsphinxDecoder: Decoded WAV in 0.036293983459472656 second(s)
[DEBUG:1619749] PocketsphinxDecoder: rate=16000, width=2, channels=1.
[DEBUG:1619749] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/robot_blip_custom2.wav']
[DEBUG:1619748] DialogueManager: awake -> decoding
[DEBUG:1619747] WebrtcvadCommandListener: listening -> loaded
[WARNING:1619747] WebrtcvadCommandListener: Timeout
[DEBUG:1619283] PyAudioRecorder: Recording from microphone (PyAudio, device=None)
[DEBUG:1619175] PorcupineWakeListener: Loaded porcupine (keyword=/profiles/de/porcupine/hey_pico_raspberrypi.ppn). Expecting sample rate=16000, frame length=512
[DEBUG:1619164] PyAudioRecorder: started -> recording
[DEBUG:1619164] PyAudioRecorder: Stopped recording from microphone (PyAudio)
[DEBUG:1619161] PyAudioRecorder: recording -> started
[DEBUG:1619084] PorcupineWakeListener: listening -> started
[DEBUG:1619083] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/robot_blip_custom1.wav']
[DEBUG:1619082] WebrtcvadCommandListener: loaded -> listening
[DEBUG:1619081] WebrtcvadCommandListener: Will timeout in 30 second(s)
[DEBUG:1619079] DialogueManager: asleep -> awake
[DEBUG:1619078] DialogueManager: Awake!
[DEBUG:1619076] PorcupineWakeListener: Hotword detected (True)
[INFO:1612770] quart.serving: 127.0.0.1:56278 GET / 1.1 200 1029 8200
{
"handle": {
"system": "hass"
},
"home_assistant": {
"access_token": "yes",
"url": "IP:PORT"
},
"mqtt": {
"enabled": true,
"host": "IP:PORT",
"password": "yes",
"site_id": "yes",
"username": "yes"
},
"sounds": {
"recorded": "${RHASSPY_PROFILE_DIR}/wav/robot_blip_custom2.wav",
"wake": "${RHASSPY_PROFILE_DIR}/wav/robot_blip_custom1.wav"
},
"speech_to_text": {
"pocketsphinx": {
"min_confidence": 0.01
}
},
"text_to_speech": {
"picotts": {
"language": "de-DE"
},
"system": "picotts"
},
"wake": {
"porcupine": {
"keyword_path": "porcupine/hey_pico_raspberrypi.ppn"
},
"system": "porcupine"
}
}
@j3mu5 I have noticed this too (the immediate termination of recording).
I am running in docker on a mini-ATX system - also with a Jabra 510 (this thing is AMAZING!)
Quick tangent on the 510… The speakerphone is in my living room, mounted up on the wall. I was down a hall 30+ feet away, around a corner at a bedroom doorway… and the wake word worked! I was amazed that it reached that far! I did have to raise my voice slightly for the sentence to be recognized, but holy crap was that impressive!
Back on topic - I can confirm that I too occasionally have an occurrence of it starting and then immediately stopping the capture. Unfortunately I have done zero diagnosis or debugging, as it is not frequent.
Cheers!
DeadEnd
I believe I have a fix for this. It seems to occur if you try to do two intent recognitions in quick succession. The timeout from the first is affecting the second. Should be fixed in the next update today!
@DeadEnd Yes, I’m also convinced by the Jabras. The sound and speech recognition quality are super! I experimented with a used 810, but it doesn’t bring any real improvement (and my wife thinks it’s too big & ugly) - it will have to be sold again.
I’m still struggling with the hotword sensitivity: too high a sensitivity leads to repeated triggering during movies & series. That’s annoying, because the volume of my sound system is lowered as long as Rhasspy listens (triggered by payload = started | listening in the topic rhasspy/de/transition/PorcupineWakeListener). At the moment I use the following, which seems to me to be the best compromise:
"keyword_path": "porcupine/hey_pico_raspberrypi.ppn",
"sensitivity": 0.7
Dear @synesthesiam, Rhasspy is already a very friendly roommate who likes to take care of the lights, the volume, the vacuum cleaner and also turns on the PC & TV. Only sometimes he doesn’t listen (timeout!). Thank you for teaching him manners.