Rhasspy offline voice assistant toolkit

jaburges · December 11, 2019, 1:56am

Any thoughts on using Microsoft’s Speech-to-text Docker containers? They seem to be in preview, and the only thing they appear to report to the cloud is the number of characters?

jaburges · December 11, 2019, 2:59am

Anyone tried numbers?

Trying out setting an alarm:

[SetAlarm]
hours = ( 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12) {hours}
minutes = ( 5 | 10 | 20 | 30 | quarter | half) {minutes}
ampm = (A.M. | P.M.) {ampm}
math = ( past | to ) {math}

set alarm for <hours> <minutes>
set alarm for <minutes> <math> <hours>

No matter what I say, Rhasspy hears “set alarm for 5 5”.

DeadEnd · December 11, 2019, 3:59am

@fastjacksprt answered this for me a day or two ago:

Cheers!
DeadEnd

jaburges · December 11, 2019, 4:12am

Ah, excellent, thanks. Information overload! Couldn’t see it right in front of me! Thanks for the assist!

synesthesiam · December 11, 2019, 4:14am

For the time being, you’ll need to use the number words in your sentences.ini:

[SetAlarm]
hours = ( one:1 | two:2 | three:3 | four:4 | five:5 | six:6 | seven:7 | eight:8 | nine:9 | ten:10 | eleven:11 | twelve:12) {hours}
minutes = ( five:5 | ten:10 | twenty:20 | thirty:30 | quarter | half) {minutes}
ampm = (A.M. | P.M.) {ampm}
math = ( past | to ) {math}

set alarm for <hours> <minutes>
set alarm for <minutes> <math> <hours>

I have a fix for this using the Python num2words package. You’ll find that you can type something into the web interface already like “set alarm for 12 10” and it will work. Just need to hook it up elsewhere…

synesthesiam · December 11, 2019, 4:15am

Rhasspy Announcements

I’ve added a Rhasspy announcements thread where I’ll post information about new versions and features.

synesthesiam · December 11, 2019, 4:16am

You’ll run into the same issue, since all of the C++ libraries/binaries are pre-compiled. Can you open a GitHub issue for ARMv6 support, please? Thanks!

jaburges · December 11, 2019, 4:31am

That works nicely. The only issue is that it’s parsed as a string, not a number (not really an issue, as I can clean it up in Node-RED).
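
For anyone who wants to do that clean-up in a Node-RED function node, here is a minimal sketch. It assumes the slots arrive as strings named hours, minutes, math and ampm, as in the [SetAlarm] template above. The helper name slotsToTime and its return shape are made up for illustration, and treating a missing math slot as “past” and 12 A.M. as midnight are my own assumptions.

```javascript
// Hypothetical helper: turn the string slots of the [SetAlarm] intent
// into a 24-hour time. Slot names follow the template in this thread;
// everything else here is an illustrative assumption.
function slotsToTime(slots) {
    // "quarter" and "half" stay words in the template, so map them to minutes
    var words = { quarter: 15, half: 30 };
    var raw = String(slots.minutes || "0").toLowerCase();
    var minutes = Object.prototype.hasOwnProperty.call(words, raw)
        ? words[raw]
        : Number(raw) || 0;

    var hours = Number(slots.hours) || 0;
    var ampm = String(slots.ampm || "").toUpperCase();

    // Convert 12-hour input to 24-hour when an A.M./P.M. slot is present
    if (ampm.charAt(0) === "P" && hours < 12) {
        hours += 12;
    }
    if (ampm.charAt(0) === "A" && hours === 12) {
        hours = 0;
    }

    // "ten to five" means ten minutes before five; anything else counts as "past"
    var total = hours * 60 + (slots.math === "to" ? -minutes : minutes);

    // Wrap around midnight so the result always stays within one day
    total = ((total % 1440) + 1440) % 1440;

    return { hour: Math.floor(total / 60), minute: total % 60 };
}

console.log(slotsToTime({ hours: "5", minutes: "quarter", math: "past" }));
```

In a function node, something like msg.alarm = slotsToTime(msg.slots) would then hand real numbers to the next node.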

synesthesiam · December 11, 2019, 4:32am

Hi @sepia-assistant! I do remember seeing SEPIA a while back. You may have actually been my link over to Zamia Speech!

I’d be happy to collaborate or help in any way I can. One possible way to collaborate may be via a shared protocol, like Hermes or Hermod. You may already have something like this in SEPIA’s websocket server.

For #1, there’s going to be a new intent Home Assistant component in the next release. This has an HTTP endpoint that accepts intent JSON objects directly. I plan to have this as an option in Rhasspy, and it may be an easy path for SEPIA’s Home Assistant Integration.

Rhasspy supports using an external HTTP server for speech-to-text and intent recognition. Maybe I could add a websocket connection out to SEPIA too?

Definitely agree on #2. The rhasspy-nlu library will generate custom language models (with some installed tooling). For grapheme-to-phoneme conversion, I use phonetisaurus. Look for the pre-generated g2p.fst files in these profiles. The English and German Kaldi profiles are based on Zamia’s IPA dictionary, so they should be compatible with SEPIA out of the box.

Thanks for the suggestions, and I’d like to stay in contact (maybe via e-mail, so we don’t flood this mega thread even more).

Mike

DeadEnd · December 11, 2019, 4:43am

Here is what I have done in a function node to set a msg.delay value when creating a timer:

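// Read the slots as numbers; missing or non-numeric slots count as 0
var slots = msg.slots || {};
var hours = Number(slots.hours) || 0;
var minutes = Number(slots.minutes) || 0;
var seconds = Number(slots.seconds) || 0;

// The delay node expects msg.delay in milliseconds
msg.delay = (hours * 3600 + minutes * 60 + seconds) * 1000;

return msg;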

The next node is a delay node configured to let msg.delay override its delay. With this I have created a working timer in Node-RED using Rhasspy voice to “set a timer for XX hours XX minutes XX seconds”.

So far I have only tested it for a few minutes, but it seems to be working.

Cheers!
DeadEnd

ntuseracc · December 11, 2019, 12:11pm

I put together my custom timer a while back. It’s based on @synesthesiam’s example but with a bit more functionality:

I have some other Node-RED/HA examples I want to put up on the repository, but I lack the time at the moment. I hope I will get a bit more time for this over the holidays.

haip · December 11, 2019, 3:21pm

@synesthesiam
Maybe you could ask gido about web frontend development. He commented that he is a “professional frontend web developer”:
https://forum.snips.ai/t/important-message-regarding-the-snips-console/4145/37

sepia-assistant · December 11, 2019, 9:13pm

Hey Mike,

One possible way to collaborate may be via a shared protocol, like Hermes or Hermod. You may already have something like this in SEPIA’s websocket server.

I’ve not used the MQTT protocol yet, but the question came up a few times recently (in connection with Node-RED, for example). I’ll check out the linked docs to get a better understanding of Hermes and Hermod.

For #1, there’s going to be a new intent Home Assistant component in the next release. This has an HTTP endpoint that accepts intent JSON objects directly. I plan to have this as an option in Rhasspy, and it may be an easy path for SEPIA’s Home Assistant Integration.

That sounds very interesting! I should definitely check it out when it’s ready.

Rhasspy supports using an external HTTP server for speech-to-text and intent recognition. Maybe I could add a websocket connection out to SEPIA too?

The WebSocket connection to the SEPIA STT server is based on an old Microsoft Cloud-Speech demo. I should write some documentation on how to connect to it. Intents can also be obtained from SEPIA’s interpret endpoint (a kind of REST API), though I have to admit the results lack a bit of “uniformity”. Maybe this is a chance to clean up that part a bit ^^.

[…] For grapheme-to-phoneme conversion, I use phonetisaurus. Look for the pre-generated g2p.fst files in these profiles. The English and German Kaldi profiles are based on Zamia’s IPA dictionary, so they should be compatible with SEPIA out of the box.

That is awesome!

Thanks for the suggestions, and I’d like to stay in contact (maybe via e-mail, so we don’t flood this mega thread even more).

Yes, I’ll take some time to check out your links and then come back to you via email.

cu,
Florian

synesthesiam · December 12, 2019, 4:14am

I’d be interested in talking to him. Do you already have an account over on that forum? Maybe he could join the discussion here?

Wesley_Roelofs · December 12, 2019, 2:09pm

I put the switches in a group, that way you can still call them lights.

j3mu5 · December 12, 2019, 2:56pm

Hello, everybody,

I’ve been following the forum here since the beginning of this month and changed from Snips to Rhasspy due to the friendly and active community and the good documentation. Thank you for the effort you put into this project.
I especially like the fact that you can adjust the phonemes for speech recognition. For Snips to understand the name of our vacuum cleaner robot (James, English pronunciation), I had to enter the word in the Snips console in a rather strange way (tschaims) so that the German language model recognized it. Even then, it only worked on a hit-or-miss basis.
The hotword recognition with porcupine is also very reliable, much better than Snips with the same hardware (Jabra 510).

But there’s one point where I can’t get any further: very rarely (about 1 time in 30), speech recognition terminates immediately without waiting for audio. The wake WAV is played, and the recorded WAV follows immediately afterwards.
I run Rhasspy in a Docker container on a Pi 4.
Here is an excerpt of the log (newest entries first) and my profile when this error occurs. I can only speculate, but it seems to me that the webrtcvad timeout comes too early:

AssertionError: No intent recognized
[DEBUG:1619793] DialogueManager: decoding -> recognizing
[DEBUG:1619791] DialogueManager:  (confidence=0)
[DEBUG:1619787] PocketsphinxDecoder: Decoded WAV in 0.036293983459472656 second(s)
[DEBUG:1619749] PocketsphinxDecoder: rate=16000, width=2, channels=1.
[DEBUG:1619749] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/robot_blip_custom2.wav']
[DEBUG:1619748] DialogueManager: awake -> decoding
[DEBUG:1619747] WebrtcvadCommandListener: listening -> loaded
[WARNING:1619747] WebrtcvadCommandListener: Timeout
[DEBUG:1619283] PyAudioRecorder: Recording from microphone (PyAudio, device=None)
[DEBUG:1619175] PorcupineWakeListener: Loaded porcupine (keyword=/profiles/de/porcupine/hey_pico_raspberrypi.ppn). Expecting sample rate=16000, frame length=512
[DEBUG:1619164] PyAudioRecorder: started -> recording
[DEBUG:1619164] PyAudioRecorder: Stopped recording from microphone (PyAudio)
[DEBUG:1619161] PyAudioRecorder: recording -> started
[DEBUG:1619084] PorcupineWakeListener: listening -> started
[DEBUG:1619083] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/robot_blip_custom1.wav']
[DEBUG:1619082] WebrtcvadCommandListener: loaded -> listening
[DEBUG:1619081] WebrtcvadCommandListener: Will timeout in 30 second(s)
[DEBUG:1619079] DialogueManager: asleep -> awake
[DEBUG:1619078] DialogueManager: Awake!
[DEBUG:1619076] PorcupineWakeListener: Hotword detected (True)
[INFO:1612770] quart.serving: 127.0.0.1:56278 GET / 1.1 200 1029 8200

{
    "handle": {
        "system": "hass"
    },
    "home_assistant": {
        "access_token": "yes",
        "url": "IP:PORT"
    },
    "mqtt": {
        "enabled": true,
        "host": "IP:PORT",
        "password": "yes",
        "site_id": "yes",
        "username": "yes"
    },
    "sounds": {
        "recorded": "${RHASSPY_PROFILE_DIR}/wav/robot_blip_custom2.wav",
        "wake": "${RHASSPY_PROFILE_DIR}/wav/robot_blip_custom1.wav"
    },
    "speech_to_text": {
        "pocketsphinx": {
            "min_confidence": 0.01
        }
    },
    "text_to_speech": {
        "picotts": {
            "language": "de-DE"
        },
        "system": "picotts"
    },
    "wake": {
        "porcupine": {
            "keyword_path": "porcupine/hey_pico_raspberrypi.ppn"
        },
        "system": "porcupine"
    }
}
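
To check the suspicion about the early timeout, the gap between the two WebrtcvadCommandListener lines can be read straight from the log. This is a minimal sketch, assuming the number in square brackets is a millisecond counter. The log does not say so, but the 38-tick gap between the two PocketsphinxDecoder lines sits close to the reported decode time of about 0.036 s. The helper name tickOf is made up.

```javascript
// Two lines copied from the log excerpt above (newest first, as in the log).
var lines = [
    "[WARNING:1619747] WebrtcvadCommandListener: Timeout",
    "[DEBUG:1619081] WebrtcvadCommandListener: Will timeout in 30 second(s)"
];

// Hypothetical helper: pull the bracketed counter out of a log line.
// Returns NaN when the line does not start with a [LEVEL:number] prefix.
function tickOf(line) {
    var m = /^\[[A-Z]+:(\d+)\]/.exec(line);
    return m ? Number(m[1]) : NaN;
}

// Ticks between "Will timeout in 30 second(s)" and the actual "Timeout"
var elapsed = tickOf(lines[0]) - tickOf(lines[1]);
console.log(elapsed); // 666
```

If the counter really is in milliseconds, the timeout fired roughly 0.67 s after the listener announced a 30 second limit.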

DeadEnd · December 12, 2019, 3:53pm

@j3mu5 I have noticed this too (the immediate termination of recording).
I am running in Docker on a mini-ATX system, also with a Jabra 510 (this thing is AMAZING!)

Quick tangent on the 510… The speakerphone is mounted on the wall in my living room. I was 30+ feet away down a hall, around a corner at a bedroom doorway… and the wake word worked! I was amazed that it reached that far! I did have to raise my voice slightly for the sentence to be recognized, but holy crap, was that impressive!

Back on topic: I can confirm that I too occasionally see it start and then immediately stop the capture. Unfortunately, I have done zero diagnosis or debugging, as it is not frequent.

Cheers!
DeadEnd

synesthesiam · December 12, 2019, 3:55pm

I believe I have a fix for this. It seems to occur if you try to do two intent recognitions in quick succession. The timeout from the first is affecting the second. Should be fixed in the next update today!

j3mu5 · December 12, 2019, 4:07pm

@DeadEnd Yes, I’m also sold on the Jabras. Sound quality and speech recognition quality are superb! I experimented with a used 810, but it doesn’t bring any real improvement (and my wife thinks it’s too big & ugly), so it will be sold again.

I’m still struggling with the hotword sensitivity: too high a sensitivity leads to repeated triggering during movies and series. That’s annoying because the volume of my sound system is lowered as long as Rhasspy listens (triggered by the payload = started | listening in the topic rhasspy/de/transition/PorcupineWakeListener). At the moment I use the following, which seems to me to be the best compromise:

            "keyword_path": "porcupine/hey_pico_raspberrypi.ppn",
            "sensitivity": 0.7
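
The volume ducking described above could be driven by a Node-RED function node along these lines. This is only a sketch. It assumes the payload on rhasspy/de/transition/PorcupineWakeListener is the plain state name, that “started” means the wake word fired and Rhasspy is now handling a command, and that “listening” means it has gone back to waiting for the wake word. That reading comes from the log excerpt earlier in the thread, not from documentation. The function name and the returned action strings are invented.

```javascript
// Hypothetical mapping from a PorcupineWakeListener state to a volume action.
function volumeAction(payload) {
    var state = String(payload).trim().toLowerCase();
    if (state === "started") {
        return "lower";   // assumed: wake word detected, command being handled
    }
    if (state === "listening") {
        return "restore"; // assumed: back to waiting for the wake word
    }
    return null;          // ignore any other transition
}

// Inside a function node this might be used as:
//   msg.payload = volumeAction(msg.payload);
//   return msg.payload ? msg : null;   // returning null drops the message
console.log(volumeAction("started"));   // lower
console.log(volumeAction("listening")); // restore
```

Keeping the mapping in one small function makes it easy to swap the two states if the assumption about their meaning turns out to be the wrong way round.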

j3mu5 · December 12, 2019, 4:12pm

Dear @synesthesiam, Rhasspy is already a very friendly roommate who likes to take care of the lights, the volume, the vacuum cleaner and also turns on the PC & TV. Only sometimes he doesn’t listen (timeout!). Thank you for teaching him manners.