Rhasspy offline voice assistant toolkit

Wesley_Roelofs · December 12, 2019, 2:09pm

I put the switches in a group, that way you can still call them lights.

j3mu5 · December 12, 2019, 2:56pm

Hello, everybody,

I’ve been following the forum here since the beginning of this month and changed from Snips to Rhasspy due to the friendly and active community and the good documentation. Thank you for the effort you put into this project.
I especially like the fact that you can adjust the phonemes for speech recognition. For Snips to understand the name of our vacuum cleaner robot (James, English pronunciation), I had to enter the word in the Snips console in a rather strange way (tschaims) so that the German language model recognized it. That worked only some of the time.
Also, the hotword recognition with Porcupine is very reliable - much better than Snips with the same hardware (Jabra 510).

But there’s one point where I can’t get any further: very rarely (about 1 time in 30) the speech recognition terminates immediately without waiting for audio. The wake WAV is played, and the recorded WAV follows immediately afterwards.
I run Rhasspy in a Docker container on a Pi 4.
Here is an excerpt of the log and my profile when this error occurs. I can only speculate here, but it seems to me that the webrtcvad timeout comes too early:

AssertionError: No intent recognized
[DEBUG:1619793] DialogueManager: decoding -> recognizing
[DEBUG:1619791] DialogueManager:  (confidence=0)
[DEBUG:1619787] PocketsphinxDecoder: Decoded WAV in 0.036293983459472656 second(s)
[DEBUG:1619749] PocketsphinxDecoder: rate=16000, width=2, channels=1.
[DEBUG:1619749] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/robot_blip_custom2.wav']
[DEBUG:1619748] DialogueManager: awake -> decoding
[DEBUG:1619747] WebrtcvadCommandListener: listening -> loaded
[WARNING:1619747] WebrtcvadCommandListener: Timeout
[DEBUG:1619283] PyAudioRecorder: Recording from microphone (PyAudio, device=None)
[DEBUG:1619175] PorcupineWakeListener: Loaded porcupine (keyword=/profiles/de/porcupine/hey_pico_raspberrypi.ppn). Expecting sample rate=16000, frame length=512
[DEBUG:1619164] PyAudioRecorder: started -> recording
[DEBUG:1619164] PyAudioRecorder: Stopped recording from microphone (PyAudio)
[DEBUG:1619161] PyAudioRecorder: recording -> started
[DEBUG:1619084] PorcupineWakeListener: listening -> started
[DEBUG:1619083] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/robot_blip_custom1.wav']
[DEBUG:1619082] WebrtcvadCommandListener: loaded -> listening
[DEBUG:1619081] WebrtcvadCommandListener: Will timeout in 30 second(s)
[DEBUG:1619079] DialogueManager: asleep -> awake
[DEBUG:1619078] DialogueManager: Awake!
[DEBUG:1619076] PorcupineWakeListener: Hotword detected (True)
[INFO:1612770] quart.serving: 127.0.0.1:56278 GET / 1.1 200 1029 8200

{
    "handle": {
        "system": "hass"
    },
    "home_assistant": {
        "access_token": "yes",
        "url": "IP:PORT"
    },
    "mqtt": {
        "enabled": true,
        "host": "IP:PORT",
        "password": "yes",
        "site_id": "yes",
        "username": "yes"
    },
    "sounds": {
        "recorded": "${RHASSPY_PROFILE_DIR}/wav/robot_blip_custom2.wav",
        "wake": "${RHASSPY_PROFILE_DIR}/wav/robot_blip_custom1.wav"
    },
    "speech_to_text": {
        "pocketsphinx": {
            "min_confidence": 0.01
        }
    },
    "text_to_speech": {
        "picotts": {
            "language": "de-DE"
        },
        "system": "picotts"
    },
    "wake": {
        "porcupine": {
            "keyword_path": "porcupine/hey_pico_raspberrypi.ppn"
        },
        "system": "porcupine"
    }
}
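One way to check the "timeout comes too early" suspicion is to compare the stamps on the two WebrtcvadCommandListener lines in the log excerpt. The sketch below is only an illustration: it assumes the number after the log level is a relative timestamp, and the millisecond unit is a guess (the excerpt does not state it). The helper names parse_line and timeout_gap are made up for this example.

```python
import re

# Matches lines such as "[DEBUG:1619081] WebrtcvadCommandListener: Will timeout in 30 second(s)".
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+):(?P<stamp>\d+)\]\s+(?P<source>[\w.]+):\s+(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into (level, stamp, source, message), or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("level"),
        int(match.group("stamp")),
        match.group("source"),
        match.group("message"),
    )


def timeout_gap(lines):
    """Return the stamp difference between arming the timeout and the Timeout warning.

    The unit is whatever the log stamps use (assumed, not confirmed, to be milliseconds).
    Returns None if either line is missing.
    """
    armed = fired = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _level, stamp, source, message = parsed
        if source != "WebrtcvadCommandListener":
            continue
        if message.startswith("Will timeout in"):
            armed = stamp
        elif message == "Timeout":
            fired = stamp
    if armed is None or fired is None:
        return None
    return fired - armed


excerpt = [
    "[WARNING:1619747] WebrtcvadCommandListener: Timeout",
    "[DEBUG:1619081] WebrtcvadCommandListener: Will timeout in 30 second(s)",
]
print(timeout_gap(excerpt))  # 666
```

If the stamps really are milliseconds, the warning fired well under a second after a 30-second timeout was announced, which fits the description above.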

DeadEnd · December 12, 2019, 3:53pm

@j3mu5 I have noticed this too (the immediate termination of recording).
I am running in docker on a mini-ATX system - also with a Jabra 510 (this thing is AMAZING!)

Quick tangent on the 510… The speakerphone is in my living room, mounted up on the wall. I was down a hall 30+ feet away, around a corner at a bedroom doorway… and the wake word worked! I was amazed that it reached that far! I did have to raise my voice slightly for the sentence to be recognized, but holy crap was that impressive!

Back on topic - I can confirm that I too occasionally see it start and then immediately stop the capture. Unfortunately I have done zero diagnosis or debugging, as it is not frequent.

Cheers!
DeadEnd

synesthesiam · December 12, 2019, 3:55pm

I believe I have a fix for this. It seems to occur if you try to do two intent recognitions in quick succession. The timeout from the first is affecting the second. Should be fixed in the next update today!
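For readers wondering how a timeout from one recognition can hit the next one, here is a generic sketch of the failure mode and one common guard against it. This is not Rhasspy's code and not necessarily the fix that went into the update; the class and method names are hypothetical. The idea is that each listening session gets a token, and a timeout is only honored if it carries the token of the session that is currently active.

```python
class CommandListener:
    """Toy listener that ignores timeouts scheduled for an earlier session."""

    def __init__(self):
        self.session = 0       # token of the most recently started session
        self.state = "loaded"  # "loaded" or "listening", as in the log above

    def start_listening(self):
        """Begin a new session and return the token its timeout must carry."""
        self.session += 1
        self.state = "listening"
        return self.session

    def finish(self):
        """End the current session normally (a command was captured)."""
        self.state = "loaded"

    def on_timeout(self, token):
        """Handle a timeout; return True only if it ended the current session."""
        if self.state != "listening" or token != self.session:
            return False  # stale timeout from an earlier session: ignore it
        self.state = "loaded"
        return True


listener = CommandListener()
first = listener.start_listening()
listener.finish()                    # first recognition completes quickly
second = listener.start_listening()  # second recognition starts right away
print(listener.on_timeout(first))    # False: the old timeout no longer applies
print(listener.state)                # listening
```

Without the token check, the late timeout from the first session would end the second one immediately, which would look like the log posted earlier.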

j3mu5 · December 12, 2019, 4:07pm

@DeadEnd Yes, I’m also sold on the Jabras. Sound quality and speech recognition quality are super! I experimented with a used 810, but it doesn’t bring any real improvement (and my wife thinks it’s too big & ugly), so it will be sold again.

I’m still struggling with the hotword sensitivity. Too high a sensitivity leads to repeated triggering during movies & series. That’s annoying because the volume of my sound system is lowered as long as Rhasspy listens (triggered by the payload = started | listening in the topic rhasspy/de/transition/PorcupineWakeListener). At the moment I use the following, which seems to me to be the best compromise:

            "keyword_path": "porcupine/hey_pico_raspberrypi.ppn",
            "sensitivity": 0.7
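The volume lowering mentioned above could be driven by a small decision function like the one below. It is a sketch under assumptions: the payload is taken to be the plain state name, "started" is assumed to mean the wake word was just heard (the log earlier in the thread shows "PorcupineWakeListener: listening -> started" right after the hotword), and "listening" is assumed to mean Rhasspy is back to waiting for the wake word. The action names "duck" and "restore" are invented for the example, and connecting this to an MQTT client or a sound system is left out.

```python
WAKE_TOPIC = "rhasspy/de/transition/PorcupineWakeListener"


def volume_action(topic, payload):
    """Map a transition message to "duck", "restore", or None (nothing to do)."""
    if topic != WAKE_TOPIC:
        return None
    # MQTT payloads usually arrive as bytes; accept plain strings as well.
    state = payload.decode("utf-8") if isinstance(payload, bytes) else payload
    state = state.strip()
    if state == "started":
        return "duck"     # assumed: wake word heard, Rhasspy is taking a command
    if state == "listening":
        return "restore"  # assumed: back to waiting for the wake word
    return None


print(volume_action(WAKE_TOPIC, b"started"))    # duck
print(volume_action(WAKE_TOPIC, b"listening"))  # restore
```

Because every false trigger also ducks the volume, this only behaves well once the sensitivity is tuned.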

j3mu5 · December 12, 2019, 4:12pm

Dear @synesthesiam, Rhasspy is already a very friendly roommate who likes to take care of the lights, the volume, the vacuum cleaner and also turns on the PC & TV. Only sometimes he doesn’t listen (timeout!). Thank you for teaching him manners.

DeadEnd · December 12, 2019, 4:15pm

I am using porcupine too - I didn’t know there was a sensitivity setting… I also get lots of false triggers using Jarvis wake word.

I did see the Jabra 710 model, and it looks like it has one feature that would be useful… it can link 2 units together (Bluetooth, I believe). Depending on the range, this would allow you to expand the reach - but the 510 is pretty good… I would only do this if the two units could link my two floors together… that would give full-house listening without needing a satellite unit!

That would be impressive… but for the price (I got the 510 used for ~$40 USD) I don’t think I could get close to that on a 710, let alone two!

DeadEnd

DeadEnd · December 13, 2019, 3:43am

Okay, so I just checked and it looks like there are sensitivity settings for pocketsphinx and snowboy, but not porcupine.

@synesthesiam is this something that is just missing from the GUI or does porcupine work differently and doesn’t have a sensitivity setting?

Thanks!
DeadEnd

synesthesiam · December 13, 2019, 3:54am

It was just missing from the GUI. I already have it fixed in master, along with the ability to have multiple porcupine wake words. After deploying a broken Docker image last time, I’m taking a bit more time to test this next version.

DeadEnd · December 13, 2019, 4:09am

Eh, your Docker image wasn’t broken per se… one of its dependencies was… and from what I saw it was a relatively important break that was fixed quickly… it must have affected many things.

Not complaining at all though, your work on Rhasspy is fantastic!
I am thoroughly happy with how it is working - and I know you are still doing a ton more work!

Thanks!!!
DeadEnd

j3mu5 · December 13, 2019, 12:59pm

I fully agree with @DeadEnd! The effort @synesthesiam makes is incredible - I am very grateful to you.
It will never be possible to fix all bugs in advance within a reasonable timeframe. In my opinion, it is the community’s job to find bugs that occur rarely or in very unusual cases.

The sensitivity for porcupine can be set by editing profile.json. However, even a missing comma or another typo there causes problems. Changing settings in the frontend can also overwrite changes made earlier in profile.json. Therefore I recommend configuring either only in the file or only in the frontend.
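To avoid the typo problem, the edit can be made with a small script that refuses to touch a file that is not valid JSON. This is a minimal sketch using only Python’s standard json module; the function names are made up for the example, and the key path wake -> porcupine -> sensitivity follows the profile fragments posted earlier in the thread.

```python
import json


def check_profile(text):
    """Return (profile, None) for valid JSON, or (None, message) describing the error."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


def set_porcupine_sensitivity(text, value):
    """Return the profile text with wake.porcupine.sensitivity set to value."""
    profile, error = check_profile(text)
    if error is not None:
        raise ValueError(f"profile.json is not valid JSON ({error})")
    if not isinstance(profile, dict):
        raise ValueError("profile.json must contain a JSON object at the top level")
    # Create the nested sections if they are missing, then set the value.
    profile.setdefault("wake", {}).setdefault("porcupine", {})["sensitivity"] = value
    return json.dumps(profile, indent=4)


# A missing comma between two keys is reported instead of being written back.
broken = '{"wake": {"porcupine": {"keyword_path": "x.ppn" "sensitivity": 0.7}}}'
print(check_profile(broken)[1])
```

Reading profile.json, passing its contents through set_porcupine_sensitivity, and writing the result back would replace the hand edit. The warning about the frontend overwriting the file still applies.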

koan · December 13, 2019, 1:17pm

Indeed, it’s incredible how productive @synesthesiam is. I really don’t understand how he is able to manage such a complex project on his own. I suspect there’s secretly a whole team of programmers behind this nickname.

Romkabouter · December 13, 2019, 1:58pm

Tried Hassio Addon 2.4.13 and got this:

Traceback (most recent call last):
  File "app.py", line 14, in <module>
    from quart import (
  File "/usr/local/lib/python3.6/dist-packages/quart/__init__.py", line 4, in <module>
    from .app import Quart
  File "/usr/local/lib/python3.6/dist-packages/quart/app.py", line 16, in <module>
    from .asgi import ASGIHTTPConnection, ASGIWebsocketConnection
  File "/usr/local/lib/python3.6/dist-packages/quart/asgi.py", line 5, in <module>
    from .datastructures import CIMultiDict
  File "/usr/local/lib/python3.6/dist-packages/quart/datastructures.py", line 47, in <module>
    class CIMultiDict(_WerkzeugMultidictMixin, AIOCIMultiDict):  # type: ignore
TypeError: type 'multidict._multidict.CIMultiDict' is not an acceptable base type

Is anyone else seeing this as well?

DeadEnd · December 13, 2019, 2:08pm

Yes, multidict had an update that broke CIMultiDict.
They have since fixed this, and I believe @synesthesiam has pushed a new Docker image that uses the version from before the bug. Make sure you get the newest image and it should go away.

DeadEnd

Romkabouter · December 13, 2019, 2:09pm

I did a rebuild, and that solved it

cphassistant · December 13, 2019, 3:39pm

Fantastic work you have done here, @synesthesiam.

I got Rhasspy installed on Hassio on a Raspberry Pi 3. Are the processing times below to be expected on this hardware?

Example processing time from the Pi:

PocketsphinxDecoder: turn on living room lamp
"time_sec": 4.757205009460449
PocketsphinxDecoder: Decoded WAV in 4.599050521850586 second(s)

thinker · December 13, 2019, 3:47pm

@synesthesiam
Impressive how the development is progressing. Thanks a lot!
I have seen on your GitHub page that the performance table lists only English for picotts. In the meantime, picotts supports DE, FR, IT and ES as well.

synesthesiam · December 13, 2019, 6:46pm

Nah, but I could stand to lose some weight

synesthesiam · December 13, 2019, 6:47pm

That seems a bit slow. Are you using open transcription or mixed language modeling? If you don’t know what either of those is, probably not.

cphassistant · December 13, 2019, 7:54pm

Everything is pretty much vanilla out of the box. I’m not using open transcription mode or mixed language modeling. I also tested on a Synology NAS, and the time there was awesome.

I plan to be using Rhasspy on the Synology with a Matrix Voice. So the speed on the Raspberry is not that much of an issue in my case while testing. I just wanted to know if there was anything I could do to optimise.