Hello,
I’ve been playing around with Rhasspy for a couple of days now and it’s starting to work well. Thank you for your hard work! I do, however, run into an issue when using a wakeword. The issue does not occur without a wakeword.
When I speak a sentence to Rhasspy after a wakeword, it finds the intent, but it takes another 18 seconds or so before Node-RED sees the intent on the websocket. Without a wakeword, just pressing the “wake” button, this happens almost instantly.
Here is the logging for this particular issue:
rhasspy_1 | 2019-09-15T06:30:03.425177817Z DEBUG:DialogueManager:Awake!
rhasspy_1 | 2019-09-15T06:30:03.425327358Z DEBUG:DialogueManager:asleep -> awake
rhasspy_1 | 2019-09-15T06:30:03.426400818Z DEBUG:APlayAudioPlayer:['aplay', '-q', '/usr/share/rhasspy/etc/wav/beep_hi.wav']
rhasspy_1 | 2019-09-15T06:30:03.432346968Z DEBUG:WebrtcvadCommandListener:loaded -> listening
rhasspy_1 | 2019-09-15T06:30:03.433710247Z DEBUG:SnowboyWakeListener:listening -> loaded
rhasspy_1 | 2019-09-15T06:30:03.434661588Z DEBUG:ARecordAudioRecorder:recording -> started
rhasspy_1 | 2019-09-15T06:30:03.435807724Z DEBUG:ARecordAudioRecorder:Stopped recording from microphone (arecord)
rhasspy_1 | 2019-09-15T06:30:03.435942016Z DEBUG:ARecordAudioRecorder:started -> recording
rhasspy_1 | 2019-09-15T06:30:03.436930852Z DEBUG:ARecordAudioRecorder:['arecord', '-q', '-r', '16000', '-f', 'S16_LE', '-c', '1', '-t', 'raw', '-D', 'default:CARD=CameraB409241']
rhasspy_1 | 2019-09-15T06:30:03.437815606Z DEBUG:ARecordAudioRecorder:Recording from microphone (arecord)
rhasspy_1 | 2019-09-15T06:30:03.999560324Z DEBUG:WebrtcvadCommandListener:Voice command started
rhasspy_1 | 2019-09-15T06:30:05.919244630Z DEBUG:WebrtcvadCommandListener:Voice command finished
rhasspy_1 | 2019-09-15T06:30:05.920433450Z DEBUG:WebrtcvadCommandListener:listening -> loaded
rhasspy_1 | 2019-09-15T06:30:05.921690940Z DEBUG:DialogueManager:awake -> decoding
rhasspy_1 | 2019-09-15T06:30:05.922945645Z DEBUG:PocketsphinxDecoder:rate=16000, width=2, channels=1.
rhasspy_1 | 2019-09-15T06:30:05.926208121Z DEBUG:APlayAudioPlayer:['aplay', '-q', '/usr/share/rhasspy/etc/wav/beep_lo.wav']
rhasspy_1 | 2019-09-15T06:30:05.926376885Z DEBUG:ARecordAudioRecorder:recording -> started
rhasspy_1 | 2019-09-15T06:30:05.927012261Z DEBUG:ARecordAudioRecorder:Stopped recording from microphone (arecord)
rhasspy_1 | 2019-09-15T06:30:06.056042450Z DEBUG:PocketsphinxDecoder:Decoded WAV in 0.13323044776916504 second(s)
rhasspy_1 | 2019-09-15T06:30:06.056980948Z DEBUG:PocketsphinxDecoder:Transcription confidence: 0.14622293698088337
rhasspy_1 | 2019-09-15T06:30:06.057129800Z DEBUG:PocketsphinxDecoder:hoe laat is het
rhasspy_1 | 2019-09-15T06:30:06.057985674Z DEBUG:DialogueManager:hoe laat is het (confidence=0.14622293698088337)
rhasspy_1 | 2019-09-15T06:30:06.058028148Z DEBUG:DialogueManager:decoding -> recognizing
rhasspy_1 | 2019-09-15T06:30:06.059501765Z DEBUG:FsticuffsRecognizer:Got 1 intent(s)
rhasspy_1 | 2019-09-15T06:30:06.059538750Z DEBUG:FsticuffsRecognizer:[{'text': 'hoe laat is het', 'intent': {'name': 'GetTime', 'confidence': 1.0}, 'entities': [], 'raw_text': 'hoe laat is het', 'tokens': ['hoe', 'laat', 'is', 'het'], 'raw_tokens': ['hoe', 'laat', 'is', 'het']}]
rhasspy_1 | 2019-09-15T06:30:06.059751726Z DEBUG:DialogueManager:{'text': 'hoe laat is het', 'intent': {'name': 'GetTime', 'confidence': 1.0}, 'entities': [], 'raw_text': 'hoe laat is het', 'tokens': ['hoe', 'laat', 'is', 'het'], 'raw_tokens': ['hoe', 'laat', 'is', 'het'], 'speech_confidence': 0.14622293698088337}
rhasspy_1 | 2019-09-15T06:30:06.059999644Z DEBUG:DialogueManager:recognizing -> handling
rhasspy_1 | 2019-09-15T06:30:06.060397812Z DEBUG:WebSocketObserver:{"text": "hoe laat is het", "intent": {"name": "GetTime", "confidence": 1.0}, "entities": [], "raw_text": "hoe laat is het", "tokens": ["hoe", "laat", "is", "het"], "raw_tokens": ["hoe", "laat", "is", "het"], "speech_confidence": 0.14622293698088337, "slots": {}}
rhasspy_1 | 2019-09-15T06:30:06.062806451Z DEBUG:DialogueManager:handling -> ready
rhasspy_1 | 2019-09-15T06:30:06.062869805Z INFO:DialogueManager:Automatically listening for wake word
rhasspy_1 | 2019-09-15T06:30:06.062885158Z DEBUG:DialogueManager:ready -> asleep
rhasspy_1 | 2019-09-15T06:30:06.062895802Z DEBUG:SnowboyWakeListener:loaded -> listening
rhasspy_1 | 2019-09-15T06:30:06.062906523Z DEBUG:ARecordAudioRecorder:started -> recording
rhasspy_1 | 2019-09-15T06:30:06.062917657Z DEBUG:ARecordAudioRecorder:['arecord', '-q', '-r', '16000', '-f', 'S16_LE', '-c', '1', '-t', 'raw', '-D', 'default:CARD=CameraB409241']
rhasspy_1 | 2019-09-15T06:30:06.062931434Z DEBUG:ARecordAudioRecorder:Recording from microphone (arecord)
rhasspy_1 | 2019-09-15T06:30:24.349774208Z DEBUG:EspeakSentenceSpeaker:['espeak', '-v', 'nl', '--stdout', 'Het is 6:30 AM']
rhasspy_1 | 2019-09-15T06:30:24.375711496Z DEBUG:EspeakSentenceSpeaker:ready -> speaking
rhasspy_1 | 2019-09-15T06:30:24.376597943Z DEBUG:APlayAudioPlayer:['aplay', '-q']
rhasspy_1 | 2019-09-15T06:30:26.105522676Z DEBUG:EspeakSentenceSpeaker:speaking -> ready
Here you can see that it wakes up, listens, finds the intent (at 06:30:06.060397812), and starts listening for another wakeword. Then, after 18 seconds (at 06:30:24.349774208), it receives the espeak command from Node-RED. The delay is NOT in Node-RED: I’ve monitored the Node-RED logging, and the websocket message is only received in Node-RED after those 18 seconds.
Can anybody shed some light on this issue?
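For reference, the exact gap between those two timestamps can be checked with a few lines of Python. This is only a small sketch: the helper names `parse_ns` and `delay_seconds` are my own, and the timestamps are parsed by hand because `datetime` only keeps microseconds while the Docker timestamps have nanoseconds.

```python
from datetime import datetime, timezone


def parse_ns(stamp: str) -> int:
    """Convert a Docker-style UTC timestamp to nanoseconds since the epoch."""
    base, frac = stamp.rstrip("Z").split(".")
    whole = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad or cut the fractional part to exactly nine digits (nanoseconds).
    return int(whole.timestamp()) * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def delay_seconds(start: str, end: str) -> float:
    """Seconds elapsed between two timestamps."""
    return (parse_ns(end) - parse_ns(start)) / 1e9


# Timestamps copied from the wakeword log above.
intent_sent = "2019-09-15T06:30:06.060397812Z"      # WebSocketObserver line
espeak_received = "2019-09-15T06:30:24.349774208Z"  # EspeakSentenceSpeaker line

print(f"{delay_seconds(intent_sent, espeak_received):.3f} s")  # prints "18.289 s"
```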
Thank you very much!
PS: I used multiple wakeword handlers and they all seem to work fine. If I switch from using a wakeword to not using one, without changing any other settings, the problem is gone.
Edit: Added logging when not using wakeword:
rhasspy_1 | 2019-09-15T06:49:53.622047396Z DEBUG:DialogueManager:asleep -> awake
rhasspy_1 | 2019-09-15T06:49:53.622891593Z DEBUG:WebrtcvadCommandListener:loaded -> listening
rhasspy_1 | 2019-09-15T06:49:53.623424212Z DEBUG:APlayAudioPlayer:['aplay', '-q', '/usr/share/rhasspy/etc/wav/beep_hi.wav']
rhasspy_1 | 2019-09-15T06:49:53.631411021Z DEBUG:ARecordAudioRecorder:started -> recording
rhasspy_1 | 2019-09-15T06:49:53.631725979Z DEBUG:ARecordAudioRecorder:['arecord', '-q', '-r', '16000', '-f', 'S16_LE', '-c', '1', '-t', 'raw', '-D', 'default:CARD=CameraB409241']
rhasspy_1 | 2019-09-15T06:49:53.643164143Z DEBUG:ARecordAudioRecorder:Recording from microphone (arecord)
rhasspy_1 | 2019-09-15T06:49:54.178782341Z DEBUG:WebrtcvadCommandListener:Voice command started
rhasspy_1 | 2019-09-15T06:49:56.287289036Z DEBUG:WebrtcvadCommandListener:Voice command finished
rhasspy_1 | 2019-09-15T06:49:56.287641431Z DEBUG:WebrtcvadCommandListener:listening -> loaded
rhasspy_1 | 2019-09-15T06:49:56.293841469Z DEBUG:DialogueManager:awake -> decoding
rhasspy_1 | 2019-09-15T06:49:56.293900786Z DEBUG:APlayAudioPlayer:['aplay', '-q', '/usr/share/rhasspy/etc/wav/beep_lo.wav']
rhasspy_1 | 2019-09-15T06:49:56.301341777Z DEBUG:PocketsphinxDecoder:rate=16000, width=2, channels=1.
rhasspy_1 | 2019-09-15T06:49:56.303845713Z DEBUG:ARecordAudioRecorder:recording -> started
rhasspy_1 | 2019-09-15T06:49:56.306549867Z DEBUG:ARecordAudioRecorder:Stopped recording from microphone (arecord)
rhasspy_1 | 2019-09-15T06:49:56.565217959Z DEBUG:PocketsphinxDecoder:Decoded WAV in 0.2633223533630371 second(s)
rhasspy_1 | 2019-09-15T06:49:56.566176554Z DEBUG:PocketsphinxDecoder:Transcription confidence: 0.5165017610925368
rhasspy_1 | 2019-09-15T06:49:56.570271469Z DEBUG:PocketsphinxDecoder:hoe laat is het
rhasspy_1 | 2019-09-15T06:49:56.570313421Z DEBUG:DialogueManager:hoe laat is het (confidence=0.5165017610925368)
rhasspy_1 | 2019-09-15T06:49:56.570328522Z DEBUG:DialogueManager:decoding -> recognizing
rhasspy_1 | 2019-09-15T06:49:56.570337977Z DEBUG:FsticuffsRecognizer:Got 1 intent(s)
rhasspy_1 | 2019-09-15T06:49:56.570345430Z DEBUG:FsticuffsRecognizer:[{'text': 'hoe laat is het', 'intent': {'name': 'GetTime', 'confidence': 1.0}, 'entities': [], 'raw_text': 'hoe laat is het', 'tokens': ['hoe', 'laat', 'is', 'het'], 'raw_tokens': ['hoe', 'laat', 'is', 'het']}]
rhasspy_1 | 2019-09-15T06:49:56.570353725Z DEBUG:DialogueManager:{'text': 'hoe laat is het', 'intent': {'name': 'GetTime', 'confidence': 1.0}, 'entities': [], 'raw_text': 'hoe laat is het', 'tokens': ['hoe', 'laat', 'is', 'het'], 'raw_tokens': ['hoe', 'laat', 'is', 'het'], 'speech_confidence': 0.5165017610925368}
rhasspy_1 | 2019-09-15T06:49:56.570363174Z DEBUG:DialogueManager:recognizing -> handling
rhasspy_1 | 2019-09-15T06:49:56.570374589Z DEBUG:DialogueManager:handling -> ready
rhasspy_1 | 2019-09-15T06:49:56.570385665Z INFO:DialogueManager:Automatically listening for wake word
rhasspy_1 | 2019-09-15T06:49:56.570394445Z DEBUG:DialogueManager:ready -> asleep
rhasspy_1 | 2019-09-15T06:49:56.571227043Z DEBUG:WebSocketObserver:{"text": "hoe laat is het", "intent": {"name": "GetTime", "confidence": 1.0}, "entities": [], "raw_text": "hoe laat is het", "tokens": ["hoe", "laat", "is", "het"], "raw_tokens": ["hoe", "laat", "is", "het"], "speech_confidence": 0.5165017610925368, "slots": {}}
rhasspy_1 | 2019-09-15T06:49:56.592745005Z DEBUG:EspeakSentenceSpeaker:['espeak', '-v', 'nl', '--stdout', 'Het is 6:49 AM']
rhasspy_1 | 2019-09-15T06:49:56.626688872Z DEBUG:EspeakSentenceSpeaker:ready -> speaking
rhasspy_1 | 2019-09-15T06:49:57.026023920Z DEBUG:APlayAudioPlayer:['aplay', '-q']
rhasspy_1 | 2019-09-15T06:49:59.206410336Z DEBUG:EspeakSentenceSpeaker:speaking -> ready
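To compare the two runs without reading timestamps by eye, the gap between the WebSocketObserver line and the next EspeakSentenceSpeaker line can be pulled straight out of the log text. The sketch below is only an illustration: `intent_to_speech_gap` and `to_ns` are hypothetical helper names, the regular expression assumes the `docker-compose logs -t` layout shown above, and the JSON payload in the sample line is shortened for readability.

```python
import re
from datetime import datetime, timezone

# Matches "<date>T<time>.<fraction>Z <LEVEL>:<Component>:" as seen in the logs above.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z\s+(\w+):(\w+):")


def to_ns(base: str, frac: str) -> int:
    """Nanoseconds since the epoch for a UTC timestamp split at the decimal point."""
    whole = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(whole.timestamp()) * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def intent_to_speech_gap(lines):
    """Seconds from the first WebSocketObserver line to the next EspeakSentenceSpeaker line."""
    sent = None
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue
        base, frac, _level, component = match.groups()
        if component == "WebSocketObserver" and sent is None:
            sent = to_ns(base, frac)
        elif component == "EspeakSentenceSpeaker" and sent is not None:
            return (to_ns(base, frac) - sent) / 1e9
    return None  # one of the two lines was not found


# Two lines from the run without a wakeword (JSON payload shortened).
no_wakeword = [
    'rhasspy_1 | 2019-09-15T06:49:56.571227043Z DEBUG:WebSocketObserver:{"text": "hoe laat is het"}',
    "rhasspy_1 | 2019-09-15T06:49:56.592745005Z DEBUG:EspeakSentenceSpeaker:['espeak', '-v', 'nl', '--stdout', 'Het is 6:49 AM']",
]

print(intent_to_speech_gap(no_wakeword))
```

On the run without a wakeword this gives roughly 0.02 seconds, against roughly 18 seconds for the wakeword run.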