I wonder if they’re generating code at runtime and need to mark it as executable?
Sure, no problem
Both of these features should be in the latest version now. The wake-up sound and the beep when a voice command has finished recording are now configurable through the web interface.
Also, POST-ing to /api/text-to-speech?repeat=true will now repeat the last spoken sentence.
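For instance, a minimal call (just a sketch; this assumes Rhasspy's default HTTP port 12101 and that no request body is needed for a repeat):

import requests

# Repeat the last sentence Rhasspy spoke; no text body is required.
requests.post("http://localhost:12101/api/text-to-speech",
              params={"repeat": "true"})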
I ended up developing the slots lists feature for this purpose instead. This just allows you to reference an external text file from your grammar, but it could be populated from anywhere. The idea was to then write a small service to connect to Home Assistant’s /api/config endpoint and extract the appropriate entity names (maybe in a cron job).
I personally didn’t end up needing to write that service because I only have a few entities I care to talk about, but I did create one for pulling movie and TV show names out of Kodi. They just go into a movies file and a shows file in my profile’s slots directory, respectively. Then, I can just have something like this in my sentences.ini file:
[PlayMovie]
play ($movies){movie_name}
[PlayTVShow]
play ($shows){show_name}
There is some post-processing I have to do on the names, of course, like changing “2” to “two” and getting rid of apostrophes, etc.
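A minimal sketch of what such an entity-pulling service could look like (the Home Assistant URL, token, and slots path are assumptions; /api/states is the Home Assistant REST endpoint that returns entity data, including friendly names):

import re
import requests

HASS_URL = "http://localhost:8123"               # assumption: local Home Assistant
TOKEN = "YOUR_LONG_LIVED_TOKEN"                  # assumption: long-lived access token
SLOTS_FILE = "/profiles/en/slots/hass_entities"  # assumption: profile slots path

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def clean(name):
    # The post-processing described above: spell out digits, drop apostrophes.
    name = re.sub(r"\d", lambda m: " " + DIGITS[m.group()] + " ", name)
    return " ".join(name.replace("'", "").lower().split())

resp = requests.get(HASS_URL + "/api/states",
                    headers={"Authorization": "Bearer " + TOKEN})
resp.raise_for_status()

names = sorted({clean(s["attributes"]["friendly_name"])
                for s in resp.json()
                if "friendly_name" in s.get("attributes", {})})

# One entity name per line, ready to reference as ($hass_entities) in sentences.ini.
with open(SLOTS_FILE, "w") as f:
    f.write("\n".join(names))

Run it from cron at whatever interval you like, then re-train Rhasspy so the new slot values are picked up.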
hmm maybe.
I have to admit that this is the only container that I have (I think) that executes code from a mapped folder (besides Python code).
Thanks for adding the requested features! Can’t wait to play around with this!
Btw, is there any way to support your work with a little contribution?
UPDATE:
Just updated to the latest version, and I have a problem with my custom wake sound: the audio plays fine, but intent recognition isn’t working.
I suspect this could be because my wake sound is speech as well, and that it interferes with the intent recognition. Could this be the problem?
If so, would it be possible to delay voice recognition for the duration of the wake sound?
Ahh, OK. I’ll try setting up some manual commands & slots lists.
- Can synonyms be used in slots lists?
- What’s the recommended way to send HASS replies to commands (like “the garage door is open”) to the Rhasspy host? I see Rhasspy has an API for posting WAVs to play, or text for TTS. Should I be making a HASS rest_command service to post text replies to Rhasspy’s API? Does Rhasspy have some sort of i-am-a-media-player integration with HASS that I’ve missed?
- Porcupine seems to do a good job of waking in my short tests so far, but the voice recognition frequently misses the first word of commands, such as make/set/turn/what/tell/how. Is this just my mic being of insufficient quality (testing with a Microsoft HD3000 webcam USB mic), or might it be failing to capture the beginning of the audio? I am waiting until after the wake-word beep.
Thanks for your help!
You can do two things for text-to-speech replies:
- let your Hass.io instance speak
- send a message to the Rhasspy API with a text payload
- or both
I have an automation responding to a Rhasspy event. It sends a payload via a rest command and calls the TTS service as well.
The payload via rest_command is used because that Rhasspy instance is a satellite and I want to play the audio on that satellite.
The TTS service plays the audio on the machine where HA is installed.
My conf.yaml (I am Dutch; the payload below translates to “That’s fine, I <action> the <which cover> roller shutter”):
rest_command:
  rhasspy_speak:
    url: 'http://192.168.43.169:12101/api/text-to-speech'
    method: 'POST'
    payload: '{{ payload }}'
    content_type: text/plain
My automation:
- id: '1556640424575'
  alias: Rolluiken
  trigger:
  - event_data: {}
    event_type: rhasspy_Covers
    platform: event
  condition: []
  action:
  - data_template:
      payload: Dat is goed, ik {{ trigger.event.data.actiontype }} het {{ trigger.event.data.whichcover }} rolluik
    service: rest_command.rhasspy_speak
  - data_template:
      message: Dat is goed, ik {{ trigger.event.data.actiontype }} het {{ trigger.event.data.whichcover }} rolluik
    service: tts.google_cloud_say
    entity_id: media_player.test
My media player is actually an mpd instance, sending audio to snapcast (the snapcast server runs as an addon):
media_player:
  - platform: snapcast
    host: 192.168.43.169
  - platform: mpd
    host: 192.168.43.169
    port: 6601
    name: Test
This way, you can create multiroom TTS messages.
I appreciate the thought very much but people volunteering their time is a great contribution already!
Do you see anything in the logs? I’ve never tried anything but beeps for the wake sound, so it is possible that it could interfere with the speech recognition process. I’ll work on either adding an optional delay to the speech recognition, or trying to ensure it doesn’t start until after the wake WAV has finished playing.
Yes! I just tried it and it seems to work. I have something like this in slots/movies:
toy story three:3
and this in sentences.ini:
[PlayMovie]
play ($movies){movie_name}
When I say “play toy story three”, I get back “toy story 3” in the movie_name slot.
I actually have some code that accepts MPD play requests from Home Assistant, and could play them through Rhasspy’s audio player. Would anyone be interested in that feature?
It might be the audio system you’re using to record (PyAudio vs. ALSA directly) as well as the microphone. I re-create the microphone audio stream in between turning off the wake system and turning on the voice command recognition system. Maybe re-opening the device is taking too long? I may be able to solve both this and @ntuseracc’s problem by opening the device while the wake WAV is playing but making sure the audio is ignored until after the WAV has finished…
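Something like this, as a rough sketch (not Rhasspy’s actual code; the wake-WAV duration constant is hypothetical):

import time
import pyaudio

WAKE_WAV_SECONDS = 0.5  # hypothetical: length of the wake acknowledgement WAV

pa = pyaudio.PyAudio()

# Open the microphone while the wake WAV is still playing...
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=480)

# ...but discard frames until playback should have finished.
deadline = time.time() + WAKE_WAV_SECONDS
while time.time() < deadline:
    stream.read(480, exception_on_overflow=False)

# From here on, the frames read are treated as the actual voice command.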
That sounds useful to me, though not urgent. I can go ahead and try the recommendations @Romkabouter suggested.
I’ve tried with both PyAudio and ALSA directly with the same results. I’ve also tried with both the Microsoft HD3000 and the PS3 Eye, same issue.
FWIW, PyAudio sometimes segfaults (maybe 1 in 10-20 times), but arecord directly seems functional with the caveat of usually missing the first word. In both cases, I’m testing on an RPi 3B+.
Thanks for the examples! I’ll give them a try.
I suspect this could be because my wake sound is speech as well, and that it interferes with the intent recognition. Could this be the problem?
That is why Snips does not listen until a playFinished message is published:
- the audio is played; when done, a message with the topic ID is published on the playFinished topic
- when wakeword feedback is enabled, Snips starts listening for voice commands only after that finished message has been published (a rough sketch of this gating follows)
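Here is how a client could gate listening on that message (topic names follow the hermes/audioServer/<siteId> layout visible in the logs below; the JSON payload fields are an assumption):

import json
import paho.mqtt.client as mqtt

SITE_ID = "default"  # assumption: the default site

def on_connect(client, userdata, flags, rc):
    client.subscribe("hermes/audioServer/{}/playFinished".format(SITE_ID))

def on_message(client, userdata, msg):
    # Only start recording the voice command once the wake sound has
    # actually finished playing, mirroring the Snips behaviour above.
    payload = json.loads(msg.payload)
    print("Playback {} finished; safe to start listening".format(payload.get("id")))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)  # assumption: local MQTT broker
client.loop_forever()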
I’ve moved to trying to run rhasspy on my faster Intel server with the RPi 3b+ as a voice satellite via mqtt with @koan’s hermes-audio-server. The response time to commands (speech-to-text) is certainly faster!
I did get hermes-audio-recorder working fine in a Docker container. I noticed that the initial word was rarely, if ever, missed, perhaps because hermes keeps the mic recording the entire time?
On the other hand, I was unable to get hermes-audio-player working in a Docker container. It would play the wakeword acknowledgement sounds sent over MQTT just fine, but as soon as some TTS audio came in, it would immediately die with the errors below. No such behavior when run under a venv. It would be nice to have it running under Docker as well, but I’m not sure where to dig on that.
Can hermes-audio-recorder or any of the other commonly used satellite input/output setups (Matrix Voice MQTT Audio Streamer, etc.) be used with local wakeword detection, so audio is only sent when the wakeword is detected?
Some other observations:
- Sometimes when I say the wakeword, Rhasspy appears to detect it but never plays the prompt WAV nor listens to my further input, despite
DEBUG:PorcupineWakeListener:Hotword detected (True)
appearing in the log (twice).
- hermes-audio-server’s VAD appears to send audio pretty much anytime there’s sound, from typing to music to my mouse sliding on the desk. In mode 3 it sometimes doesn’t trigger for some music, but does for typing. At least for me, every mode amounts to a don’t-send-silence filter only.
- Getting default audio devices to point at the right hardware can be a real beast sometimes, especially when a non-default device can’t be selected (hermes). When I finally got it working for hermes-audio-recorder and -player in Docker (the latter of which claims on startup to be using my HDMI output but actually plays through the 3.5 mm jack anyway), it still didn’t work for hermes-audio-recorder in a venv.
2019-06-30 12:25:01,429 INFO Received an audio message of length 88 bytes with request id 7883c146-778d-43db-9e10-6e9c802818e4 on site default.
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->playback, outParams, self->primeBuffers, hwParamsPlayback, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2722
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2843
Traceback (most recent call last):
File "/usr/local/bin/hermes-audio-player", line 20, in <module>
plac.call(main)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 330, in call
cmd, result = parser.consume(arglist)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/usr/local/bin/hermes-audio-player", line 16, in main
cli.main(PLAYER, verbose, version, config, daemon)
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/cli.py", line 68, in main
server.start()
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/mqtt.py", line 68, in start
self.mqtt.loop_forever()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1578, in loop_forever
rc = self.loop(timeout, max_packets)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1072, in loop
rc = self.loop_read(max_packets)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1374, in loop_read
rc = self._packet_read()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2071, in _packet_read
rc = self._packet_handle()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2560, in _packet_handle
return self._handle_publish()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2759, in _handle_publish
self._handle_on_message(message)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2902, in _handle_on_message
callback(self, self._userdata, message)
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/player.py", line 76, in on_play_bytes
output=True)
File "/usr/local/lib/python3.7/site-packages/pyaudio.py", line 750, in open
stream = Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pyaudio.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9997] Invalid sample rate
That’s indeed the desirable scenario, but currently Hermes Audio Server is just a ‘dumb’ audio conduit because I’m not comfortable enough with integrating wake word detection. That may change, but don’t count on it in the immediate future.
An audio message of 88 bytes doesn’t sound right. Which sound was this, and what was its sample width?
Yes, configurable audio devices are on my TODO list. It will probably be the next feature I’ll implement and it will definitely make it easier to use Hermes Audio Server in more diverse scenarios.
Yes, your observation is correct. I’m just using WebRTCVAD’s modes for the VAD feature and I’m not impressed by its results, but it’s the best VAD system I found and the same one that Rhasspy uses, so I figured it was a safe bet to avoid downstream problems when I’m sending audio to Rhasspy. I added the VAD as an experimental feature mainly to save bandwidth. For that purpose it does the job.
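For reference, WebRTCVAD only classifies fixed-size frames with an aggressiveness mode, so there isn’t much to tune. A minimal sketch (the frame length and sample rate are the usual values it accepts, not necessarily Hermes Audio Server’s actual settings):

import webrtcvad

vad = webrtcvad.Vad(3)  # modes 0-3; 3 filters non-speech most aggressively

SAMPLE_RATE = 16000  # webrtcvad accepts 8, 16, 32, or 48 kHz
FRAME_MS = 30        # frames must be 10, 20, or 30 ms long
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono samples

def speech_frames(pcm):
    # Yields one True/False speech decision per frame of raw 16-bit PCM.
    for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield vad.is_speech(pcm[i:i + FRAME_BYTES], SAMPLE_RATE)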
Rhasspy’s docs say it always tries to return 16 kHz, 16-bit mono audio, so I’d guess that’s what it generates, though I can see the included wakeword WAVs are 44.1 kHz.
Somewhat interestingly, with:
pcm.speaker {
    type plug
    slave {
        pcm "hw:0,0"
    }
}
We have this crash result (two successful wakeword sounds, followed by a crash on a TTS being sent):
2019-07-01 07:05:30,203 INFO Connected to audio output bcm2835 ALSA: IEC958/HDMI (hw:0,1).
2019-07-01 07:05:30,221 INFO Connected to MQTT broker mqtt:1883 with result code 0.
2019-07-01 07:05:30,222 INFO Subscribed to hermes/audioServer/default/playBytes/+ topic.
2019-07-01 07:05:35,226 INFO Received an audio message of length 81.98 KiB with request id b0e69643-ddfa-4843-ad62-a3cf37ae57de on site default.
2019-07-01 07:05:35,689 INFO Finished playing audio message with id b0e69643-ddfa-4843-ad62-a3cf37ae57de on device bcm2835 ALSA: IEC958/HDMI (hw:0,1) on site default.
2019-07-01 07:05:37,082 INFO Received an audio message of length 117.1 KiB with request id 78bfd886-4e24-454c-ad0c-c62790e2e50b on site default.
2019-07-01 07:05:37,759 INFO Finished playing audio message with id 78bfd886-4e24-454c-ad0c-c62790e2e50b on device bcm2835 ALSA: IEC958/HDMI (hw:0,1) on site default.
2019-07-01 07:05:47,322 INFO Received an audio message of length 35.09 KiB with request id bd7053c0-df4f-4f49-bd84-ac1718a3bad1 on site default.
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->playback, outParams, self->primeBuffers, hwParamsPlayback, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2722
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2843
Traceback (most recent call last):
File "/usr/local/bin/hermes-audio-player", line 20, in <module>
plac.call(main)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 330, in call
cmd, result = parser.consume(arglist)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/usr/local/bin/hermes-audio-player", line 16, in main
cli.main(PLAYER, verbose, version, config, daemon)
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/cli.py", line 68, in main
server.start()
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/mqtt.py", line 68, in start
self.mqtt.loop_forever()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1578, in loop_forever
rc = self.loop(timeout, max_packets)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1072, in loop
rc = self.loop_read(max_packets)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1374, in loop_read
rc = self._packet_read()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2071, in _packet_read
rc = self._packet_handle()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2560, in _packet_handle
return self._handle_publish()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2759, in _handle_publish
self._handle_on_message(message)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2902, in _handle_on_message
callback(self, self._userdata, message)
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/player.py", line 76, in on_play_bytes
output=True)
File "/usr/local/lib/python3.7/site-packages/pyaudio.py", line 750, in open
stream = Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pyaudio.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9997] Invalid sample rate
However, after forcing a 48000 Hz output rate via asound.conf in the Docker container (44100 still crashed) with this asound entry:
pcm.speaker {
    type plug
    slave {
        pcm "hw:0,0"
        rate 48000
    }
}
… it plays successfully! Presumably the plug plugin is now resampling whatever rate PyAudio requests to the fixed 48000 Hz slave rate.
Curiously, though, the first log line changes from “Connected to audio output bcm2835 ALSA: IEC958/HDMI (hw:0,1).” to “Connected to audio output default.” The only sound configuration that changed is the addition of rate 48000. I don’t know if that’s just a display bug or related to the problem. The Pi is headless, so nothing is connected via HDMI, and I do hear the wakeword sounds over the 3.5 mm audio (hw:0,0).
2019-07-01 07:11:03,481 INFO Connected to audio output default.
2019-07-01 07:11:03,497 INFO Connected to MQTT broker mqtt:1883 with result code 0.
2019-07-01 07:11:03,499 INFO Subscribed to hermes/audioServer/default/playBytes/+ topic.
2019-07-01 07:11:13,176 INFO Received an audio message of length 81.98 KiB with request id 75a604f3-2992-4007-9f0d-88235c7f9f59 on site default.
2019-07-01 07:11:13,628 INFO Finished playing audio message with id 75a604f3-2992-4007-9f0d-88235c7f9f59 on device default on site default.
2019-07-01 07:11:15,020 INFO Received an audio message of length 117.1 KiB with request id 770f7bb9-acab-4cfa-b224-b1bb54d403bc on site default.
2019-07-01 07:11:15,668 INFO Finished playing audio message with id 770f7bb9-acab-4cfa-b224-b1bb54d403bc on device default on site default.
2019-07-01 07:11:19,212 INFO Received an audio message of length 35.09 KiB with request id 8ea3ed0b-aef3-4236-b0ef-ee4a75ad5813 on site default.
2019-07-01 07:11:20,248 INFO Finished playing audio message with id 8ea3ed0b-aef3-4236-b0ef-ee4a75ad5813 on device default on site default.
Yes, audio playback in Linux is a bit of dark magic, but I’m glad it works now. I hadn’t tested Hermes Audio Server in Docker yet.
You can use the Node-RED addon for this; it has a “get entities” node, and you can write the results to a file with the file node.
So, while I have not tried this, you could fetch all the entities at a set interval and create slots from them.
A little demo again, voice controlling the brightness of the Matrix Voice with Rhasspy
This one is a bit better; it features my Hass.io screen with the Matrix settings and a Pi Zero as a snapcast client.
Is this just for node-red-contrib-home-assistant-websocket, or is there something built in?
It is node-red-contrib-home-assistant-websocket, which comes preinstalled in the Hass.io Node-RED addon.