Thanks for the tips and links! I’ve made a first pass at implementing the Hermes protocol in Rhasspy. Here’s how to use it:
1. Enable and configure MQTT on the Settings tab (or in your default profile)
2. Set the wake system to remote MQTT
If all goes well, this should stream the audio data out via MQTT in the format you described above. When the wake word is detected and a message comes back on hermes/hotword/&lt;WAKEWORD_ID&gt;/detected, it triggers Rhasspy to listen for a command.
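To make the topic layout concrete, here's a small Python sketch of the two topics involved (the "default" IDs are only illustrations; substitute your own site ID and wake word ID):

```python
# Sketch of the Hermes MQTT topic layout described above.
# "default" is a placeholder ID, not a value Rhasspy requires.

def audio_frame_topic(site_id: str) -> str:
    """Topic Rhasspy streams recorded WAV chunks out on."""
    return f"hermes/audioServer/{site_id}/audioFrame"

def detected_topic(wakeword_id: str) -> str:
    """Topic an external detector publishes to when it hears the wake word."""
    return f"hermes/hotword/{wakeword_id}/detected"

print(audio_frame_topic("default"))  # hermes/audioServer/default/audioFrame
print(detected_topic("default"))     # hermes/hotword/default/detected
```

Any MQTT client (mosquitto_pub, paho-mqtt, etc.) publishing to the detected topic should trigger the switch to command listening.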
I haven’t added it to the Settings page yet, but you can set sounds.system to hermes in your profile to have it send out a playBytes message as well, instead of using aplay.
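In your profile JSON, that looks roughly like this (a minimal sketch with the rest of the profile omitted; the exact layout may differ between versions):

```json
{
  "sounds": {
    "system": "hermes"
  }
}
```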
Everyone, I’ve updated Rhasspy on Docker and my Hass.IO add-on repository. I did a major refactoring of the code base, so I hope I fixed more bugs than I introduced…
Anyway, here are some of the new features:
Support for external wake word detection systems. Based on feedback from @Romkabouter, I’ve based this on the Snips.AI Hermes protocol. I’m working on two Hass.IO add-ons for wake word detection, one based on snowboy, the other on Mycroft Precise.
A command-line interface for building voice-enabled Linux applications on top of Rhasspy (see the rhasspy shell script).
Support for tuning your acoustic model with voice samples (tutorial coming soon)
For most people, I assume the external wake word support is the most interesting. It’s more complicated to use than I’d like, but I’m doing my best to make Rhasspy interact with other systems. To get it working, you need:
- Rhasspy installed somehow
- An MQTT server (there’s an official Hass.io add-on)
I plan on making another video to demonstrate this soon. At the very least, I think a docker-compose example could help. It shouldn’t be too hard to put Rhasspy, an MQTT server, and one of the wake word systems all together…
If you could just say which docker image to use for the wake word section, I can add it to my existing docker-compose file which already has everything else necessary.
Do you have any plans for, or is there currently a way to, use network streams as the audio device? Say, a bunch of ESP32s streaming their mics out as a constant audio stream.
This should work now, actually. There’s a new MQTT option under the microphone settings, which will listen for WAV data inside an MQTT payload as @Romkabouter described above.
Make sure not to also set the wake word system to MQTT, though, as this will try to send/receive network audio at the same time.
From the error, it looks like your Home Assistant URL got erased somehow in the settings. The default value is http://hassio/homeassistant/ if you’re inside Hass.io.
I’ve got a start on a docker compose example with Rhasspy, snowboy, and an MQTT server. To make it work, you need to go into the Rhasspy settings (once it’s all up) and:
1. Set the wake system to remote MQTT
2. Enable MQTT and set the host to mosquitto (the Docker hostname of the MQTT server)
I have it set to use my personal snowboy model with “okay rhasspy” as the wake word. I’d suggest creating your own personal model and giving it a try!
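The layout is roughly like this (a sketch only, not the finished example: the snowboy image name is a placeholder, and you may need to adjust the Rhasspy image, ports, and volumes for your setup):

```yaml
version: "3"
services:
  mosquitto:
    image: eclipse-mosquitto      # MQTT broker; hostname matches the setting above
    ports:
      - "1883:1883"
  rhasspy:
    image: synesthesiam/rhasspy-server   # adjust to the Rhasspy image/tag you use
    ports:
      - "12101:12101"                    # web interface
    depends_on:
      - mosquitto
  snowboy:
    image: your/snowboy-hermes-image     # placeholder: external wake word service
    depends_on:
      - mosquitto
```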
Hey, awesome, thanks for the response. I haven’t had a chance to play around yet, so sorry if it’s a dumb question, but when you say not to set the wake word system to MQTT, do you mean I need to move the wake word portion to the audio device and only send the relevant voice commands to Rhasspy via MQTT to be interpreted?
I was hoping essentially to have a bunch of hot mics streaming to rhasspy and have it take care of everything.
I may need to think about this more, but here are my initial thoughts.
By default, Rhasspy records from a microphone and, when you set the wake word system to MQTT, it streams this recorded audio data out on the hermes/audioServer/<SITE_ID>/audioFrame topic. When some external system detects the wake word (snowboy, etc.), it publishes to hermes/hotword/<WAKEWORD_ID>/detected. Rhasspy picks that up and switches to listening for a command (from the microphone).
In your case, though, you just want to send audio data to Rhasspy. By setting the audio recording system to MQTT, Rhasspy will not record from a microphone, and will instead take in your audio data (on hermes/audioServer/<SITE_ID>/audioFrame). If you have snowboy listening on the same MQTT server, then it should notify Rhasspy when the wake word is detected on hermes/hotword/<WAKEWORD_ID>/detected. Then, Rhasspy will switch to listening for a command from the incoming MQTT audio stream.
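A minimal sketch of what an audioFrame payload might look like, assuming 16 kHz, 16-bit mono audio and Python's standard-library wave module; the "default" site ID and the final publish call are illustrations only:

```python
import io
import wave

def wav_frame(pcm_chunk: bytes, rate: int = 16000) -> bytes:
    """Wrap a chunk of raw 16-bit mono PCM samples in a WAV container,
    the kind of payload sent on hermes/audioServer/<SITE_ID>/audioFrame."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)   # 16 kHz
        wav.writeframes(pcm_chunk)
    return buf.getvalue()

payload = wav_frame(b"\x00\x00" * 256)  # 256 silent samples
print(payload[:4])  # b'RIFF' -- WAV files start with the RIFF magic

# Hypothetical publish with any MQTT client, e.g. paho-mqtt:
# client.publish("hermes/audioServer/default/audioFrame", payload)
```

Each device (ESP32 or otherwise) would keep sending frames like this on its own site ID's topic, and the wake word detector and Rhasspy both listen on the same broker.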
So really, the point is to avoid having Rhasspy both publish out audio data and listen for it at the same time. I may need to make this clearer in the settings.
Hmm, I got an error: `OSError: [Errno -9996] Invalid input device (no default output device)`
Just a Pi 3 and Hass.io, but with a USB microphone; I uninstalled and installed again.
Same issue, anyone else?
OK, I am learning fast; simple automations are working, but I am fighting with wake words.
I checked “Listen for wake word on start-up” and “Use pocketsphinx locally”, set a wake word such as “okay rhasspy” and other simple words, but it seems I cannot make it work.
Should I hear a notification sound that Rhasspy is listening now, like in Snips?
After many tries I read about other add-ons to activate the MQTT wake word system, but I cannot install them as add-ons in Hass.io. I just hit install, the progress circle keeps spinning, and nothing happens. It doesn’t matter whether it’s Mycroft or Snowboy.
After a restart I don’t see any profiles in the right corner (fr, en, etc.). To make it work I do a few restarts and then it shows up, but should it be like this?
@Romkabouter It happened to me too; I just physically reconnect the devices, restart the Arduino, or wait longer after boot.
I’ve seen this happen in Hass.IO when it didn’t select an input audio device for me. I had to stop the add-on, select the microphone, save, then start it back up. Kind of irritating.
The Mycroft and Snowboy add-ons can take quite a while to build. Try the Snowboy add-on first, since it’s the one I’ve tested the most. I usually go to the Hass.io System tab, refresh the log, and wait for it to report "Build complete for " for the add-on (sometimes 10-15 minutes on a Pi).
Pocketsphinx should work, at least sometimes, with an appropriate microphone and wake word (they recommend 3-4 syllables). I have almost no success with my laptop’s built-in microphone, but OK performance with the PS3 Eye. Make sure you have the proper microphone selected in the Hass.io add-on tab for audio input too; Rhasspy will pick the default audio device unless told otherwise.
Also try switching from PyAudio to arecord for audio recording in the settings. Some folks here have had better luck with that for some reason.