Rhasspy offline voice assistant toolkit

Thanks, good catch! I’ll roll this into the next update.

Oddly, no. Even the arecord command on a PulseAudio system actually calls into PulseAudio instead of ALSA. But when ALSA is the only thing present (like inside the Docker container), it really is ALSA, which seems broken with respect to the microphone volume.

What’s absolutely infuriating about PulseAudio with Docker is that you have to play games with fake users inside the Docker container in order to give it access to PulseAudio. It works, but it’s much more complicated to get right than just granting access to /dev/snd.
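
For reference, the simpler ALSA route looks something like this (a sketch only; the image tag and port are assumptions, so adjust for your setup):

```shell
# Sketch: run Rhasspy with direct ALSA access instead of PulseAudio games.
# --device /dev/snd passes the host's sound devices into the container.
docker run -d \
    --device /dev/snd \
    -p 12101:12101 \
    synesthesiam/rhasspy-server:latest
```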

If it’s something to do with your profile, you could always make a copy and then delete the original. If you’re on Hass.io, it will be in /share/rhasspy/profiles.

I deleted the pt folder inside “/share/rhasspy/profiles” but the issue remains. Did I miss something? Also, there is a new version of Rhasspy (2.28) which I can’t get updated. Other add-ons are running.

Did you run out of space on your SD card?

It sounds like you are right. I use a 16 GB card in a Pi 3 with Hass.io. It is basically only running Rhasspy and some other add-ons, but it is not my main HA installation, which runs on another Pi 3.

home-assistant_v2.db is only 214 MB.
Since I don’t know how to recover some space, I am asking for assistance somewhere else, as this subject is off topic here.
I still believe that a 16 GB card is enough for this purpose.
Thanks for pointing it out.

Hi everyone,

New to this thread. I have a respeaker V2 with a snips-audio-server successfully streaming (all) audio to an MQTT process running on my server. I have Rhasspy successfully running on that server in a docker container. I can subscribe to the intents stream hermes/intent/# and see intents as expected.

If I click “Hold to record” in the web front end, everything works great. What I want (if possible) is to have the audio stream automatically segmented (my understanding is that webrtcvad can do this) and processed continuously and without a wake word. Is this possible? If it’s not possible, is there a recommendation for the easiest way to do it (presumably something running on the V2 that sends wakeup messages to Rhasspy)?

(It’s true that, at this level, my question has nothing to do with HA. But once I get this working, that will change :slight_smile:)

Hi James, welcome. Glad you were able to get Rhasspy up and running.

I think what you’re asking for might be possible with a slight modification to Rhasspy: basically keeping it awake all the time and disabling the timeout in the webrtcvad component. Do you have a really quiet environment? I’m worried webrtcvad won’t be up to the task.
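
To illustrate the kind of frame-by-frame decision webrtcvad makes, here is a toy segmenter in plain Python. This is not webrtcvad itself (which uses a trained model rather than raw energy); it is just a sketch of the segmentation logic, with a made-up energy threshold:

```python
import struct

FRAME_SAMPLES = 480  # 30 ms at 16 kHz, one of the frame sizes webrtcvad accepts

def frame_energy(frame: bytes) -> float:
    """Mean absolute amplitude of a 16-bit little-endian mono PCM frame."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return sum(abs(s) for s in samples) / len(samples)

def segment(frames, threshold=500.0):
    """Group consecutive 'speech' frames (energy above threshold) into
    segments, mimicking VAD-based utterance splitting."""
    segments, current = [], []
    for frame in frames:
        if frame_energy(frame) > threshold:
            current.append(frame)
        elif current:
            segments.append(b"".join(current))
            current = []
    if current:
        segments.append(b"".join(current))
    return segments

# Demo: two bursts of loud samples separated by silence.
silence = struct.pack("<%dh" % FRAME_SAMPLES, *([0] * FRAME_SAMPLES))
loud = struct.pack("<%dh" % FRAME_SAMPLES, *([8000] * FRAME_SAMPLES))
frames = [silence, loud, loud, silence, loud, silence]
print([len(s) // 2 // FRAME_SAMPLES for s in segment(frames)])  # → [2, 1]
```

In a noisy room the energy of "silence" frames creeps up toward the threshold, which is the same reason I worry about webrtcvad in anything but a quiet environment.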

Without modification, you might consider using an external tool like adintool and then POST-ing the segmented WAV file to the /api/speech-to-intent HTTP endpoint.
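
To show the shape of that HTTP call, here is a self-contained sketch. A stub server stands in for Rhasspy (the real endpoint returns intent JSON from the recognizer); the host, port, and stub response are all assumptions for the demo:

```python
import io
import json
import threading
import urllib.request
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubRhasspy(BaseHTTPRequestHandler):
    """Stand-in for Rhasspy's HTTP API; echoes a canned intent."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        reply = json.dumps({"path": self.path, "bytes": len(body),
                            "intent": {"name": "StubIntent"}}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)
    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StubRhasspy)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Build one second of silent 16 kHz mono WAV in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

# POST the WAV to the speech-to-intent endpoint.
url = "http://127.0.0.1:%d/api/speech-to-intent" % server.server_port
req = urllib.request.Request(url, data=buf.getvalue(),
                             headers={"Content-Type": "audio/wav"})
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result["path"], result["intent"]["name"])

server.shutdown()
```

Against a real Rhasspy instance you would point the URL at its web port and read the recognized intent from the JSON response instead of the canned one.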

Quick question / poll…

I am looking at the different options for microphone/speaker to connect to Rhasspy.
What are the different setups people are having success with?

I was looking at a couple of options, and I’m not sure if they all work. What I want is an always-listening system for wake word detection.

One would be a USB mic/speaker combo. So far the main ones I see are Jabra 410-style devices… is there something simpler and cheaper than this? I think the Jabra would listen 100% of the time, but I’m not sure, as I have never had one. Again, is this overkill, and is there something simpler?

A second thought would be using a wireless speaker with a mic, like a Sonos One (or again, cheaper alternatives). Would you be able to set Rhasspy up with a device like this for wake word listening 100% of the time?

Thanks all!
DeadEnd

Hi @DeadEnd you can find some options here:

https://rhasspy.readthedocs.io/en/latest/hardware/#microphone

Thanks, I did see that, but was wondering what others have done.
I am on the fence between two options

  1. Running a USB cable through the wall (the server is on the other side of the living room wall) and using local audio, in which case I am wondering what speaker/mic combos would work

  2. Remote audio, but the docs’ only example seems to be MQTT with a ReSpeaker. I don’t know much about all this, so I wasn’t sure if any of the other options would be able to do constant listening, for example using the HTTP stream, or whether PyAudio and ALSA can use non-local devices. Again, I am wondering if something like a Sonos or similar product can be used, and if so, how.

It could be that no one has tried yet… but I have very limited knowledge and wanted to see if anyone could enlighten me on the possibilities. That way I can be informed before making a decision on what hardware to acquire.

It might be best for me to just use a local microphone and speaker on my server until I have everything figured out, and then try using remote audio… since I don’t really know what I’m doing yet :slight_smile:.

Thanks!
DeadEnd

If you’re asking about experience with specific setups: I’m the author of hermes-audio-server and I’ve successfully run this on a Raspberry Pi 2 with a ReSpeaker 2-Mics Pi HAT and a basic 8 Ohm 1 W 3" speaker on the JST connection for remote audio using MQTT.

I use a Matrix Voice ESP32 version, running my own MQTT audio streamer.
Almost done with resampling incoming audio to be played over the output jack or speaker connectors.

A usb through the wall would work fine as well :slight_smile:

Hi @DeadEnd, welcome! Either @koan’s or @Romkabouter’s suggestions would work great. I’m also putting the finishing touches on a Rhasspy “microphone” that records from a GStreamer pipeline. The default is to use UDP, so you can use the udpsink plugin to stream raw audio data across a network, something like:

gst-launch-1.0 \
    autoaudiosrc ! \
    audioconvert ! \
    audioresample ! \
    audio/x-raw, rate=16000, channels=1, format=S16LE ! \
    udpsink host="RHASSPY_HOST" port="RHASSPY_UDP_PORT"

This can support multicast too, if you know how to use that (I don’t :slight_smile:). The code for this is in master, but I haven’t added it to the Settings page or documentation yet. I’ll post here again once I have.
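
On the receiving end, the raw S16LE frames from udpsink can be picked up with a plain UDP socket. Here is a sketch that wraps received samples in a WAV file; for the demo it sends itself one packet of silence instead of a real GStreamer stream, and the packet size and filename are arbitrary:

```python
import socket
import wave

RATE, CHANNELS, WIDTH = 16000, 1, 2  # matches the gst-launch caps above

# Bind a UDP socket where udpsink would deliver the raw audio.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))
port = sock.getsockname()[1]

# Demo stand-in for the GStreamer sender: one packet of 16-bit silence.
sock.sendto(b"\x00\x00" * 1600, ("127.0.0.1", port))

# Receive raw S16LE frames and write them out as a WAV file.
pcm, _ = sock.recvfrom(65536)
with wave.open("received.wav", "wb") as w:
    w.setnchannels(CHANNELS)
    w.setsampwidth(WIDTH)
    w.setframerate(RATE)
    w.writeframes(pcm)
print("wrote", len(pcm) // WIDTH, "samples")
sock.close()
```

A real receiver would loop on recvfrom and hand each chunk to the wake word or VAD component instead of writing a file.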

Just a small unrelated update: I’m making some progress on the Dutch MaryTTS voice (no, I didn’t forget!). I was inspired by this blog post to make use of public domain audio books to train the voice. Unfortunately, not knowing Dutch makes it extra hard to ensure the audio and text are properly aligned. So, I found this pre-aligned Dutch reading of William Tell. I don’t know if this guy’s voice is any good, but I at least have training material!

I’m now trying to get this guy’s MaryTTS Dutch plugin to work with the voice training plugins. If it all comes together, I can create a Docker image with the voice for the Dutch speakers to try out. I tried training an English voice in the same way, and it turned out OK except for some missing sounds that I guess were never in the book. This is why professionally trained voices are done from a “phonetically balanced” text. Anyone here speak Dutch and want to read several hours worth of text?

Thanks for the reply!
I have attached a cheap microphone to start testing.
So far it appears to be working: I have added a websocket listener to Node-RED and, using a debug node, am seeing the intents coming through.

The issue I have right now is that they are often delayed, up to 1 minute. When I see this and send multiple commands, they all come through at once (in Node-RED). I checked the log in Portainer (I have Rhasspy in a Docker container) and I see that the intents appear to be arriving quickly, so the issue is with the websocket. I don’t know if this is on the Rhasspy side or the Node-RED side.

I’m not experienced with websockets at all, so still trying to understand more so I can figure out where the delay is coming from.

Thanks!
DeadEnd

+1 for that :smiley:

That’s awesome, thank you very much!

Okay, if anyone has experience using the websocket listener in Node-RED, I’d appreciate some troubleshooting help.

I haven’t been able to identify anything in the logs yet (I probably need to increase the log level), but in my limited testing so far: after a restart of the Rhasspy container, Node-RED connects with no problem and stays connected (at least for as long as I’ve watched). But every time I redeploy flows in Node-RED, the websocket only connects momentarily and then disconnects.

Does anyone have experience with this?
I’ll keep working on it, but if someone has experience with it already that would be helpful.

Thanks!
DeadEnd

I’m working right now on websocket issues. I’ve got a fix for the intent delay, but I can’t seem to stop the websockets from disconnecting randomly. I’ll probably have to just upgrade my whole web server infrastructure here soon.

In the meantime, I’d recommend just using MQTT. You can install mosquitto or run a broker through Home Assistant, and it works so much more reliably than websockets!
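
For example, once a broker is up, watching the intents is a one-liner (assuming mosquitto’s client tools and a broker on localhost with the default port):

```shell
# Subscribe to all Rhasspy/Hermes intent messages; -v prints topic + payload.
mosquitto_sub -h localhost -t 'hermes/intent/#' -v
```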