Rhasspy offline voice assistant toolkit

Hi @synesthesiam, you’re doing an awesome job with Rhasspy, but I think you could use some help, as more and more users are coming in, especially from Snips. I guess there are a lot of people, like me, who want to contribute more but don’t know exactly which tasks they could pick up.

I think it’s a good idea to put some issues on GitHub with a description of some basic tasks that should be implemented, enhanced, fixed, tested or documented (all of these aspects are important in my opinion) in Rhasspy.

I especially like the approach of LibreOffice’s Easy Hacks, which has lists of “easy hacks” that newcomers can implement, ordered by skill, difficulty level, or topic. It’s a great way to familiarize yourself with the codebase, and it significantly lowers the barrier for new contributors.

So if you would put some issues on GitHub with a label like easyhack, people would see that they should be able to tackle the issue even if they aren’t very experienced with the Rhasspy codebase. It could pull in some new contributors, or encourage occasional contributors like me to dig in, because much of Rhasspy’s code (especially the actor model you’re using) is way over my head and I’m still a bit lost :slight_smile:

4 Likes

Well… I tried to reproduce what happened, and it didn’t…
So, I must have done something weird to cause it, but I stopped the container, deleted the config directory, and started the container again, and as expected everything was default.

I really don’t know how I did it… but unless I can reproduce it consider it user error. :man_shrugging:

Thanks!
DeadEnd

1 Like

Thanks, this is something I’d like to address as soon as possible. One of my first steps is splitting out pieces of Rhasspy into separate libraries, so contributors can focus on smaller pieces. For example, I’m almost ready to release the rhasspy-nlu library that encapsulates the sentences.ini parsing and fsticuffs intent recognition.

I’d also like to scale back what’s in the core Rhasspy distribution, and instead make it easy to point Rhasspy at an external service (Home Assistant, HTTP, MQTT, etc) for each function (speech to text, etc.). That will keep the main Docker image slim, and make it clear where the boundaries are.

I understand the actor model can be confusing. I need to pull that out as a separate library, with some actual documentation. Each actor is really just a thread with an “inbox” and a current state. The state determines which message handling function is called in the Python class, and is used so the actor can react differently to messages over time. Really, they’re just microservices in the same process :slight_smile:
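The actor pattern described above can be sketched in a few lines. This is a hypothetical minimal version for illustration, not Rhasspy’s actual implementation: a thread drains an inbox queue, and the current state picks which handler method runs, so the same message can be handled differently over time.

```python
import queue
import threading


class Actor:
    """Minimal actor sketch: a thread with an inbox and a current state.

    The state name selects the handler method ("in_" + state), so the
    actor can react differently to the same message over time.
    """

    def __init__(self):
        self.inbox = queue.Queue()
        self.state = "idle"
        self.handled = []  # (state, message) pairs, for demonstration
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def send(self, message):
        """Drop a message into the actor's inbox (non-blocking)."""
        self.inbox.put(message)

    def stop(self):
        """Ask the actor thread to exit and wait for it."""
        self.inbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            message = self.inbox.get()
            if message is None:
                break
            # Dispatch on the *current* state, then let the handler
            # optionally transition to a new state.
            handler = getattr(self, "in_" + self.state)
            handler(message)

    # --- state handlers (hypothetical states for this sketch) ---

    def in_idle(self, message):
        self.handled.append(("idle", message))
        if message == "wake":
            self.state = "listening"

    def in_listening(self, message):
        self.handled.append(("listening", message))
        if message == "stop":
            self.state = "idle"
```

For example, sending `"wake"`, then `"audio"`, then `"stop"` makes the actor handle the first message in the `idle` state and the next two in the `listening` state before returning to `idle`.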

3 Likes

As I think more about it, there are three places I could definitely use some help:

  1. Documentation (tutorials, videos, setups)
    • Rhasspy’s documentation really needs a set of focused tutorials, showing users how to go from nothing to a functioning integration with HA, NodeRED, or Hermes/MQTT.
    • More videos demonstrating working systems would be great, so potential users can decide if it’s worth their time.
    • A collection of users’ current (working) setups (hardware, mics, software, settings).
  2. Web interface
    • The current Rhasspy web interface is a Vue app I put together while learning Vue. I don’t really like doing web development, so anyone is welcome to improve it or make a new one!
  3. Testing
    • I’d like to hear suggestions for ways to automatically test Rhasspy. There are so many variations in languages, CPU architectures, installation methods (Docker, Hass.io, etc.), and settings that I have a hard time not missing something each release.
3 Likes

Hi @synesthesiam first, let me congratulate you on your impressive project.
It’s incredible what you’ve already achieved and how fast development is progressing. :grinning: :+1:
I also think the recent steps toward modularization and streamlining the core are the right ones. I’ve been watching Rhasspy for a year, but unfortunately I have to say that for a project of mine I first went with Snips. That was a mistake, as I’ve seen in the last few days.

I tried Rhasspy on several test systems and everything works fine under Docker. For my target project I have to reduce all overhead, so I tried the Python virtual environment install as described.
Unfortunately, I can’t get it to work. To be safe, I tested it on a Raspberry Pi 3 with the latest
Buster version, without modifications or special drivers, and it produces the same error.

Cloning with git works, and download-dependencies also works. With ./create-venv.sh the system installs fine for a long time, but at the end there are some errors that I couldn’t figure out or fix. See the screenshot from the SSH console.

I hope we can get this to work; I would like to spend a lot of time on Rhasspy and also actively help with testing and extending it.

Hello,

I’m a former user of Snips… It has bothered me for years that Snips open-sourced only some parts of it for marketing purposes and took care to keep the rest proprietary.

After looking for truly open-source alternatives for a local voice assistant, I want to give Rhasspy a try… and, if I can, help make it grow.
I think you could get motivated developers coming from Snips. Do you think you could guide us? Would it make sense to fork snips-nlu, train it with the Snips data each Snips user can export from their Snips account, and integrate it into Rhasspy?

2 Likes

Welcome @farfade!
I’m also such a Snips “victim.” Have you tried Rhasspy?
The installation with Docker on a Raspberry Pi 3 is very easy, quickly done, and runs impressively well. I think the Rhasspy concept is great: you can choose the most suitable components for your application, and there are several intent recognition systems to choose from. With the standard config (fsticuffs) I achieve amazingly good results.

What distinguishes Rhasspy from Snips at the moment is that it does not support any skills directly. But depending on the desired application, that is no big problem at all. Fortunately, Rhasspy supports the Hermes (MQTT) protocol like Snips, so you can “add” skill functionality yourself: run your own Python script (also as a service) that listens to the Hermes protocol, and on the desired intent start an action and return a response to TTS. If you select PicoTTS in the Rhasspy settings, you will have exactly the same voice as Snips.
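The “listen to the Hermes protocol and answer with TTS” approach might look like the sketch below. It’s a hypothetical example: the `GetTime` intent name and the reply text are made up, while the topics (`hermes/intent/<intentName>` for recognized intents, `hermes/tts/say` for text to speak) follow the Hermes conventions mentioned above. The listener at the bottom requires the paho-mqtt package and a running MQTT broker; the helpers above it are plain Python.

```python
import json

# Hermes topic conventions (same as Snips): recognized intents are
# published under hermes/intent/<intentName>, and text published to
# hermes/tts/say is spoken by the TTS component.
INTENT_TOPIC_PREFIX = "hermes/intent/"
TTS_SAY_TOPIC = "hermes/tts/say"


def intent_name(topic):
    """Extract the intent name from a Hermes intent topic."""
    return topic[len(INTENT_TOPIC_PREFIX):]


def make_say_payload(text, site_id="default"):
    """Build the JSON payload for a hermes/tts/say message."""
    return json.dumps({"text": text, "siteId": site_id})


def run_listener(host="localhost", port=1883):
    """Subscribe to all intents and answer the (made-up) GetTime intent.

    Requires the paho-mqtt package (1.x API) and a running MQTT broker.
    Call run_listener() to start it; it blocks forever.
    """
    import paho.mqtt.client as mqtt

    def on_connect(client, userdata, flags, rc):
        client.subscribe(INTENT_TOPIC_PREFIX + "#")

    def on_message(client, userdata, msg):
        if intent_name(msg.topic) == "GetTime":
            client.publish(TTS_SAY_TOPIC, make_say_payload("It is noon"))

    client = mqtt.Client()
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(host, port)
    client.loop_forever()
```

Running this as a small service next to Rhasspy is exactly the pattern described above: one script per “skill,” coupled only through MQTT topics.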

2 Likes

Hi @thinker, welcome! Thank you for the encouragement :slight_smile:

I think I’ve tracked down the source of the problem, and it looks like I forgot to update some of the dependencies on GitHub (so thank you). Please do a git pull, delete rhasspy-tools_armhf.tar.gz from the “download” directory in your rhasspy repo, and re-run create-venv.sh.

I’m also working on some experimental Debian packages for Rhasspy, which should really simplify the non-Docker installation. I have one up for amd64, but I’m still working on one for the Raspberry Pi (armhf). I’ll post when I have it uploaded.

1 Like

Thanks @farfade for the shout-out on the Snips forum. I’d welcome motivated developers coming to help with Rhasspy. I think we have a chance here to make something that can help a lot of people.

That’s a good question. I’m definitely motivated to maintain and enhance Rhasspy so it can reach as many people as possible, but I also want to make sure I’m not the only person keeping the project alive. If something happens to me, I want to ensure that someone out there can keep building and releasing versions of Rhasspy. Any thoughts on this are welcome.

It might. I need to understand the tech in snips-nlu better before I can say for sure. I also need to see what’s available when someone exports their Snips data. It should be possible to add snips-nlu as another intent recognition system in Rhasspy. If there’s some way to import your Snips training sentences too, it would save you the trouble of converting them to Rhasspy’s sentences.ini format.

In the longer term, I’d like to break Rhasspy apart into functional pieces similar to Snips’, so each piece can be worked on separately. Maybe we (the Rhasspy community) should fork the Hermes protocol and make it our own! I vote we rename it Zoidberg (yes, I know they weren’t talking about that Hermes) :wink:

1 Like

As a non-dev, I don’t get a vote, but Zoidberg has my vote in any case.

1 Like

I am in the process of this, but my time is rather limited these days.

As to the name of the hermes protocol, I don’t see the need to rename really. It is not patented or anything :wink:

1 Like

Have a look at my Snips apps. Yesterday I exported all my data from the Snips Console and put it in the repositories of my apps. For instance snips-app-what-is-happening/console/en at master · koenvervloesem/snips-app-what-is-happening · GitHub.

1 Like

Have a look at Hermod. I don’t know if it’s still in development, but Steve has put a lot of thought into it.

If you want to build upon Hermes, I want to help, but I prefer to do it component by component, and currently the Rhasspy codebase is not broken apart enough for me to do this.

Moreover, we have to think about the differences between the MQTT API, REST API and other mechanisms: they shouldn’t diverge too much.

This is just the thing I was looking for, thanks! I especially like that the siteId is baked into the MQTT topic, so you don’t have to parse a message just to throw it away.

I was thinking we could do this in a way that makes sense for MQTT, Websockets, and HTTP.

  • For MQTT, everything would go as normal
  • For Websockets, there would be websocket endpoints in the Rhasspy web server for each topic that would send outgoing messages to the client and receive incoming messages
    • Something like /api/hermod/<siteId>/microphone/start
    • An alternative might be a single endpoint where you can subscribe to messages, as in Home Assistant
    • Messages would be passed from MQTT to Websocket and back
  • For HTTP, there could be endpoints for relevant message pairs or incoming messages
    • So POST-ing to /api/hermod/<siteId>/nlu/parse will return the JSON payload from hermod/<siteId>/nlu/intent (or fail)
    • POST-ing to something like /api/hermod/<siteId>/microphone/start will inject the message and return immediately
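The topic-to-endpoint mapping proposed above could be sketched as a pair of helpers. Everything here is hypothetical: the `hermod/<siteId>/...` topics and `/api/hermod/...` paths are just this post’s proposal, not a shipped API.

```python
def topic_to_http_path(topic):
    """Map a hermod MQTT topic to the proposed HTTP endpoint path."""
    return "/api/" + topic


def http_path_to_topic(path):
    """Inverse mapping: strip the /api/ prefix to recover the topic."""
    prefix = "/api/"
    if not path.startswith(prefix):
        raise ValueError("not a hermod API path: " + path)
    return path[len(prefix):]


def parse_hermod_topic(topic):
    """Split hermod/<siteId>/<service>/<action> into its parts.

    Because the siteId is baked into the topic, it never has to be
    parsed out of the message payload.
    """
    prefix, site_id, service, action = topic.split("/")
    if prefix != "hermod":
        raise ValueError("not a hermod topic: " + topic)
    return site_id, service, action
```

A web server could then route every `/api/hermod/...` request by converting the path back to a topic and publishing on MQTT, which keeps the MQTT, Websocket, and HTTP views of the protocol from diverging.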

Thoughts?

I was mostly kidding, but I still want to name one thing in my life after Zoidberg :wink: My wife said no to our son, so…

Thank you @thinker for the tips. I’ll try Rhasspy as soon as I’ve got spare time :slight_smile:

One other question for you and @synesthesiam: with Snips, I had one master server doing all the processing (NLU, etc.) of audio sent by two remote Raspberry Pis (on my LAN) running snips-satellite (hotword detection and audio input). I did that because processing on the Raspberry Pis with Snips was too slow for me, and because I think this architecture helps with identifying sessions when two mics hear the same voice (my house is not so large). The satellites and the main processing unit were integrated via MQTT/Hermes.

Does Rhasspy support such an architecture? Or how do you efficiently manage two or more remote mics with Rhasspy?

Came here to ask this - managed to find this:

1 Like

I think I misunderstood my problem. I am French, and I thought the Snips open-source part would help with better speech-to-text and intent recognition for French. It’s not entirely clear to me, but it seems that French recognition is a speech-to-text problem that Snips solved with a proprietary model built on top of Kaldi.

The best free Kaldi ASR models I’ve come across are from zamia-speech. I use their TDNN English and German models in Rhasspy. They don’t have a pre-trained French model yet, but I see they have a few hundred hours of French speech data from various corpora.

If someone could try to get the zamia scripts running and generate a French Kaldi model, I could easily add it to Rhasspy.

Maybe call your wife Zoidberg :rofl:

This guy gets it :smiley: