Rhasspy offline voice assistant toolkit

CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS                            PORTS               NAMES
476d0d4178ce        synesthesiam/rhasspy-server:latest   "/run.sh --user-prof…"   22 hours ago        Restarting (139) 47 seconds ago                       recursi

Yup, looks like the container is alive and well

You are correct. Running uname -m returns armv6l

No, it is restarting - that means it is failing to start and looping.
It should say “Up” or something like that.

Sounds like synesthesiam is taking you down the correct path of questions :slight_smile:

OK, I think I’ll have to build a special version for the Pi Zero. Unfortunately, Ubuntu doesn’t have an official image for anything before armv7, but it looks like Debian itself does.

@synesthesiam Hi Michael,

I just recently discovered your Rhasspy project and it immediately reminded me of my old project, the ILA voice assistant, when I saw the way you’ve integrated Pocketsphinx and trained the language models :smiley: :+1:
ILA eventually evolved into a 100% open-source project called SEPIA open assistant. Maybe you’ve seen it; in some ways it’s very similar to Mycroft (and very different in others ^_^).
I have two things on the roadmap for SEPIA and I thought maybe it makes sense to check whether Rhasspy fits in somehow :slightly_smiling_face:. I hope it’s ok to post this here.

Number 1 is Home Assistant support. SEPIA has supported openHAB for a while now and I’ve just finished an upgraded interface for smart home hubs together with a first implementation for FHEM. Since Rhasspy is able to convert user intents into commands for Home Assistant, I thought this could be a shortcut for SEPIA to speak to HA’s REST API :grin: by sending intents directly to Rhasspy’s interface. Another possible way is to integrate Rhasspy’s NLU into SEPIA’s customizable NLU chain. SEPIA is written in Java but has a Python “bridge” to offer developers access to all the fancy NLU tools like spaCy or Rasa.

Number 2 is improved language model adaptation. SEPIA has its own STT server (written in Python) that uses Kaldi (Zamia) and supports English and German. There is a procedure to build a custom language model but it is not very convenient yet, since it requires putting all the training data in a text file and adding missing words to the dictionary by hand. I’ve started to simplify this procedure so users can soon export their own custom commands and sentences directly from SEPIA’s teach-server to build the language model, but I’m still missing automatic word-to-phoneme conversion :grimacing:
Maybe we could combine forces here and build a common STT interface for both systems.

I’m just throwing random ideas around here because I thought it would be a waste if these two open source projects didn’t benefit from each other somehow :grinning: . It’d be great if you could have a closer look at SEPIA, and if you see anything interesting for Rhasspy feel free to contact me :slightly_smiling_face:.
In the meantime I wish you good luck with your project!

Florian

Alright, is trying the virtual env viable or will I run into the same issue?

Any thoughts on using the Microsoft Speech-to-Text Docker containers? They seem to be in preview, and the only thing queried in the cloud is the number of characters?

Anyone tried numbers?

Trying out setting an alarm:

[SetAlarm]
hours = ( 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12) {hours}
minutes = ( 5 | 10 | 20 | 30 | quarter | half) {minutes}
ampm = (A.M. | P.M.) {ampm}
math = ( past | to ) {math}

set alarm for <hours> <minutes>
set alarm for <minutes> <math> <hours>

no matter what I say Rhasspy hears “set alarm for 5 5”

@fastjacksprt answered this for me a day or two ago:

Cheers!
DeadEnd

Ah, excellent, thanks. Information overload! Couldn’t see it right in front of me! Thanks for the assist!

For the time being, you’ll need to use the number words in your sentences.ini:

[SetAlarm]
hours = ( one:1 | two:2 | three:3 | four:4 | five:5 | six:6 | seven:7 | eight:8 | nine:9 | ten:10 | eleven:11 | twelve:12) {hours}
minutes = ( five:5 | ten:10 | twenty:20 | thirty:30 | quarter | half) {minutes}
ampm = (A.M. | P.M.) {ampm}
math = ( past | to ) {math}

set alarm for <hours> <minutes>
set alarm for <minutes> <math> <hours>

I have a fix for this using the Python num2words package. You’ll find that you can type something into the web interface already like “set alarm for 12 10” and it will work. Just need to hook it up elsewhere…
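
In case it helps to picture how that fix works, here is a minimal stand-in for what num2words does, limited to the values that appear in the SetAlarm grammar above. The mapping and function name are mine for illustration, not Rhasspy’s actual code; the real num2words package handles arbitrary numbers (e.g. num2words(12) returns "twelve").

```python
# Hand-rolled stand-in for the num2words package, covering only the
# digit tokens used by the SetAlarm grammar above.
WORDS = {
    "1": "one", "2": "two", "3": "three", "4": "four", "5": "five",
    "6": "six", "7": "seven", "8": "eight", "9": "nine", "10": "ten",
    "11": "eleven", "12": "twelve", "20": "twenty", "30": "thirty",
}

def digits_to_words(sentence):
    """Replace bare digit tokens with their word form so typed input
    like "set alarm for 12 10" matches the trained sentences."""
    return " ".join(WORDS.get(token, token) for token in sentence.split())

print(digits_to_words("set alarm for 12 10"))  # set alarm for twelve ten
```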

Rhasspy Announcements :mega:

I’ve added a Rhasspy announcements thread where I’ll post information about new versions and features.

You’ll run into the same issue, since all of the C++ libraries/binaries are pre-compiled. Can you open a GitHub issue for ARMv6 support, please? Thanks!

That works nicely. The only issue is it’s parsed as a string, not a number (well, not really an issue since it can be cleaned up in Node-RED).

Hi @sepia-assistant! I do remember seeing SEPIA a while back. You may have actually been my link over to Zamia Speech!

I’d be happy to collaborate or help in any way I can. One possible way to collaborate may be via a shared protocol, like Hermes or Hermod. You may already have something like this in SEPIA’s websocket server.

For #1, there’s going to be a new intent Home Assistant component in the next release. This has an HTTP endpoint that accepts intent JSON objects directly. I plan to have this as an option in Rhasspy, and it may be an easy path for SEPIA’s Home Assistant Integration.
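
As a rough sketch of what that flow might look like from SEPIA’s Python bridge: the /api/intent/handle path, the payload shape, and the helper names below are all my assumptions about the upcoming component, not a confirmed API.

```python
import json
import urllib.request

# Assumed endpoint for the upcoming Home Assistant intent component --
# treat the path and payload shape as placeholders until it ships.
HA_URL = "http://localhost:8123/api/intent/handle"

def build_intent_payload(name, slots):
    """Wrap an intent name and its slot values in the JSON shape the
    intent endpoint is expected to accept (hypothetical)."""
    return {"name": name, "data": slots}

def send_intent(name, slots, token):
    """POST an intent JSON object to Home Assistant (hypothetical)."""
    request = urllib.request.Request(
        HA_URL,
        data=json.dumps(build_intent_payload(name, slots)).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )
    return urllib.request.urlopen(request)

# Example payload for the SetAlarm intent discussed above:
# {"name": "SetAlarm", "data": {"hours": "7", "minutes": "30"}}
```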

Rhasspy supports using an external HTTP server for speech-to-text and intent recognition. Maybe I could add a websocket connection out to SEPIA too?

Definitely agree on #2. The rhasspy-nlu library will generate custom language models (with some installed tooling). For grapheme-to-phoneme conversion, I use phonetisaurus. Look for the pre-generated g2p.fst files in these profiles. The English and German Kaldi profiles are based on Zamia’s IPA dictionary, so they should be compatible with SEPIA out of the box :slight_smile:
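
For a sense of how the g2p step could be driven from SEPIA’s Python side, here is a sketch that shells out to Phonetisaurus with a pre-generated g2p.fst. The flag names are assumptions from my own setup; verify them against your installed phonetisaurus-apply before relying on this.

```python
import subprocess

def build_g2p_command(g2p_fst, word_list_path):
    """Assemble the Phonetisaurus call that guesses pronunciations for
    the words in word_list_path using a pre-generated g2p.fst model.
    Flag names are assumptions; check `phonetisaurus-apply --help`."""
    return [
        "phonetisaurus-apply",
        "--model", g2p_fst,
        "--word_list", word_list_path,
    ]

def guess_pronunciations(g2p_fst, word_list_path):
    """Run Phonetisaurus and return its stdout (one word and its
    guessed phonemes per line)."""
    result = subprocess.run(
        build_g2p_command(g2p_fst, word_list_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```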

Thanks for the suggestions, and I’d like to stay in contact (maybe via e-mail, so we don’t flood this mega thread even more :laughing:)

  • Mike

Here is what I have done in a function node to set a msg.delay value when creating a timer:

var hours = Number(msg.slots.hours) || 0;
var minutes = Number(msg.slots.minutes) || 0;
var seconds = Number(msg.slots.seconds) || 0;

// The delay node expects milliseconds
msg.delay = (hours * 3600 + minutes * 60 + seconds) * 1000;

return msg;

The next node is a delay node set to allow the override. With this I have created a working timer in Node-RED using Rhasspy voice to “set a timer for XX hours XX minutes XX seconds”.

So far I have only tested it for a few minutes, but it seems to be working.

Cheers!
DeadEnd

I put together my custom timer a while back; it’s based on @synesthesiam’s example but with a bit more functionality:

I have some other Node-RED/HA examples I want to put up on the repository but lack the time at the moment. I hope I’ll get a bit more time for this over the holidays.

@synesthesiam
Maybe you could ask gido about web frontend development. He commented that he is a “professional frontend web developer”:
https://forum.snips.ai/t/important-message-regarding-the-snips-console/4145/37

Hey Mike,

One possible way to collaborate may be via a shared protocol, like Hermes or Hermod. You may already have something like this in SEPIA’s websocket server

I’ve not used the MQTT protocol yet but the question came up a few times recently (in connection with Node-RED for example). I’ll check out the linked docs to get a better understanding of Hermes and Hermod :+1:

For #1, there’s going to be a new intent Home Assistant component in the next release. This has an HTTP endpoint that accepts intent JSON objects directly. I plan to have this as an option in Rhasspy, and it may be an easy path for SEPIA’s Home Assistant Integration.

That sounds very interesting! I should definitely check it out when it’s ready :slight_smile:

Rhasspy supports using an external HTTP server for speech-to-text and intent recognition. Maybe I could add a websocket connection out to SEPIA too?

The websocket connection to the SEPIA STT server is based on an old Microsoft cloud-speech demo. I should write some documentation about how to connect to it :grin: . Intents can be obtained from SEPIA’s interpret endpoint (a kind of REST API) as well, though I have to admit the results lack a bit of “uniformity” :sweat_smile: maybe a chance to clean up this part a bit ^^.

[…] For grapheme-to-phoneme conversion, I use phonetisaurus. Look for the pre-generated g2p.fst files in these profiles. The English and German Kaldi profiles are based on Zamia’s IPA dictionary, so they should be compatible with SEPIA out of the box

That is awesome! :smiley:

Thanks for the suggestions, and I’d like to stay in contact (maybe via e-mail, so we don’t flood this mega thread even more :laughing:)

Yes :sweat_smile: I’ll take some time and check out your links and then come back to you via email :+1:

cu,
Florian

I’d be interested in talking to him. Do you already have an account over on that forum? Maybe he could join the discussion here?