Voice controlled "house bot" project called Ira

Soon I will get this delivered: https://www.indiegogo.com/projects/matrix-voice-open-source-voice-platform-for-all/x/11330475#/

I want to use it to build a voice assistant.

Actually I am also awaiting delivery of that very same one instead of respeaker.

The ReSpeaker works pretty well for me at this point. Snips says it will be adding custom hotwords “soon”.

I’m waiting for delivery of one of those too. Let me know if it works with Snips!

What library does the voice recognition? You recognize just the keyword locally and then send the wav to Google?

How do you implement integration with Google Assistant?

I started with the speech_recognition Python module (https://github.com/Uberi/speech_recognition). It uses a running weighted “audio energy” average to determine when to stop recording (or simply does it after a timeout), and then does speech recognition on the resulting wav.
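To make the energy idea concrete, here is a rough sketch of that kind of endpointing. It is not the speech_recognition module's actual code: the function names, the thresholds, and the 16-bit little-endian PCM frame format are all assumptions for illustration.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian 16-bit PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def record_until_silence(frames, start_threshold=300.0, damping=0.85,
                         ratio=1.5, max_silent_frames=3):
    """Collect frames from the first loud one until a run of quiet ones.

    All parameter names and defaults are hypothetical. While waiting for
    speech, the threshold follows a running weighted average of the ambient
    energy, so it adapts to background noise.
    """
    threshold = start_threshold
    recorded = []
    silent = 0
    speaking = False
    for frame in frames:
        energy = rms(frame)
        if not speaking:
            if energy > threshold:
                speaking = True
                recorded.append(frame)
            else:
                # Blend the old threshold with the current ambient level.
                threshold = damping * threshold + (1 - damping) * energy * ratio
            continue
        recorded.append(frame)
        if energy > threshold:
            silent = 0
        else:
            silent += 1
            if silent >= max_silent_frames:
                break  # enough consecutive quiet frames: stop recording
    return b"".join(recorded)
```

The recording only ends after `max_silent_frames` quiet frames in a row, which is where the extra delay comes from, and a noisy room can keep the energy above the threshold for much longer than the actual utterance.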

For me, the energy method was too unstable and the delay from wakeword detection to recording to recognition to response was too long. So I created a continuous recording system that feeds audio buffers to wakeword detection, and then after a hit feeds the subsequent buffers directly into the Google “streaming speech” API (https://cloud.google.com/speech/docs/streaming-recognize#speech-streaming-recognize-python). As soon as that API returns responses, they are sent for NLP.
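Roughly, the control flow looks like the sketch below. This is not the project's code: `run_pipeline`, `detect_wakeword`, `recognize_stream`, and `handle_text` are hypothetical names, and the two `fake_*` functions are stand-ins for a real wakeword engine and a real streaming recognizer.

```python
from typing import Callable, Iterable, Iterator, List


def run_pipeline(buffers: Iterable[bytes],
                 detect_wakeword: Callable[[bytes], bool],
                 recognize_stream: Callable[[Iterator[bytes]], Iterator[str]],
                 handle_text: Callable[[str], None]) -> None:
    """Feed every buffer to wakeword detection; after a hit, stream the
    buffers that follow straight into the recognizer."""
    stream = iter(buffers)
    for buf in stream:
        if not detect_wakeword(buf):
            continue
        # The recognizer pulls from the same iterator, so it sees exactly
        # the audio that follows the wakeword. When it stops consuming,
        # the outer loop resumes wakeword detection.
        for transcript in recognize_stream(stream):
            handle_text(transcript)


# Stand-ins so the sketch runs without a microphone or a cloud API.
def fake_detect(buf: bytes) -> bool:
    return buf == b"ira"


def fake_recognize(stream: Iterator[bytes]) -> Iterator[str]:
    words: List[str] = []
    for buf in stream:
        if buf == b"<end>":  # pretend the recognizer saw end of utterance
            return
        words.append(buf.decode())
        yield " ".join(words)  # interim result, available immediately


heard: List[str] = []
run_pipeline(
    [b"noise", b"ira", b"turn", b"on", b"<end>", b"noise", b"ira", b"stop", b"<end>"],
    fake_detect, fake_recognize, heard.append,
)
```

Because the recognizer yields results while audio is still arriving, nothing waits for a complete wav file before recognition starts.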

I found that streaming speech recognition tightens the response loop quite a bit. In the future, I’d like to adapt it to work with other speech recognition providers like CMU Sphinx or the thing that Mozilla is working on (https://github.com/mozilla/DeepSpeech).

For Google Assistant integration, I do my own speech recognition on a phrase like “Ira, ask Google XXXX” and then forward the part after “ask Google” to Google’s assistant API. The API accepts audio in and returns audio out, so I run TTS to synthesize the query as audio, and then directly play Google’s audio output.
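A minimal sketch of that relay step, assuming the transcript is already available as text. The regular expression and all function names are illustrative, and `synthesize`, `ask_assistant`, and `play` are hypothetical callables standing in for the TTS engine, the assistant API call, and audio playback.

```python
import re
from typing import Callable, Optional

# Matches "ask Google" anywhere in the transcript and captures what follows.
ASK_GOOGLE = re.compile(r"\bask google\b[\s,:]*(.+)", re.IGNORECASE)


def extract_google_query(transcript: str) -> Optional[str]:
    """Return the text after "ask Google", or None if the phrase is absent."""
    match = ASK_GOOGLE.search(transcript)
    if match is None:
        return None
    query = match.group(1).strip()
    return query or None


def relay_to_assistant(transcript: str,
                       synthesize: Callable[[str], bytes],
                       ask_assistant: Callable[[bytes], bytes],
                       play: Callable[[bytes], None]) -> bool:
    """Text query -> TTS audio -> assistant -> play the audio reply.

    Returns False when the transcript is not an "ask Google" request.
    """
    query = extract_google_query(transcript)
    if query is None:
        return False
    play(ask_assistant(synthesize(query)))
    return True
```

For example, `extract_google_query("Ira, ask Google how tall is Mount Everest")` gives back `"how tall is Mount Everest"`, which is the text that would be synthesized and sent on.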

I also have the assistant speak the query, and then say thanks to Google, giving the illusion of a conversation between the assistants :slight_smile:

It should be easy enough to integrate Alexa in a similar manner.

Using Google’s assistant API in this manner does have one disadvantage, however: complicated features like music playback and timers are not possible. It’s just simple questions and answers.

@tschmidty Where did you see the promise of custom hotwords from Snips? They told me it requires thousands of hours of training, at a cost :disappointed:

They have posted on their discord chat. It is also quite possible to use a custom hotword detector. A guy here has it set up with a custom hotword as well as ‘person speaking’ capabilities.
https://github.com/oziee/hotword

I created a custom hotword addon: Snowboy.

Read under Snowboy Hotword

I’m down for this, you have everything in a repo yet?

Are you ready to share the project?