I’ve been working on a voice-controlled “house bot” for the past few months, which uses Home Assistant as a backend to read sensors and control the house. It’s also developed in a style similar to Home Assistant, using Python 3 and asyncio as a platform, with YAML configuration files.
The architecture of the bot involves several servers running on one PC or different PCs, communicating via MQTT channels. In my home, a PC in the basement runs the core bot process while Raspberry Pis in different rooms run the microphone process, and my Sonos speakers act as output.
Some of the interesting features include:
Snowboy and Tensorflow-based wakeword detection, with voice-controlled labeling
Google-based speech recognition
Google Assistant integration - “Ira, ask Google what’s the weather like today”
espeak or Bing Speech-based speech synthesis
Syntax tree intent matching using Google or Spacy.io NLP; allows matching complex sentences without hand coding dozens of regex variants.
Really easy to add new intents / skills, just add a .py file.
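As a rough illustration of what "just add a .py file" could look like, here is a minimal sketch. It is not the bot's actual code: the `intent` decorator, the `INTENTS` registry and the `dispatch` function are hypothetical names, and the matching is a plain keyword check standing in for the syntax tree matching mentioned above.

```python
import asyncio

# Hypothetical registry: maps a tuple of trigger words to an async handler.
INTENTS = {}

def intent(*keywords):
    """Register an async handler that fires when all keywords appear."""
    def decorator(func):
        INTENTS[tuple(k.lower() for k in keywords)] = func
        return func
    return decorator

@intent("lights", "on")
async def lights_on(text):
    # A real skill would call Home Assistant here; this only returns a reply.
    return "Turning the lights on"

async def dispatch(text):
    """Run the first registered handler whose keywords all occur in text."""
    words = set(text.lower().replace("?", "").replace(",", "").split())
    for keywords, handler in INTENTS.items():
        if all(k in words for k in keywords):
            return await handler(text)
    return None

if __name__ == "__main__":
    print(asyncio.run(dispatch("Ira, turn the lights on")))
```

With a layout like this, a new skill is one decorated coroutine in its own module; the core only needs to import the module for the handler to register itself.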
Would anyone in this community be interested in collaborating on this project? As it stands, it works pretty well in my home: my kids use it, and it’s fun to show off to neighbors.
But before it could become an open source project I’d like to clean up the code (remove my passwords etc), have some documentation, and ensure that it works for at least one other person.
Let me know if you’re interested in collaborating!
Thanks, it would be helpful to have that perspective assuming you are comfortable using a shell, cloning Git repos and editing text files. Let me know if you are interested and if so what is your HA setup like?
Nothing English-specific is coded directly into the architecture; all the input and output text is in configuration files.
Using a different language would simply depend on having a speech recognition provider (Google is the only one supported atm) that understands the language, and NLP support via Spacy.io or Google’s cloud NLP.
My hope is that I can use a Raspberry Pi Zero W in each room, connected to a mic, with a central Snips server as the backend handling the voice requests from each room. The responses would then be played on the Sonos in the room the original request came from. This would be cost effective, and it would avoid having to run extra Ethernet or 3.5mm audio cable.
I’d be very interested to understand what components you used to architect your solution both hardware and software.
Your description matches my setup pretty well. I have a PC running Linux in the basement, and then Raspberry Pi 3s in different rooms. I’m planning to transition to Zero Ws eventually but they seem really hard to buy without a load of unnecessary accessories. I initially used a PS Eye camera for the mic but have transitioned to $5 Kinobo mics which have a smaller form factor and still work just fine.
The Pis do initial wakeword detection and speech recognition, then pass what is said back to the core server running on the basement PC via MQTT. The server then does NLP and TTS, and sends the speech to a linked channel, which in my house means a Sonos speaker near the Pi.
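The flow described above (Pi publishes recognized text, core server replies on the channel linked to that Pi) can be sketched as follows. This is only an illustration: the `bot/heard/...` and `bot/say/...` topic names, the JSON fields and the `LINKED_CHANNELS` table are assumptions, not the bot's real protocol, and no MQTT client is involved so the routing logic can be read on its own.

```python
import json

# Hypothetical mapping from a microphone site to its linked output channel.
LINKED_CHANNELS = {"kitchen-pi": "sonos/kitchen", "den-pi": "sonos/den"}

def heard_message(site, text):
    """Build the (topic, payload) a Pi would publish after recognition."""
    return f"bot/heard/{site}", json.dumps({"site": site, "text": text})

def route_reply(payload, reply_text):
    """On the core server: pick the output channel linked to the sender."""
    site = json.loads(payload)["site"]
    channel = LINKED_CHANNELS.get(site, "sonos/default")
    return f"bot/say/{channel}", json.dumps({"text": reply_text})

if __name__ == "__main__":
    topic, payload = heard_message("kitchen-pi", "turn on the lights")
    print(route_reply(payload, "Okay"))
```

Carrying the site name in every message is what lets the reply find its way back to the speaker nearest the person who spoke.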
My software isn’t as sophisticated as Snips, there is no fancy frontend or web store, but it is very easy to customize and extend, and it works really well with Home Assistant.
Great project. I am using Snips now, but I really want to use the Dutch language.
spaCy seems to be able to do that.
I do not want to use Google or any other online service; I do not want to be dependent on the cloud.
So the speech recognition might be a problem.
Hmm, if you are aware of an offline Dutch speech recognition package I’d be happy to integrate it. I prefer non-cloud solutions as well but Google’s is unfortunately the highest quality at this time. Of course, I only invoke speech recognition once wakeword detection matches, and only for a limited time (default 10 seconds) or until a match, whichever comes first.
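The "limited time or until a match, whichever comes first" behaviour maps naturally onto asyncio. A minimal sketch, assuming recognized phrases arrive on an `asyncio.Queue` (the queue, the `matches` callback and the function name are hypothetical; only the 10 second default comes from the description above):

```python
import asyncio

async def listen_for_match(phrases, matches, window=10.0):
    """Return the first phrase that matches, or None if the window expires.

    phrases: an asyncio.Queue of recognized text (hypothetical source).
    matches: callable returning True when a phrase matches an intent.
    """
    async def first_match():
        while True:
            phrase = await phrases.get()
            if matches(phrase):
                return phrase

    try:
        # wait_for cancels first_match() once the window has elapsed.
        return await asyncio.wait_for(first_match(), timeout=window)
    except asyncio.TimeoutError:
        return None
```

Once this returns, the caller would stop streaming audio to the recognizer, so nothing is sent to the cloud outside the window.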
How do you like Snips.ai, does it work well with Home Assistant and have you been able to customize it sufficiently?
Well, I have been using CMUSphinx, but that needs a lot of work with building a vocabulary.
Actually, that is what Snips does when it says “Training your assistant”.
Snips plays nice with my HA. I have got Hassio running on a NAS and use a Pi 3 for Snips.
I do not use the Hassio add-on; that one is still broken.
The Snips on the Pi3 listens to the local MQTT, my Hassio also listens to the Pi.
Using the Node Red add-on, I created a flow with a listener on the hermes/asr/startListening topic, which then replies via hermes/dialogueManager/continueSession with “Yes?”
This is because in the new Snips the sound played when a hotword is detected is off by default…
You can enable it via hermes/feedback/sound/toggleOn, but you have to enable it again after every reboot…
Then I also have the snips component configured, which can open and close my shutters and other stuff. I am still expanding this.
The Node Red flow also listens to the hermes/intent/Romkabouter:Covers topic and then sends a hermes/endSession with text (Closing/Opening covers).
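For readers who would rather see the two flows as code than as Node Red nodes, here is a rough Python equivalent that only builds the reply messages. The `sessionId` and `text` payload fields, and the full `hermes/dialogueManager/endSession` topic name, are written as recalled from the Snips Hermes documentation and should be checked against it; the function names are hypothetical, and whether the startListening payload carries a `sessionId` may depend on the Snips version.

```python
import json

def on_start_listening(payload):
    """Answer a hermes/asr/startListening message with a spoken "Yes?"."""
    session_id = json.loads(payload).get("sessionId")
    reply = {"sessionId": session_id, "text": "Yes?"}
    return "hermes/dialogueManager/continueSession", json.dumps(reply)

def on_covers_intent(payload, text="Closing covers"):
    """Answer the covers intent by ending the session with a confirmation."""
    session_id = json.loads(payload).get("sessionId")
    reply = {"sessionId": session_id, "text": text}
    return "hermes/dialogueManager/endSession", json.dumps(reply)
```

Each function returns a (topic, payload) pair that an MQTT client would then publish to the broker Snips listens on.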
You can also implement skills in Snips, but I like to use Node Red for it.
The only part missing for me is Dutch. I can configure Snips to use Google Assistant, but I do not want to.
Nice project, I’m fairly far down the road implementing this with snips, appdaemon and hass so probably won’t switch gears at this point. Snips has custom tts now which is nice and other language support is supposed to be coming although you can somewhat hack things together now.
For Snips I have added a feature in the next release to allow speech responses directly from hass intent_scripts. This works with custom intents and also with multi-room setups (the speech response gets back to the right Raspberry Pi).
With another PR for event_data_template I have a simple setup to grab intents and send them directly to app daemon for more complicated intents (“turn the heat up 2 degrees”).
Specifically under automations/jarvis.yaml, and under packages I have the others. The actual intent_scripts are dummies handled by appdaemon, this just avoids hass sending an unknownIntent error.
Cool, thanks for the links to PocketSphinx and Judy. I will integrate PocketSphinx and give it a try. Glad to hear that it’s capable of running on the Pi, otherwise I would have to stream audio to the main server.
I chose “Yes?” as my wakeword response as well. You just have to be careful that it doesn't hear itself through audio left in the record buffer.
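One way to avoid the self-hearing problem is to drop whatever the mic captured while the reply was playing, plus a short tail afterwards. A minimal sketch of that idea; the `MicBuffer` class and the `grace_frames` setting are hypothetical, and the right tail length would depend on the speaker's latency:

```python
from collections import deque

class MicBuffer:
    """Frame buffer that discards audio captured while the bot is speaking."""

    def __init__(self, grace_frames=5):
        self.frames = deque()
        self.speaking = False
        self.grace_frames = grace_frames  # frames to skip after speech ends
        self._skip = 0

    def start_speaking(self):
        self.speaking = True
        self.frames.clear()  # drop anything recorded before the reply

    def stop_speaking(self):
        self.speaking = False
        self._skip = self.grace_frames

    def push(self, frame):
        """Store a captured frame unless it may contain the bot's own voice."""
        if self.speaking:
            return
        if self._skip > 0:
            self._skip -= 1
            return
        self.frames.append(frame)
```

The recognizer would read only from `frames`, so the “Yes?” itself never reaches it.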
I wasn’t aware of the “Hermes” MQTT protocol, I have my own but it might be interesting to adopt that in order to increase compatibility.
I have created my own version of the snips add-on: https://github.com/Romkabouter/hassio-addons
The built-in one has some problems: there is an error in snips-entrypoint.sh, which I have fixed.
Great work! I have been trying to find a voice based assistant that can use a custom wake word. Snips looked good but it appears that you have to pay to get a custom wake word as they say it is a lot of work.
Well no, with some tinkering you can use snowboy for example.
Snips has implemented the Hermes protocol and comes with a snips-hotword service. Indeed, their service only works with some default hotwords, but you can create a hotword detector with Snowboy and post to the Hermes topics just like Snips does.
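As a sketch of what "post to the Hermes topics" might involve: when a custom detector fires, it would publish a "detected" message for Snips to pick up. The topic layout `hermes/hotword/<id>/detected` and the `siteId` / `modelId` fields below are recalled from the Hermes documentation and should be verified there; the default values and the function name are made up for illustration.

```python
import json

def hotword_detected(site_id="default", hotword_id="default",
                     model_id="my_snowboy_model"):
    """Build the message to publish when a custom detector fires."""
    topic = f"hermes/hotword/{hotword_id}/detected"
    payload = json.dumps({"siteId": site_id, "modelId": model_id})
    return topic, payload

if __name__ == "__main__":
    print(hotword_detected(site_id="kitchen"))
```

The Snowboy callback would call this and hand the result to whatever MQTT client is connected to the Snips broker.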
I used snowboy initially for my project, but found its quality to be too low.
By tweaking the gain and sensitivity numbers I was able to get reasonable results in a silent house, but as soon as someone closes a door or bangs a pot or plays some music it would wake up.
To get decent results, I was forced to implement a Tensorflow-based DNN using the wav clips that Snowboy reports as “hits” as a second pass filter, and now it’s working well.
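The two-pass idea can be sketched independently of Snowboy or Tensorflow. In this illustration both detectors are passed in as plain callables, and the 0.5 threshold is an assumption rather than a value from the project:

```python
def is_wakeword(clip, first_pass, second_pass, threshold=0.5):
    """Accept a clip only if both detectors agree.

    first_pass: callable(clip) -> bool, the cheap always-on detector
        (Snowboy in the setup described above).
    second_pass: callable(clip) -> float in [0, 1], a classifier score
        (the DNN filter in the setup described above).
    """
    if not first_pass(clip):
        return False  # the expensive model never runs on non-hits
    return second_pass(clip) >= threshold
```

Because the second model only sees clips the first one flagged, it can be trained on exactly those "hits", which matches the labeling workflow described here.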
It took hand-classifying about 3000 audio clips before I started to get decent results, but I made the training part of the voice assistant so it’s fairly efficient to label.
You might get better results from Snowboy if you used one of their “universal” models, but I suspect it would still be vulnerable to bangs and other household noises.
@Romkabouter Thanks, I will look into Snowboy, as I am trying not to depend on any third party; otherwise I may as well go with Amazon.
@wadetb I will take into account your experience with Snowboy but I think using a good filtering microphone array may be the answer. Something like Respeaker.