I just stumbled upon this. Works with HA and has a wake word system.
There was a thread on Hacker News about it yesterday, which mentions the author plans to post in these forums in due course.
There is a short video on YouTube demoing it against Alexa.
I’ve taken a punt and ordered one of these from AliExpress…
Thanks for the AliExpress link, I hadn’t thought to look there. I was expecting it to be a case of having to build/3D print the thing… Needless to say, I just ordered one too.
It seems very promising and I can use it without updating Home Assistant, nice…
Available from The Pi-Hut in the UK.
Hi everyone, I’m the creator of Willow and would be happy to answer any questions you may have!
Hi, project looks impressive but I’m confused about wake words.
In the readme on your github you say:
‘Current supported features include:
- Wake Word Engine. Say “Hi ESP” or “Alexa” (user configurable) and start talking!’
but later on in “The Future” section you say
‘Custom Wake Word
Espressif has a wake word customization service that enables us (or you!) to create custom wake words. We plan to create a “Hi Willow” or similar wake word and potentially others depending on input from the community.’
Could you explain the difference between “user configurable” and “custom wake” words?
Ars Technica just did a piece on this: “Willow could be the $50 hardware piece of the DIY voice assistant puzzle” (Ars Technica)
Adafruit has some in stock: Search Results for 'esp32-s3 box' on Adafruit Industries
The regular box (with stand) is out of stock, but the mini is in stock! There were 19 when I ordered 1 just now and now they’re down to 14.
Sure! We currently support two wake words (user selectable) - “Alexa”, and “Hi ESP”. We default to Hi ESP because we understand there are likely users that have existing Alexa devices and we don’t want Willow and Alexa stepping on each other. For users that don’t have Alexa devices and want the Alexa wake word, they can select that.
For wake word customization, an often misunderstood aspect about true commercial quality wake word detection is how difficult and involved the process is. The wake word library we use is actually tested and certified by Amazon (of all people) as an Alexa platform device. With that, the requirements for wake word customization from Espressif (ESP BOX/ESP SR people) are VERY involved as you can see from the link you included.
The starting point is basically getting at least 500 people in a recording studio, plus all of the requirements from there! Wake words are interesting things that are actually quite difficult. You need to be able to reliably trigger (or users get frustrated) no matter who the speaker is, and no matter what kind of environment it is (background noise, TV playing, etc). BUT you also need to make sure you don’t annoy the user with so-called “false activation”, where wake gets triggered unintentionally.
This precise, expensive, and time-consuming process is why so many open source wake word solutions are drastically inferior to commercial ones and borderline useless. It’s also why users who try to make their own end up frustrated and give up - even if you provide a bunch of samples of your own voice, the result ends up less reliable than a professionally trained wake word, even when you’re the only speaker. There’s an entire field of research around this, and what it boils down to is that people (even the same person) pronounce words in a very wide range of ways, even when just repeating them. Factor in being sick, tired, emotional state, etc, and threading the needle between reliable wake and eliminating false wake gets harder and harder.
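To make that trade-off concrete, here is a toy sketch. The scores are made up and the single-threshold rule is an assumption for illustration only; this is not how Willow or Espressif’s wake word engine is actually tuned. Raising the detection threshold cuts false activations but causes more missed wakes, and lowering it does the opposite.

```python
from typing import List, Tuple

# Hypothetical detector confidences (made up, not from any real model).
# Clips where someone really said the wake word:
wake_scores = [0.95, 0.91, 0.88, 0.82, 0.74, 0.67, 0.58]
# Clips of background audio (TV, conversation, etc.) with no wake word:
background_scores = [0.71, 0.55, 0.42, 0.33, 0.21, 0.12, 0.05]


def error_rates(threshold: float,
                wake: List[float],
                background: List[float]) -> Tuple[float, float]:
    """Return (missed wake rate, false activation rate) at a threshold.

    A clip triggers the assistant when its score is >= threshold.
    """
    missed = sum(1 for s in wake if s < threshold) / len(wake)
    false_wake = sum(1 for s in background if s >= threshold) / len(background)
    return missed, false_wake


# Sweep the threshold: misses go up as false activations go down.
for t in (0.5, 0.6, 0.7, 0.8, 0.9):
    missed, false_wake = error_rates(t, wake_scores, background_scores)
    print(f"threshold={t:.1f}  missed={missed:.2f}  false_wake={false_wake:.2f}")
```

With these toy numbers no single threshold gets both error rates to zero, which is the “threading the needle” problem in miniature. Real models narrow the gap by training on far more speakers and environments.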
Willow will absolutely create a “Hi Willow” or similar, but that’s mostly just for branding purposes. I say “make your own” because Willow is also intended for commercial users who will pay for the expensive wake customization process because branding and control over wake is of vital importance to them.
I will update the documentation to make it clearer that custom wake isn’t intended for end users and is a commercial thing.
Thank you for what you have built so far! I’m excited for this and hope you can put up a https://www.buymeacoffee.com/ once the project gets a bit further.
I just ordered an ESP32-S3-BOX-Lite (all the regular ones are out of stock, probably thanks to your project).
@kristiankielhofner thank you for your expansive answers in the Ars Technica article comments here: “Willow is a faster, self-hosted DIY voice assistant built on $50 gadgets” (Ars OpenForum)
My pleasure, really excited to see the interest!
Hah, we’ve been seeing that. Espressif will crank up manufacturing to keep them in stock.
The Lites are going too. One thing to know - the Lite isn’t technically the same board so you will need to select a menu option for it in the configuration. We’ll document that.
Looking forward to hearing your feedback when you receive it!
I would be interested to know how you rate the wake word abilities of rhasspy, given that the rhasspy developer is now leading the HA voice assistant development.
That’s where I first read about it. I should have linked to the article.
I’ve actually never used it, so I don’t know. I do have a spare Raspberry Pi around so I may try it at some point.
I have been asked about Rhasspy and other voice efforts of the Home Assistant team quite a bit. What I do know for certain is that we are not “competitors” or anything of the sort (at least in my mind) even if from a surface glance it may appear that way.
As I told Paulus on Hacker News, the goal of Willow is not to be a Home Assistant Voice Assistant. The goal of Willow is to be the best voice/speech/audio interface in the world to a variety of platforms, one of which happens to be Home Assistant.
The Home Assistant team has worked very hard to make a very tight, highly integrated ecosystem. I suspect they will always have certain ecosystem advantages with this approach as furthering the overall ecosystem (now with voice) is their underlying goal.
It’s a subtle but important difference.
Rhasspy is not just one wake word feature.
It is a framework where multiple different wake word systems can be swapped in, so it is a bit more complicated to compare.
Yes, I looked at the docs again after I posted and Rhasspy leverages a number of projects for wake word. I personally want to be able to give my voice assistant quite an abusive name, “OK xxxxx-er turn the lights on”.
ESP Box - the hardware part of Willow - has arrived. Played for an hour or so. Mixed impressions so far.
First, it works out of the box, which is pretty impressive. Wake word recognition in the demo is quick and accurate (at least as good as Alexa is my initial feeling, and without the cloud delay, though I haven’t done any systematic testing). Shame about “Hi ESP”, but there’s a Chinese wake word which I may try if I can figure out how to pronounce it.
Touch screen is crisp and responsive. The display gives good feedback on wake word detected and command received.
On the downside, I have no idea how to connect the ESP Box to my network. The (very small print) instructions show slightly different screenshots to the ones I am getting, and although there is a cog wheel icon on the screen, which I assume leads to settings, it is completely unresponsive.
I have downloaded the Android app, which I understand should connect via Bluetooth, but it doesn’t find the box. I have connected to the wireless interface on the Box, but it doesn’t seem to provide any settings - just example icons for toggling switches.
So now I’m taking a break before digging into the Willow installation guides.
A couple of trivial points. When it arrived there was a screw rattling round in the box. It was obvious where it should go, but not a good look. Allen key screw too - hate them.
If you’re concerned about style, the Box looks well designed but a little cheap (not that cheap is a bad thing). I think this is because of the casing, which is made of that slightly translucent white plastic.
I have used Rhasspy. The wake word can be customised very easily, but my experience was that detection was a bit hit and miss - pretty good, but missing just often enough to be annoying. ESP Box seems far better.