Willow Voice Assistant

@kristiankielhofner, thank you for your expansive answers in the comments on the Ars Technica article: Willow is a faster, self-hosted DIY voice assistant built on $50 gadgets | Ars OpenForum

My pleasure, really excited to see the interest!

Hah, we’ve been seeing that. Espressif will crank up manufacturing to keep them in stock.

The Lites are going too. One thing to know - the Lite isn’t technically the same board so you will need to select a menu option for it in the configuration. We’ll document that.

Looking forward to hearing your feedback when you receive it!

I would be interested to know how you rate the wake word abilities of Rhasspy, given that the Rhasspy developer is now leading the HA voice assistant development.

That’s where I first read about it. I should have linked to the article.

I’ve actually never used it, so I don’t know. I do have a spare Raspberry Pi around so I may try it at some point.

I have been asked about Rhasspy and other voice efforts of the Home Assistant team quite a bit. What I do know for certain is that we are not “competitors” or anything of the sort (at least in my mind) even if from a surface glance it may appear that way.

As I told Paulus on Hacker News, the goal of Willow is not to be a Home Assistant Voice Assistant. The goal of Willow is to be the best voice/speech/audio interface in the world to a variety of platforms, one of which happens to be Home Assistant.

The Home Assistant team has worked very hard to make a very tight, highly integrated ecosystem. I suspect they will always have certain ecosystem advantages with this approach as furthering the overall ecosystem (now with voice) is their underlying goal.

It’s a subtle but important difference.


Rhasspy is not just one wake word feature.
It is a framework where multiple different wake word systems can be swapped in, so it is a bit more complicated to compare.

Yes, I looked at the docs again after I posted, and Rhasspy leverages a number of projects for wake word. I personally want to be able to give my voice assistant quite an abusive name, “OK xxxxx-er turn the lights on”.


ESP Box - the hardware part of Willow - has arrived. Played for an hour or so. Mixed impressions so far.

First, it works out of the box, which is pretty impressive. Wake word recognition in the demo is quick and accurate (at least as good as Alexa is my initial feeling, and without the cloud delay, though I haven’t done any systematic testing). Shame about “Hi ESP”, but there’s a Chinese wake word which I may try if I can figure out how to pronounce it.

Touch screen is crisp and responsive. The display gives good feedback on wake word detected and command received.

On the downside, I have no idea how to connect the ESP Box to my network. The (very small print) instructions show slightly different screen shots to the ones I am getting and although there is a cog wheel icon on the screen, which I assume leads to settings, it is completely unresponsive.

I have downloaded the Android app, which I understand should connect via Bluetooth, but it doesn’t find the box. I have connected to the wireless interface on the Box, but it doesn’t seem to provide any settings - just example icons for toggling switches.

So now I’m taking a break before digging into the Willow installation guides.

A couple of trivial points. When it arrived there was a screw rattling round in the box. It was obvious where it should go, but not a good look. Allen key screw too - hate them.

If you’re concerned about style, the Box looks well designed but a little cheap (not that cheap is a bad thing). I think this is because of the casing, which is made of that slightly translucent white plastic.

I have used Rhasspy. The wake word can be customised very easily, but my experience was that detection was a bit hit and miss - pretty good, but missing just often enough to be annoying. ESP Box seems far better.

Interested to hear your feedback on the hardware!

A couple points:

The factory demo from Espressif (even the latest in their GitHub) uses a significantly out-of-date wake word and local speech recognition engine and model. We did quite a bit of work to bring the latest and greatest to Willow. If you haven’t done microcontroller programming in C before, NOTHING is easy and tasks like that are borderline monumental. But it’s worth it - the latest model we’ve incorporated is SIGNIFICANTLY better.

We have a guide that covers the hardware:

My two biggest issues are:

  1. The 3D-printed enclosure. The plastic is soft and retention on the screws isn’t great. That’s why you found the screw rattling in the box. Now that ESP Boxes have sold out around the world and Espressif is seeing real sales volume for the first time, we anticipate they will move to a real injection-molded enclosure that addresses the translucency and screw retention issues.

  2. That damn power LED. Ohhhh man, I hate that thing. One would think it would be controlled via GPIO, but they (for some reason) connected it directly to a 3.3V buck converter coming off the input voltage… With the translucency of the case and full duty, the green power LED is bright enough to see from space, and it makes the enclosure look even cheaper. Willow inits the LCD ASAP in the boot process, so the user gets nearly immediate feedback that it’s on and we don’t need the LED. I’ve taken to slightly opening the enclosures and snapping the power LED off the PCB. You can also touch the display to wake it up at any time, and the hardware microphone mute button on the top of the enclosure has a status LED as well.

As for getting on your network: when you configure Willow you enter your WiFi credentials, Home Assistant server address, and personal access token. So once it boots, it is on WiFi and talking to Home Assistant.

Then all you do is flash and talk.
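For readers wondering what "talking to Home Assistant" with a personal access token involves: Home Assistant's REST API accepts a long-lived access token as a Bearer token, and its conversation endpoint takes a plain-text command. The thread does not say which endpoint Willow calls, and Willow itself is C firmware running on the ESP Box, so the following is only a minimal Python sketch, assuming the `/api/conversation/process` endpoint and a placeholder host and token, of the kind of request a recognized command could turn into. The helper name `build_ha_conversation_request` is hypothetical.

```python
import json
import urllib.request


def build_ha_conversation_request(ha_url: str, token: str, text: str,
                                  language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a POST to Home Assistant's conversation API.

    Hypothetical helper for illustration; not part of Willow.
    """
    # Assumption: the recognized command is passed as plain text and
    # Home Assistant's own intent handling decides what to do with it.
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=ha_url.rstrip("/") + "/api/conversation/process",
        data=body,
        headers={
            # The long-lived access token is sent as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and token; substitute your own.
    req = build_ha_conversation_request(
        "http://homeassistant.local:8123", "YOUR_LONG_LIVED_TOKEN",
        "turn on the office lights")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```

Building the request separately from sending it keeps the sketch testable without a running Home Assistant instance.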

Let me know how it goes, happy to help if you run into any snags! Only issue is I keep getting throttled by the forum and I’m currently being limited to one post per hour. I just enabled discussions on the Willow Github repo as I’m having many of these conversations across the internet and it would be much more helpful for all to move users who have hardware and are getting started with Willow there:


Am I the only one who smells a snake oil salesman? Granted, weird thing to pitch your boat on, but every response is just a sales pitch for something that’ll be released ‘later’, no reply on any media/website has sounded genuine & is just basically ‘my team is the best…’

If someone asked you whether the sky was blue or pink, your response would seemingly be ‘Well my team has worked hard on the problem, we’ve used nasa’s framework, upgraded it to the latest version, tested it and removed the bugs, added a lot of our own code & aim to be the best colour detector out there. You can ignore your eyes, we’ll be better than them. Just ask us and we’ll know’…aka you haven’t answered the basic blue or pink question

It can be pink at sunset :stuck_out_tongue_winking_eye:

I mean, let’s wait and see. A lot of other people have gone and bought one, and you will get to hear their testimonies.

No real reason to be the sceptic before that happens.

Or to be septic!

If there were any sign of commercial interest, maybe, but I just read enthusiasm and dedication. Thanks @kristiankielhofner / team tovera for your effort. I second the feature request for a “buy me a coffee” link.


I ordered one from Pi-Hut as well. Now all sold out :frowning:

Willow is a month old and no one outside of my team knew it existed before Monday. As they say “Rome wasn’t built in a day”.

If you were to go back 10 years and look at the first month of Home Assistant (or any other very early project) you would see the exact same types of responses. I know this because I’ve been in open source for over 25 years and I installed the first release of Home Assistant 10 years ago. Needless to say Home Assistant in 2013 wasn’t what it is today…

I’m genuinely curious on your perspective here, what are we “selling”, exactly? We have nothing to do with the manufacture, distribution, or sales of the ESP Box - we don’t make a penny from them. We have no way to accept donations. If I’m a “snakeoil salesman” I’m really bad at actually profiting from it :slight_smile: .


This is a very good point!

I think what people are kind of missing here is that we’re not seeing much online about Willow because it uses hardware that was obscure until Monday and people have to buy it. Then it needs to be shipped, etc., and this imposes a delay compared to something you can install on a Linux box, Raspberry Pi, etc. and post a video of online in 10 minutes.

Not everyone everywhere has five-hour Amazon delivery. If you actually have an ESP BOX you ordered on Monday in your hands today, you probably sprung for some very fast shipping! Many people ordered them from Ali and that’s a two-week delivery at best…

That said, here’s an issue and demo video from a user in the UK from yesterday:

Many more to come!

Hi, I’m the user who posted that video on GitHub showing it controlling my office lights. Thought I’d add some context, since not many people have got the hardware yet.

I ordered an ESP Box from Pi-Hut in the UK on Monday when I heard about Willow, and received it two days later on Wednesday.

I was expecting it to be a lot of hassle to get working, but was pleasantly surprised - the installation instructions in the readme all worked first time. I’ve programmed a bit of microcontroller stuff, Python, C, used Docker etc. before, and am prepared to get my hands dirty, but none of that knowledge was really necessary.

The wake word, which I changed from “Hi ESP” to “Alexa” as part of the config before building, works well. Much better than any of the “plug a mic into a Raspberry Pi” type of projects I’ve tried before.

I’ve not moved the device from my office to the kitchen, which is where I currently use a Google Home Mini, so I haven’t tested it in a noisy environment or larger space yet, but I’m liking what I’m seeing so far.

As for the speech recognition (the bit after the wake word detection) – I tried the on-device MultiNet stuff, and it’s impressive for such a low-power device, but it’s never going to be as good as shipping the audio to whisper.cpp [Edit: not whisper.cpp per se, see reply below]

When configured to use the inference server – running Whisper on some graphics card somewhere by the Willow authors, I presume – the accuracy is excellent. You can say anything and see the transcription. If it matches something HA understands, it controls things just fine.

Here’s another demo, see the video description on youtube for more details:

I am excited and optimistic there might finally be a path to decent voice control stuff I can run locally (looking forward to running the inference server myself soon) :muscle:


Hey RJ! One slight tweak so there’s no confusion. We don’t use whisper.cpp, we use a highly optimized Whisper based on ctranslate2 with our own additional performance optimizations.

Everyone can look for the release of the Whisper Inference Server next week to host locally! One caveat, though. Our goal is to best Alexa in every way possible. The fastest Whisper CPU implementation in the world (whisper.cpp) running on the fastest CPU on the market gets bested handily by a $100 used GTX 1060 that’s actually lower power (or, even better, a Tesla P4 that’s single slot, passively cooled, and uses PCIe slot power only with a max of 60 watts). We’re targeting sub-one-second response times all-in, and currently GPU is the only way to do that.

The Willow Inference Server can run on CPU, but GPUs are so fundamentally better suited to tasks like speech recognition that CPU performance and response times are pretty frustrating by comparison.