How to Configure Espressif Box Devices with the Willow Multilingual Voice STT & TTS Home Assistant Add-on

For those folks lucky enough to have an Espressif Box3 (you can order one from AliExpress), you can now try Willow with the new Willow add-on for HA. Note that this add-on is only the Willow Application Server (WAS), which configures the STT service your Box3 uses. Your Box3 sends your speech to the Willow team's cloud-hosted, best-effort Willow Inference Server (WIS), and the transcribed text reaches HA in under 1 second. You do not need to install a local WIS inference server for this to work.

NOTE: I am not affiliated with Willow in any way. I'm just an average user who loves what these guys are doing in open-source voice, and like the vast majority of users on these forums and Discord, I became hugely frustrated working with the HA Voice implementation. The slick demos look fantastic, but real-world use isn't cutting it. I'm an advanced, career audio electronics engineer, and for several months now I've tried the M5, ESPHome, an ESP32 on a breadboard with mics and an amp/speaker, etc., and the results have been terrible. No more.

Willow's implementation of all this just works, and works really, really well. Getting it running should take you about 15 minutes.

I currently run a local WIS server on an NVIDIA GTX 1070 GPU that I bought for $75 on eBay and put into an ancient 2011 Dell OptiPlex with 16GB of RAM running Ubuntu 20.04. It gives me highly accurate results in under 250ms. You probably don't want to go this route yet, so just use this add-on and test it out.

With the new add-on, this will work on any HA system, but you do need an Espressif Box device. Any version is supported, but the Box3 is the one to get if you don't have one.

First, make sure your Assist pipeline in HA is working. If it isn't, this will never work. Other guides show you how to set that up. Test it by turning a light on/off:

If you can't get this basic HA Assist functionality working, stop now and fix it before proceeding. Troubleshooting info is in the Home Assistant docs under “Troubleshooting Assist”.
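If you'd rather check Assist from a script than from the UI, Home Assistant's REST API has a `/api/conversation/process` endpoint that sends text straight to the conversation agent. This is a minimal stdlib-only sketch, assuming you already have a long-lived access token (created later in this guide); the host, token, and sentence are placeholders for your own setup, not anything Willow requires.

```python
import json
import urllib.request


def build_conversation_request(base_url: str, token: str, text: str,
                               language: str = "en") -> urllib.request.Request:
    """Build a POST to HA's /api/conversation/process endpoint."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/conversation/process",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def spoken_reply(result: dict) -> str:
    """Pull the plain-text reply Assist would speak out of the JSON result."""
    return result["response"]["speech"]["plain"]["speech"]


def ask_assist(base_url: str, token: str, text: str) -> str:
    """Send one sentence to Assist and return its spoken reply."""
    req = build_conversation_request(base_url, token, text)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return spoken_reply(json.load(resp))
```

For example, `ask_assist("http://homeassistant.local:8123", token, "turn on the kitchen light")` (hypothetical entity name) should toggle the light and return Assist's reply if the pipeline is healthy.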

Install the HA Willow add-on by adding the Willow add-on repository to the add-on store: open the add-on store, use the three dots in the upper right corner of your screen, then select Repositories.

Paste the URL into the box and click ADD.

Now select the Willow add-on.

Click download, wait a few minutes until it finishes, then click START. You should see this screen. Click Open Web UI:

You should now get the WAS server config screen. Most options should be filled in for you. Check your HA server's address and port, and choose the wake word you want to use.

In HA, click your username/profile in the lower left of the HA interface, scroll all the way to the bottom to Long-Lived Access Tokens, and click Create Token. Give it a name and copy the token.

Paste the token into the config screen where it says Home Assistant Token.
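To double-check the token before pasting it, you can call the root of Home Assistant's REST API, `GET /api/`, which answers `{"message": "API running."}` for a valid token and 401 Unauthorized for a bad one. A minimal stdlib-only sketch; the base URL is a placeholder for your own HA address.

```python
import urllib.error
import urllib.request


def build_api_check(base_url: str, token: str) -> urllib.request.Request:
    """GET /api/ with the token as a Bearer credential."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/",
        headers={"Authorization": f"Bearer {token}"},
    )


def token_works(base_url: str, token: str) -> bool:
    """True if HA accepts the token, False if it answers 401."""
    try:
        with urllib.request.urlopen(build_api_check(base_url, token), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:  # invalid or revoked token
            return False
        raise  # anything else (404, 500, ...) is a different problem
```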

Click SAVE to save your settings.

Now click on Willow Web Flasher. NOTE: You MUST use Chrome or another browser that supports Web Serial so you can flash your Box.

Enter your Wi-Fi password; it will be flashed onto your Box3. Click Connect and select the port your Box3 is plugged into on the PC you are using. Once you are connected to the right port, click Flash Willow. You should see this:

You can keep the log open to see how things work.

Once the Box reboots, it should connect to your Wi-Fi and show “Welcome to Willow” on the screen. You should now be able to say “Hi {WakeWord}, turn {light} switch on”, where {WakeWord} is the word you chose during config and {light} is the name of a KNOWN light entity in HA. If you see problems, test the same sentence in Assist as shown earlier. You should see the responses, and you should now be able to control all HA entities by voice. Accurately, reliably, in multiple languages and from across the room!

Helpful Tips:

ONLY Espressif Box devices work with Willow. These devices were designed from the ground up for voice applications. No, you cannot use regular ESP32 boards, at least not yet.

Join their Discord channel if you have questions.

The only wake words that work are Alexa, Hi ESP, and Hi Lexin. Suffice it to say, custom wake words are HIGHLY complex to build. You will soon see that these wake words work very, very well.

If you like what you see, check out their other stuff. If you want truly local, highly accurate STT/TTS with Auto Correct (another REALLY cool piece of the puzzle), dust off an old tower PC you can put a GPU into, buy a cheap NVIDIA GTX 1070 on eBay and get rolling.

I have my local WIS server on my Linux box set up to forward unknown commands to Amazon Alexa using Willow Auto Correct (kovrom/willow-autocorrect on GitHub). With my Box3 wake word set to “Alexa” and my Alexa device's wake word changed to “Echo”, I get this:

“Alexa, turn on Christmas lights” - Willow tells HA to turn on the lights.
“Alexa, set living room heater to 68 degrees” - Willow tells HA to set the thermostat.
“Alexa, set a timer named Bread Rising for 2 hours” - HA has no clue how to do this, so it gets forwarded to Alexa, which sets a 2-hour timer.
“Alexa, how many ounces in a pound?” - Again, HA has no clue, so Alexa answers.
“Alexa, play rock & roll radio on Pandora” - HA has no easy way to play Pandora or Apple Music (and no actual way on HA Supervised), so this gets sent to Amazon to play the music.
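Roughly, the routing rule in these examples is: if HA matches an intent, HA handles the command; if it can't, the command goes to Alexa. The sketch below is my own illustration of that rule, not code from willow-autocorrect. It assumes the input is HA's conversation response JSON, where an unmatched sentence comes back with `response_type` `"error"` and error code `no_intent_match`; the function names are hypothetical.

```python
def should_forward_to_alexa(ha_result: dict) -> bool:
    """True when HA's conversation agent could not match any intent."""
    response = ha_result.get("response", {})
    return (
        response.get("response_type") == "error"
        and response.get("data", {}).get("code") == "no_intent_match"
    )


def route(ha_result: dict) -> str:
    """Decide who handles the command: 'alexa' or 'home_assistant' (hypothetical helper)."""
    return "alexa" if should_forward_to_alexa(ha_result) else "home_assistant"
```

Only `no_intent_match` is forwarded here, so a command HA understood but couldn't execute (say, an unknown entity name) stays with HA instead of silently going to Amazon.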

It's wicked fast, accurate, self-learning and auto-correcting. It learns how to do more and more things over time when you also install WAC (Willow Auto Correct).

What I like (love) about this solution is that Amazon no longer has ANY contact with my home devices and has no idea what I'm doing inside my home, unless knowing my named timers and what music I listen to counts. As a bonus, my Alexa device can be MUTED, since the commands are sent to it programmatically. No more eavesdropping by those nosy engineers at Amazon. And as time goes on, fewer and fewer commands will be forwarded to Amazon as more intents are created and refined in HA.

If there’s interest in this deeper solution, I’ll write up a guide on how to do it. You’ll need to graduate to a local WIS server to start.


OK, I've seen questions in other threads about language support. Any language supported by the Whisper engine is supported in Willow.

This is just me playing around with Google Translate in Czech, with no changes whatsoever to my install:

I'm saying “Hi ESP” and then clicking play in Google Translate, which plays the foreign (to me) language sentence through my crappy PC speaker. I'm watching the Box3 logs using the Willow Web Flasher.

(09:02:31.392) WILLOW/HASS: sending command to Home Assistant via WebSocket: {
        "end_stage":    "intent",
        "id":   1704211351,
        "input":        {
                "text": "This is a test of the translational capabilities of this great new language engine."
        },
        "start_stage":  "intent",
        "type": "assist_pipeline/run"
}

Or this:

“This is a test of the Willow speech engine in a foreign language; I could go on forever.”

I (08:53:36.212) WILLOW/HASS: sending command to Home Assistant via WebSocket: {
        "end_stage":    "intent",
        "id":   1704210816,
        "input":        {
                "text": "This is a test of a speech motor in a foreign language, in which I could continue to infinity."
        },
        "start_stage":  "intent",
        "type": "assist_pipeline/run"
}

Also, in many languages (and in British English), “engine” and “motor” are interchangeable, so technically it's quite accurate.

“testing the translation ability of this cool new speech engine and we could go on forever in multiple different languages”

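тестирајући способност превођења овог сјајног новог говорног механизма и могли бисмо да наставимо заувек на више различитих језика. скоро је савршено

(That's the sentence above in Serbian, with “скоро је савршено”, “it's almost perfect”, added at the end.)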

"text": "By testing the ability of translating this great new speaking mechanism, we could continue to speak more languages."

So, the Whisper engine in Willow automatically translates the spoken language into English before submitting it to the Assist pipeline in HA. I have an English pipeline set as the default in HA, so commands are translated, submitted to HA and executed correctly.

If you want your SPOKEN language to be transcribed as spoken and submitted to HA in that language, set that language's code in your configuration. For Czech, the identifier is cs:

                "text": "Toto je test řečového motoru v rby v cizím jazyce, ve kterém bych mohl pokračovat donekonečna."
        "start_stage":  "intent",

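A near-perfect Czech transcription (roughly “This is a test of a speech motor in a foreign language, in which I could go on forever”), which is then submitted to the HA Assist pipeline in Czech, if that's what you want.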
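The log entries above show what the Box3 sends once WIS has returned the text: an `assist_pipeline/run` command over HA's WebSocket API that runs only the intent stage (`start_stage` and `end_stage` are both `intent`). The sketch below just builds those JSON frames in Python, including the `auth` frame HA expects first on a new connection; it doesn't open a socket, and the function names are hypothetical, not Willow's code.

```python
import itertools
import json


def auth_message(token: str) -> str:
    """First frame a client sends after HA greets it with {"type": "auth_required"}."""
    return json.dumps({"type": "auth", "access_token": token})


def pipeline_run_message(msg_id: int, text: str) -> str:
    """An assist_pipeline/run command that skips STT/TTS and only runs intent handling."""
    return json.dumps({
        "id": msg_id,
        "type": "assist_pipeline/run",
        "start_stage": "intent",
        "end_stage": "intent",
        "input": {"text": text},
    })


def session_messages(token: str, sentences: list[str]) -> list[str]:
    """Frames for one connection: auth first, then one command per sentence.

    HA rejects a command whose id is not higher than the previous one on the
    same connection, so ids come from an increasing counter.
    """
    ids = itertools.count(1)
    return [auth_message(token)] + [pipeline_run_message(next(ids), s) for s in sentences]
```

The ids in the Box3 log (1704211351, 1704210816) match Unix timestamps for the logged times, which also satisfy the "always increasing" rule.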


@jazzmonger I tried to install this add-on, but I got this error:

The command '/bin/sh -c git clone https://github.com/toverainc/willow-application-server /app' returned a non-zero code: 128

@meni123 what HA version and installation type/environment are you using? Are you still using this setup?
“Supervised on Debian. CPU: Intel N100, 4 cores/4 threads, up to 3.40GHz, 6MB cache. Architecture: amd64 (x86_64). RAM: 8GB onboard LPDDR5 4800MHz.”
I assume you want to use Hebrew as your language. I’ll try to find the right code for that.

@nwithan8 Any ideas?

A list of language identifiers is here. For example, Hebrew is “he”.

Just append your chosen language code to the end of the WIS server URL.
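As a sketch of that step: Whisper uses short language codes such as `cs` (Czech), `he` (Hebrew) and `no` (Norwegian). The Python below adds the code to a WIS URL as a query parameter. The parameter name `force_language` and the `wis.example` host are assumptions for illustration only; check the Willow docs or Discord for the exact parameter your WIS version expects, and start from the WIS URL shown in your WAS config.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_language(wis_url: str, code: str, param: str = "force_language") -> str:
    """Return wis_url with the language code set as a query parameter.

    ASSUMPTION: the default parameter name "force_language" is illustrative;
    confirm the real name in the Willow documentation.
    """
    parts = urlsplit(wis_url)
    # Keep any existing query options, but replace an older language setting.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    query.append((param, code))
    return urlunsplit(parts._replace(query=urlencode(query)))
```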

Those are exactly my details.
You know everything…
I'd love your help.

@nwithan8 is the author. I’m tagging him here.

Hmm, interesting. Seems it failed when trying to clone the repo during the Docker build process. Unfortunately, error 128 is a generic error code, and I can’t seem to immediately recreate the issue on my machine. To confirm, what version of the add-on are you using, the latest 0.2.0?

Yes, the latest version 0.2.0

How about in the other direction for TTS? This does not seem to make any difference:

Where did you get that info, @jazzmonger?


The devs monitor the Willow Discord channel and are very helpful. I posted the question there.


@jazzmonger, which channel? I only see 3 text channels: general, off-topic, and experimental-playground.

General Channel.

Works really well. Moved to Community Guides. :grin:


Changing from English to Norwegian worked well, with one exception: Willow now receives Norwegian text strings (replies), but it reads them out loud with an English accent. How can I get the Willow TTS server to use the right language when speaking?

Discussion seems to have moved to GitHub. I can't find a Discord channel.

I posted there recently. It’s still active!

Did you figure this out? I assume you meant that you want Norwegian spoken with a Norwegian accent?

Ah, that link gave me an invite - thanks. Sorry - missed the link in the OP.

Discord looks like the best place to find out about updates. Could you add a bit to your original guide about how to upgrade the Espressif Box?