That feature will be introduced in v2023.7
https://www.reddit.com/r/homeassistant/comments/14m7pw0/ha_20237_is_introducing_service_calls_to_assist/
Just giving this question a little bump…
@kristiankielhofner any chance we can get a Docker Compose file for the Willow Inference Server?
My plan is to run it through Docker so I don't have to allocate my entire GPU to the Home Assistant VM.
So excited for this project!
EDIT: You can ignore my question. I just read the GitHub repo, which states you will provide ready-to-deploy Docker images. Can't wait!
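For anyone who lands here later, this is roughly the shape of the command involved, assuming the NVIDIA Container Toolkit is installed so the container can see the GPU. The image name, port, and model path below are placeholders, not the project's official ones:

# placeholders - check the Willow Inference Server repo for the real image, port, and paths
docker run -d \
  --name wis \
  --gpus all \
  --restart unless-stopped \
  -p 19000:19000 \
  -v /opt/wis/models:/models \
  willow-inference-server:latest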
Sorry, we're pretty busy and I don't get around to responding here as often as I would like.
"Sorry, I didn't understand" comes from Home Assistant, so Willow is communicating with HA correctly. However, you're not matching an intent.
HA has pretty limited processing abilities with intents. You need to make sure your speech command (and the resulting transcript) exactly matches what you have defined in HA. You can use the assistant in the web UI to confirm which commands work (match) and which don't. Many people find they need to significantly expand the built-in intents, make sure entities are exposed to Assist, and often add aliases for them to match their intended grammar.
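If you'd rather test from the command line than the web UI, one way (a sketch, assuming a long-lived access token in HA_TOKEN and the default port) is to POST the exact transcript to Home Assistant's conversation API and see whether it matches:

# replace homeassistant.local and HA_TOKEN with your own values
curl -s -X POST \
  -H "Authorization: Bearer $HA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "turn on the kitchen lights", "language": "en"}' \
  http://homeassistant.local:8123/api/conversation/process
# an "error" response (e.g. no matching intent) means that phrasing won't match for Willow either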
The HA team, as part of the Year of the Voice, is making rapid progress on the ability to match commands. We'll also be adding support for a natural language understanding/processing engine with intent recognition to pre-process commands before they are provided to Home Assistant, so the speech-to-text output doesn't need to perfectly match what is defined in HA.
How goes the development effort? Inquiring minds would appreciate an update. Thanks in advance!!
He just posted an update on YouTube yesterday, some pretty cool stuff: https://www.youtube.com/watch?v=qlhSEeWJ4gs
It is that video, which I just noticed on the platform formerly known as Twitter, that brought me here in search of more information. I am old and the thought of talking to my appliances has never really pushed any buttons for me (and my S.O. is flat out opposed to Google/Alexa). But this is local and looks super interesting.
Anyone have a step-by-step guide to set up an ESP32-S3-BOX using Windows? Also, are there any advantages to the newer model (ESP32-S3-BOX-3)?
Thanks
https://www.espressif.com/sites/default/files/tools/flash_download_tool_3.9.5.zip
I think it has been an age since I did, though.
Apart from new connectors and a dock, it has an extra 8 MB of PSRAM.
I do have an ESP32-S3-BOX, installed using Windows. Unfortunately, not with a full setup guide. As I recall, I found some reasonable guides to install WSL2 & Docker, then installed the Willow Application Server as documented. At that point, I could get to the WAS application from the host machine (http://172.17.39.174:8502, with the IP address found from the WSL command line: ip route get 1.1.1.1 | grep -oP 'src \K\S+'). To make this available to the rest of the local network (and in particular the ESP32 box!), I needed to forward the port - from an admin command prompt:
netsh interface portproxy add v4tov4 listenport=8502 listenaddress=0.0.0.0 connectport=8502 connectaddress=172.17.39.174
And I had to open the port in Windows firewall.
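If it helps anyone, that firewall part can be done from the same admin prompt; something along these lines should work if you kept port 8502:

netsh advfirewall firewall add rule name="Willow Application Server" dir=in action=allow protocol=TCP localport=8502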
I haven't tried a local inference server yet (mostly because my graphics card is AMD rather than NVIDIA).
All, I just finished creating a full dashboard for our new Willow WIS servers. It's near real-time and uses very few resources. It's fun to watch what happens on that server when sending STT through it.
I'm sure there are other objects that can be monitored, but this is a good start.
Thanks, Jeff. I just got my WIS up and running. I'll be playing with this now!
I think all this stuff couldn't come at a better time. Amazon just laid off hundreds of folks from the Alexa voice division. I also noticed recently that all my Alexa devices "lock up" randomly. Give a command and the light spins for 30 seconds with no response. At first I thought it was just one, but they all do it. And it's not network/internet related. Frankly, I won't shed any tears when I finally unplug all my Alexas.
Yeah. I would not allow any voice devices that went to the "cloud" in my house. Luckily, the wife agreed. The kid was less than happy.
But now that he is in his own house (with his cloud voice crap) and notices that ads from Google tend to match discussions they have in their house, he is finally getting it.
That's downright creepy. I refuse to use Google products. Pure Evil has met its match in that company.
BTW, one of the community members, @kovram, forked the Willow Auto Correct main branch and added support for sending unrecognized commands off to Alexa. So, after renaming your Alexa device's wake word to "Echo":
"Alexa, turn on the Christmas lights" - HA turns on the lights
"Alexa, set the living room heater to 68F" - HA sets the thermostat
"Alexa, set a timer named Bread Rising for 2 hours" - HA has no clue how to do this, so it gets forwarded to Alexa, which sets a timer for 2 hours
"Alexa, how many ounces in a pound?" - again, HA has no clue, so Alexa answers
"Alexa, play rock & roll radio on Pandora" - HA has no easy way to do this (no actual way on HA Supervised) for Pandora or Apple Music, so this gets sent to Amazon to play the music
What I like (love) about this solution is that Amazon no longer has ANY contact with my home devices and has no idea what I'm doing inside my home, unless knowing what my named timers are doing counts. And as a bonus, the Amazon Alexa can be MUTED since the commands are sent to it programmatically. No more eavesdropping by nosy engineers at Amazon. And as time goes on, fewer and fewer commands will be forwarded to Amazon as more intents are created and refined in HA.
That Auto Correct w/ Alexa forwarding fork is here:
It's better than a proof of concept; it actually works really, really well. It is NOT something you should attempt if you aren't at least moderately comfortable with Docker, Linux, etc.
Would it be technically possible for Willow to be built as an add-on for HAOS in the future, for those of us less technically inclined with Docker etc.?
This will likely happen at some point, but that's just a guess on my part. If it were done, it would only let you use their hosted, best-effort STT engine, not a local one. Better than nothing though… For usable local STT, you need a PC with a GPU. An RPi is never (ever) going to give you local STT that is usable. It's too compute-intensive for a Pi, or any other CPU for that matter.
Actually, that is totally untrue, in fact massively untrue, especially with the Cortex-A76 RPi 5, whose MatMul vector instructions give an ML boost of over 6x that of a Pi 4.
Also, the STT models we currently have running are not the greatest in quality or efficiency, and STT is constantly evolving in all aspects.
EfficientSpeech: An On-Device Text to Speech Model
EfficientSpeech, or ES for short, is an efficient neural text-to-speech (TTS) model. It generates mel spectrograms at a speed of 104 mRTF, i.e. 104 seconds of speech per second, on an RPi4. Its tiny version has a footprint of just 266k parameters - only about 1% of modern-day TTS models such as Mixer-TTS. Generating 6 seconds of speech consumes only 90 MFLOPS.
That is on a Pi, and there are certainly models that can produce great real-time voice on a Pi 5 without a GPU.
However, I don't really advocate an RPi 5, as it's far from efficient when RK3588(S) boards of similar price, such as the Orange Pi 5, produce nearly 2x the GFLOPS/watt and are more efficient than current Apple silicon. See GitHub - geerlingguy/top500-benchmark: Automated Top500 benchmark for clusters or single nodes.
Initial attempts often used streaming versions of ASR & TTS to reduce latency, but this is problematic for multi-user systems as queues can quickly build up.
You can take a low-cost RK3588(S) board, which has much better idle wattage than an RPi 5, and employ models that are much faster than real-time.
Race-to-idle produces minimal latency and can serve multiple users, since clashes are minimised by the diversity of use.
You can scale by simply adding another bare-metal instance fed from a routing queue: if the first instance is busy, the next "text" is routed to a second instance.
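As a crude illustration of that routing idea (the hostnames, port, and /synthesize endpoint here are made up, stand-ins for whatever inference API you actually run), a client can simply try each instance in turn and take the first one that answers:

# hypothetical hosts and endpoint - not a real WIS/TTS API
TEXT="turn on the kitchen lights"
for HOST in tts-node-1 tts-node-2; do
  # give up after 2 seconds so a busy or unreachable node doesn't block the request,
  # then fall through to the next node
  if curl -sf --max-time 2 --data-urlencode "text=$TEXT" -o reply.wav "http://$HOST:5000/synthesize"; then
    echo "served by $HOST"
    break
  fi
done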
Everything is currently available, from hardware to open-source software, but what we currently have running is likely what creates your incorrect assumptions.
The evolution of the RK3588(S) has probably slowed due to the CHIPS Act, as China seems to be switching to RISC-V, and long term it's probably going to hurt Arm more than China.
Everyone in development, from Big Data to China, is creating on-device ML-capable silicon, but the worry is that the likes of Google are designing closed-source models and accompanying application-specific (ASIC) ML silicon that is such a massive leap over current open source in efficiency, speed, and quality that it could be very hard for general-purpose hardware and open source to compete.