I wouldn’t tinker too much. Apparently ESP-ADF is no longer needed going forward now that these new boards are coming out with XMOS chips for voice/echo cancellation, and the firmware/software for ESP32 over I2S is very new and still changing. I have a Seeed ReSpeaker kit; don’t get it, wait until Nabu comes out with something. The listen light didn’t work. Seeed actually released a firmware update that fixed it, but I got “access denied” when downloading it from their wiki, so someone put it on GitHub. The YAML is, well, different in some areas and similar in others. All Nabu has announced is that theirs will have a supposedly better XMOS chip and 3.5mm input and output for another mic, but don’t quote me on that. You know Nabu Casa will use it as their main voice testing device and push more updates than, say, Seeed and the others. Trying to perfect the YAML now seems pretty pointless considering a lot might change in the next six months or more.
I still use Nabu Casa’s cloud because it’s better than local, and I’m running full HA OS on a three-year-old NUC. They did work with NVIDIA to port everything to GPU, but I’m not buying a Jetson module just for local voice commands. I get why they did it; it’s kind of like a Raspberry Pi, just 10x the price, and more than that for the recommended model, since the 8GB version simply doesn’t have enough RAM. An LLM is neat but not something I personally need. I’m not a developer, but the obvious changes are in the YAML I use for the ReSpeaker Lite versus the Korvo-1 or S3-Box. The S3-Box also uses a newer wake word that actually does work better and can be used on anything running microWakeWord.
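For anyone curious where those YAML differences actually live: in ESPHome the microWakeWord part itself is small, and most of the board-to-board variation is in the microphone/speaker hardware config around it. A rough sketch (the model name and the `board_mic` id are illustrative, not taken from any specific board’s config):

```yaml
# Sketch of the wake-word portion of an ESPHome voice satellite.
# The I2S microphone/speaker blocks (not shown) are where the
# ReSpeaker Lite, Korvo-1, and S3-Box configs mostly differ.
micro_wake_word:
  models:
    - model: okay_nabu        # a published microWakeWord model
  on_wake_word_detected:
    - voice_assistant.start:  # hand off to the voice pipeline

voice_assistant:
  microphone: board_mic       # id of the board-specific I2S mic
```

Since microWakeWord runs entirely on the ESP32, swapping boards mostly means swapping the hardware blocks while this part stays the same.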
Thanks for the update. I’m aware of the upcoming HA voice box and am looking forward to it. However, I don’t mind tinkering that much; for me it’s about getting experience with the software and hardware. Until the XMOS hardware from HA is available, I’m using my own ESP32 voice pucks, which are already working quite well for me, though still far from Google Home hardware. I’m also running HA OS on an older NUC, but I’m experimenting with a local LLM on a separate PC. It will make giving commands easier, and HA will understand them much more easily too. My wife won’t have to remember the exact words to get things done anymore. For me that is a big plus. Besides that, it’s a much more natural way of controlling the house by voice.
Honestly, my Wyoming satellite is probably the most accurate, with the Seeed and S3-Box a close second, the S3 probably beating out the Seeed since it’s the device the HA team mainly focused on for ESP32.

Amazon and Google lost billions on voice. In the race to win, they realized too late that you couldn’t put in ads after it had been free. I do know that newer hardware is more powerful and does more on-device, which you can see in the price of the Alexa devices, and even now we have no idea how much either leverages cloud resources to be as accurate as they are. Honestly, background noise, especially the TV, is the main issue, and it has to be hard to determine what is and isn’t your voice. Music with no lyrics does pretty well, but trying to use it while watching TV is a no-go. Still, it’s a huge accomplishment by Nabu/HA to get this far with something as low-powered as an ESP32 as the end device listening for the wake word. They will get there; it’s just going to be gradual. Some people want overnight results, and that’s not going to happen.

They originally tried to run just the LLM on the Jetson with HA separate, but Nabu got fed up and ported HA Core to the Jetson instead. It’s technically just running a specialized version of Ubuntu for ARM, with all the voice stuff in Docker containers. It could honestly be just as bad with background voices/noise anyway, so wait and see. Really just looking forward to whatever Nabu announces at this point. Will continue to tinker as always in the meantime.