Voice Chapter 7 - Supercharged wake words and timers

Hedda · August 5, 2024, 4:43pm

FYI, Seeed Studio has newly announced the “ReSpeaker Lite” (ReSpeaker Lite board only) and the “ReSpeaker Lite Voice Assistant Kit” (ReSpeaker Lite board with an onboard ESP32).

That same webpage offers two models. The first is a 2-Mic Array board that combines ESP32-S3 ESPHome support with an XMOS XU-316 chip for advanced audio processing. The second is a DIY variant: a 2-Mic Array board with only the XMOS XU-316 chip, meant to be used with your own compute solution (so you need to add your own MCU board, or an SBC/computer such as a Raspberry Pi) connected via I2S or USB.

@synesthesiam The former full-kit solution sounds similar to the hardware specification mentioned for the upcoming voice-kit development platform that Nabu Casa members said they are working on, right? …though it looks like the ReSpeaker Lite Voice Assistant Kit is missing expansion ports, and if so it will not allow for additional hardware add-ons?

donburch888 · August 6, 2024, 11:02am

Looks interesting, and I am particularly pleased to see that their demonstration video is using Home Assistant with ESPHome and Wyoming. For anyone else interested, the product page lists a price of US$26.91 for the full kit with an enclosure. More info is in the Wiki.

They state: “Onboard AI Algorithms … the kit includes Automatic Speech Recognition algorithms for Interference Cancellation (IC), Acoustic Echo Cancellation, Noise Suppression, Voice-to-Noise Ratio (VNR), and Automatic Gain Control (AGC), enabling high quality voice capture.”

I found a firmware file, but it is so small that I guess most of the algorithms are closed source and hardcoded on-chip.

Personally, I don’t see Seeed as being interested in supplying hardware and support to end users, and I wouldn’t want a repeat of their “support” for the reSpeaker 2-mic HAT for Raspberry Pi.

Hedda · August 6, 2024, 11:12am

WOOT! Stumbled on this new “voice-kit” GitHub repository where ESPHome developers are developing new or improved components for I2S audio (XMOS) support and media playback support for FLAC, etc. for the upcoming voice-kit hardware platform from Nabu Casa:

GitHub - esphome/voice-kit
- voice-kit/esphome/components at dev · esphome/voice-kit · GitHub

They have already added new features and improvements to ESPHome, such as:

New: Nabu Media Player - new “nabu” media player from Nabu Casa running natively on ESP32
- Music Assistant streams work (both mp3 and flac), but since it requires resampling, the audio quality isn’t great
New: Added support for FLAC files
New: Added a proper WAV decoder (that parses WAV headers with LIST, INFO, etc. chunks.)
New: Initial support for playing back local files
New: Playback Control for the VoiceKit
New: Added an is_paused condition for media players.
New: Add Click to Converse to button
New: LED animation
New: Scripts for controlling LEDs
New: Updated button behaviour for the Voice Kit
New: Dial Volume Control
New: Timer basic implementation
New: Added HTTP(s) OTA updates
New: Added buttons to force an OTA update
New: Software Mute Switch
Improvement: A basic resampler adjusts sample rates
Improvement: Configurable output sample rate (for experimental 48kHz XMOS firmware)
Improvement: The DAC mute state is read on boot
Improvement: volume/mute control via the DAC (the wheel works for increasing/decreasing volume)
Improvement: Logs what element failed if the pipeline breaks
Improvement: Fails gracefully if the incoming stream can’t be processed
Improvement: Differentiate between user facing LED Ring and Internal LED ring
Point external component to dev branch
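Two of the items above, the WAV decoder that parses headers containing LIST/INFO chunks and the basic resampler, are easy to illustrate. The following is a minimal Python sketch of both ideas, not the voice-kit C++ code; `wav_chunks`, `parse_wav` and `resample_linear` are hypothetical names. The WAV part walks the RIFF chunk list and skips anything that isn't `fmt ` or `data`. The resampler is deliberately naive linear interpolation with no low-pass filter, which is a common reason simple resamplers degrade audio; whether that is why the Music Assistant streams sound poor is an assumption.

```python
import struct


def wav_chunks(data: bytes):
    """Yield (chunk_id, payload) for every chunk in a RIFF/WAVE byte string."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # skip "RIFF", the 32-bit RIFF size and "WAVE"
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield chunk_id, data[pos + 8:pos + 8 + size]
        # RIFF chunks are padded to an even length.
        pos += 8 + size + (size & 1)


def parse_wav(data: bytes):
    """Return (sample_rate, channels, bits_per_sample, pcm), ignoring LIST/INFO etc."""
    fmt = None
    pcm = None
    for chunk_id, payload in wav_chunks(data):
        if chunk_id == b"fmt ":
            _tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", payload, 0)
            fmt = (rate, channels, bits)
        elif chunk_id == b"data":
            pcm = payload
        # Any other chunk (LIST, INFO, fact, ...) is simply skipped.
    if fmt is None or pcm is None:
        raise ValueError("missing fmt or data chunk")
    return fmt[0], fmt[1], fmt[2], pcm


def resample_linear(samples, src_rate: int, dst_rate: int):
    """Naive linear-interpolation resampler with no anti-aliasing filter."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate
    out = []
    for i in range(n_out):
        x = i * step          # fractional position in the input
        j = int(x)
        frac = x - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)
    return out
```

A production resampler would normally low-pass filter before decimating (for example when converting 44.1 kHz music to a lower device rate); the sketch leaves that out to keep the structure visible.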

There are also many inline TODO comments in the code if anyone is interested in helping them:

https://github.com/search?q=repo%3Aesphome%2Fvoice-kit%20todo&type=code

Note! Be aware that many comments there state that most of the new features are not yet stable.

Not sure, but they are clearly aiming for ESPHome compatibility. See the separate thread for more discussion here:

"ReSpeaker Lite" Seeed Studio's new Voice Assistant Kit hardware combining ESP32 with XMOS XU-316 MCU chipset for advanced audio processing in ESPHome voice-kit

ginandbacon · August 6, 2024, 10:18pm

I have to agree a little. This is clearly geared towards ESPHome, but they will probably put out one config example for a voice assistant in ESPHome and never update it. Also, after looking at the XMOS datasheet, Seeed doesn’t specify the exact model, and there are 10 or so with different capabilities, some supporting external LPDDR1 RAM. In the summer release video, when they briefly discussed the hardware, one of them said the XMOS chip had 16 cores, so I’m 99% sure it’s this model, but the numbers after the model name make a difference.

If the firmware on the XMOS chip is closed source, then it can’t be an ESPHome-ready device like Nabu’s will be. It will be up to the community to tweak any YAML for this thing, although I do expect it to sell well. Also, it looks rushed: the full kit is just the board in an acrylic case sitting on the speaker. It’s essentially a reSpeaker 2-Mic HAT with an XMOS chip and an ESP32-S3. Not sure why anyone would try to use anything but a XIAO on it.

Seeed is invested in ESPHome/Nabu now, though. They are a reseller of NVIDIA’s Jetson compute modules, someone from Seeed started the thread about getting a local LLM working, and HA devs and NVIDIA devs have put a lot of effort into porting things. They are just interested in selling hardware, but once it’s in your hands don’t expect anything else from them. There is nothing wrong with that approach, but if they made it open source (and it may be), I would have more faith in long-term support. While I expect Nabu’s voice assistant to be more expensive, you know it will just work and continue to be supported for at least 2 years, probably more.

I would like to know the exact model that’s in this and what will be in Nabu’s version, but they obviously haven’t released hardware specifications yet.

donburch888 · August 7, 2024, 12:45pm

It certainly seems to be taking ESPHome and Nabu Casa seriously, which is great to see!

Agree. I believe Seeed and Espressif want to sell chips to companies who will develop products, not to directly support every hobbyist’s one-off project … hence the lack of ongoing support.
Sometimes they have to drum up interest (particularly in newer technologies), and so make small runs of “development kits” (probably at a loss) to show off the potential to developers.
Once interested, the developer will invest in learning the chip, developing their own product, selling and supporting their product … while buying lots of chips

Exactly !!! It seems Nabu Casa will use XMOS chips, and Nabu will do the bulk of the work to bring a user-friendly product to market, including providing support and (presumably) Open Source.

Yes, Nabu Casa’s VoiceKit will cost more than the XMOS development kit - but I trust Nabu Casa to put a fair price on their VoiceKit (though surely plenty will expect to price match with Google/Amazon’s deep pockets). I hope they allow an option for us to 3D print our own enclosures; and maybe the reSpeaker Lite is so close to Nabu Casa’s target design that we might also have an option to buy a reSpeaker Lite clone, assemble ourselves, and add ESPHome. I can’t wait to see what Nabu Casa come out with.

The future is looking very bright
Time for me to start saving my pension

Hedda · August 7, 2024, 1:32pm

ginandbacon:

Also, after looking at the XMOS datasheet, Seeed doesn’t specify the exact model, and there are 10 or so with different capabilities, some supporting external LPDDR1 RAM. In the summer release video, when they briefly discussed the hardware, one of them said the XMOS chip had 16 cores, so I’m 99% sure it’s this model, but the numbers after the model name make a difference.

If the firmware on the XMOS chip is closed source, then it can’t be an ESPHome-ready device like Nabu’s will be. It will be up to the community to tweak any YAML for this thing, although I do expect it to sell well. Also, it looks rushed: the full kit is just the board in an acrylic case sitting on the speaker. It’s essentially a reSpeaker 2-Mic HAT with an XMOS chip and an ESP32-S3. Not sure why anyone would try to use anything but a XIAO on it.

@donburch888 @ginandbacon if you want a deeper discussion specifically about the ReSpeaker Lite product, I suggest that you post in the separate thread instead, see:

https://community.home-assistant.io/t/respeaker-lite-voice-assistant-kit-seeed-studio-voicekit-combining-xmos-xu-316-and-esp32-s3/756944

donburch888 · August 8, 2024, 5:41am

Yes it is interesting, showing a wider interest in better cheaper voice assistant devices … but unless Mike says that this is the hardware in Nabu Casa’s VoiceKit I will just wait for Nabu Casa VoiceKit.

Hedda · August 8, 2024, 6:42am

FYI, FutureProofHomes has now also announced a similar XMOS- and ESP32-based two-board Voice Satellite hardware development kit for Home Assistant that he is calling the “Satellite1 PCB Dev Kit”:

Satellite1 PCB Dev Kit

The Satellite1 PCB Dev Kit contains the two PCBs necessary to build your own completely private voice assistant & multi-sensor with XMOS advanced audio processing & music playback. Add your own speaker and power supplies.

Satellite1 HAT Board:

This board features 4 PDM microphones, 12 NeoPixel LEDs, humidity/temp/lux sensors, 4 buttons (volume up/down, action button & hardware mute), plus the XMOS audio processing chip and a power DAC for an amplified speaker-out connection or a 3.5mm headphone connection. All remaining GPIOs are also exposed.

The Satellite1 Hat connects easily to the Sat1 Core Board but can also be paired with a Raspberry Pi or a PC/Mac via USB! Perfect for all your voice assistant and audio projects!

Satellite1 Core Board:

The Satellite1 Core Board contains the ESP32-S3 N16R8, USB-C Power Delivery and a 40-pin connector. This board attaches to the companion Sat1 HAT Board.

Looks like he has posted a future roadmap showing that he is working on a nice enclosure and more:

Noticed that @FutureProofHomes had a preview video on YouTube mentioning this project as “HomeX” 4 months ago (but at that time he had based the prototype on the wyoming-satellite platform running on a Raspberry Pi, instead of Nabu Casa’s upcoming ESPHome-based voice-kit hardware platform that runs on an ESP32-S3 and uses an XMOS xCORE chip for audio processing):

PS: The new design reminds me of the “Onju Voice” PCB replacement for the Google Nest Mini (2nd gen), which is an open-source hardware project that I hope someone else will pick up and update now:

FutureProofHomes · August 8, 2024, 1:58pm

Thanks for sharing, @Hedda! Happy to answer any questions you guys may have. Ask away.

We’re aiming to launch before Christmas and detailed documentation is coming. Hit me up if you want to help the core-team and have extensive hardware/firmware skills. We’re excited to launch!

Hedda · August 8, 2024, 2:00pm

@FutureProofHomes Can you tell us which exact SKU of XMOS chip you use? The same question is also asked here:

FutureProofHomes · August 8, 2024, 2:46pm

Oh, didn’t see the separate thread over here too. Maybe let’s keep the conversation here since it makes more sense?

Can you tell which exact SKU of XMOS chip you use?

Here’s the actual XMOS chip we’re using:

And also wondering if your PCB(s) will be open-source hardware and/or use OSH/OSHW design?

I just updated the repo to clarify our open-source strategy a bit. In a nutshell, upon launch all the firmware (ESP & XMOS) will be open source and all our hardware schematics will be published too. The KiCad project files will follow a delayed open-source model (I’ll publish those dates for us), at which point we will put out the project files too. Open to folks’ thoughts on this! And again, if you want to work closely with the core-team then please do ping me!

I read that Nabu Casa’s device will have an audio output jack (3.5mm headphone jack) for connecting external speakers.

The Sat1 has this as well. You’ll be able to power a 25W speaker directly from the device OR plug in external amplified speakers via the 3.5mm headphone jack.

sender · August 8, 2024, 5:51pm

Go on! That would make me a nice Christmas gift!

SpencerDub · August 9, 2024, 2:48pm

@FutureProofHomes Really excited to see this.

Oddly specific use-case question: do you think enabling the bluetooth_proxy feature on the Sat1 would lead to any performance degradation? I’m building out room presence detection in my home using Bermuda and would love for any ESP32-based satellite like this I use to also act as a proxy, but I’ve heard of other ESP32 devices becoming unreliable when tasked with music + satellite + Bluetooth.

FutureProofHomes · August 9, 2024, 3:52pm

Bermuda implementation on my end is stable with all the other ESPHome bells and whistles turned on. It’s looking good @SpencerDub!

SpencerDub · August 9, 2024, 3:55pm

That’s great news! You may very well have my dream voice assistant in the works; it’s like you built it with exactly my desired features in mind. Excited for your launch!

FutureProofHomes · August 9, 2024, 4:06pm

That’s the goal. Build the holy grail!

ginandbacon · August 11, 2024, 12:16am

Probably one of the last things on your mind right now, but I was wondering if you were going to try to make this a “Made for ESPHome” device, especially given the latest OTA functionality. One such device that most people have heard of is the “Everything Presence One” mmWave sensor. With the new OTA update functionality, users can add these devices to HA and do updates without the ESPHome add-on installed. I imagine Nabu’s version will use this functionality. These are a bit different from the ones you can flash on their site. Here are the requirements:

It just seems like this method would be easier for new users and would allow this device to be easily installed. It also appears you meet all the requirements, specifically the software being open source. It would make it more appealing to newer HA users who may not want to mess around with creating a secrets file for various information. Just a thought.


For all projects
- Your project is powered by ESPHome (runs ESPHome as its firmware)
- Your project is powered by an ESP32 or supported ESP32 variant such as the S2, S3, C3, etc.
- Your ESPHome configuration is open source, available for end users to modify/update
- Users should be able to apply updates if your project sells ready-made devices
- Your project supports adoption via the dashboard_import feature of ESPHome (see Sharing). In particular:
  - There are no references to secrets or passwords
  - Network configuration must assume defaults (no static IPs or DNS configured)
  - The configuration must be valid, compile and run successfully without any user changes after adopting it.
  - Use of remote packages in the YAML is permitted only if the above criteria are met.
- Your product name cannot contain “ESPHome” except in the case of ending with “for ESPHome”
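The adoption criteria that concern the YAML itself (no secrets or passwords, no static IP or DNS) can be checked mechanically before submitting. A minimal Python sketch, assuming the config is available as plain text; `lint_config` and its regex patterns are hypothetical, it is not an official ESPHome tool, and it cannot verify the "must compile and run" criterion (running `esphome compile` on the config is the real test for that).

```python
import re

# Hypothetical patterns for the "no secrets / default networking" criteria.
CHECKS = [
    (re.compile(r"!secret\b"), "references a secret"),
    (re.compile(r"^\s*password\s*:", re.IGNORECASE), "hard-coded password"),
    (re.compile(r"^\s*manual_ip\s*:"), "static IP configured (manual_ip)"),
    (re.compile(r"^\s*dns[12]\s*:"), "static DNS configured"),
]


def lint_config(yaml_text: str) -> list:
    """Return human-readable problems found in an ESPHome YAML config."""
    problems = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        # Naive comment stripping: also cuts a '#' inside a quoted string.
        code = line.split("#", 1)[0]
        for pattern, message in CHECKS:
            if pattern.search(code):
                problems.append(f"line {lineno}: {message}")
    return problems
```

A text-level check like this is deliberately crude; parsing the YAML would be more precise, but ESPHome's custom tags (`!secret`, `!include`) would need extra handling in a generic YAML loader.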

Updates via http_request

Update Entities
So, we created update entities. These are similar to the ones that Home Assistant shows now when you have the ESPHome Add-on installed in Home Assistant OS, except those ones show you an update to the version of the ESPHome Add-on and in the background will compile and upload new firmware to your device.

These new update entities are a bit different. If you have acquired a device that was pre-installed with ESPHome, the vendor you acquired the device from is now able to compile the firmware and host it on a website along with a description of the firmware the device can read and present that there is an update available for this device. You do not need to adopt the device into the ESPHome dashboard, and you don’t actually need the ESPHome dashboard installed. Using the new http_request OTA platform, the device will be able to download the firmware and update itself.
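To make the update-entity flow concrete, here is a minimal Python sketch of what the device side does conceptually: compare the version the vendor advertises with the installed one, then check the downloaded image against a published checksum before flashing. The manifest layout (`version`, `ota.path`, `ota.md5`) and all function names are assumptions for illustration; ESPHome's real implementation runs in C++ on the device and its manifest schema may differ.

```python
import hashlib
import json


def parse_version(version: str) -> tuple:
    """'2024.8.1' -> (2024, 8, 1); a suffix such as '1b2' keeps its leading digits."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def update_available(installed_version: str, manifest_json: str) -> bool:
    """True if the (hypothetical) manifest advertises a newer version."""
    manifest = json.loads(manifest_json)
    return parse_version(manifest["version"]) > parse_version(installed_version)


def firmware_matches(firmware: bytes, manifest_json: str) -> bool:
    """Compare the downloaded image with the MD5 listed in the manifest."""
    expected = json.loads(manifest_json)["ota"]["md5"]
    return hashlib.md5(firmware).hexdigest() == expected.lower()
```

Comparing versions as integer tuples avoids the classic string-comparison bug where "2024.10.0" sorts before "2024.9.0".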

FutureProofHomes · August 11, 2024, 8:12pm

Will definitely look into this! I’m somewhat aware of this program but it looks right up our alley. Thanks for the tip @ginandbacon.

habaud · August 13, 2024, 2:43pm

@FutureProofHomes
Wondering whether the satellite1 will support the following use case. I would like to be able to have separate wake words for separate functions. For example, “Jarvis” to open communication with home assistant, “speech” to open communications with a program running on a different server, and so on. I would like one device that serves both as home assistant controller and as a microphone input for a separate program with separate speech to text capabilities. Ideally, the voice activity detection and wake word detection would occur on the satellite1 device, and the digitized audio could be directed to my Python program through a network communication technique such as a websocket. And I would like the option of keeping the speech to text going until there is a new command to shut it off. That command could either be detected by the satellite1 or by my own program which could signal the satellite1 via websocket to stop sending speech. That’s my “Holy Grail”.

FutureProofHomes · August 13, 2024, 4:01pm

The Sat1 hardware won’t necessarily unlock that feature, but with a little hacking it should be possible to build what you’re describing today, I think? Perhaps you could use the multi-wake-word and multi-pipeline mapping feature and have one of the wake word/pipelines fire up a UDP stream (you’d have to custom-build a UDP streaming ESP component) targeting the correct STT endpoint for your application.

Just thinking out loud.
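As a rough illustration of the receiving end of the UDP idea above, here is a minimal Python sketch for the program that would get the audio. It assumes a hypothetical custom ESP component (which, as noted, would have to be built) sending raw 16 kHz, 16-bit mono little-endian PCM as plain UDP datagrams; that format, the function names and the idle-timeout "end of utterance" rule are all assumptions, not anything the Sat1 or ESPHome provides today.

```python
import socket
import wave

# Assumed stream format (not specified in the thread): raw little-endian
# 16-bit mono PCM at 16 kHz, sent as plain UDP datagrams.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


def receive_pcm(sock: socket.socket, max_seconds: float,
                idle_timeout: float = 1.0) -> bytes:
    """Collect datagrams until max_seconds of audio arrive or the sender goes quiet."""
    max_bytes = int(max_seconds * SAMPLE_RATE) * SAMPLE_WIDTH * CHANNELS
    sock.settimeout(idle_timeout)
    chunks = []
    total = 0
    while total < max_bytes:
        try:
            data, _addr = sock.recvfrom(4096)
        except socket.timeout:
            break  # no packet within idle_timeout: treat as end of utterance
        chunks.append(data)
        total += len(data)
    return b"".join(chunks)[:max_bytes]


def save_wav(path: str, pcm: bytes) -> None:
    """Wrap the raw PCM in a WAV header so a separate STT engine can read it."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)
```

In use, you would bind once (e.g. `sock.bind(("0.0.0.0", 12345))`, where the port is arbitrary and the hypothetical ESP component must target it), call `receive_pcm` whenever the satellite starts streaming, and pass the result to your own STT. UDP gives no ordering or delivery guarantees, so a real version would likely add sequence numbers; the WebSocket transport habaud mentions trades a little latency for ordered, reliable delivery plus a natural back-channel for the "stop sending" signal.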