"ReSpeaker Lite": new Seeed Studio Voice Assistant Development Kit hardware combines an ESP32 with an XMOS XU316 DSP chip for advanced audio processing, as an ESPHome-based Home Assistant Assist Satellite voice devkit

FYI, Seeed Studio has released the “ReSpeaker Lite” development board and “ReSpeaker Lite Voice Assistant Kit” products, which look like they would be perfect as an unofficial development kit for the upcoming ESPHome-based “Assist Satellite” Voice Assistant DevKit framework for Home Assistant. That framework combines the work of ESPHome and Home Assistant developers for both voice control and media playback via Music Assistant (as an ESP32-based media player and smart speaker platform). The kits also look suitable for DIY “Assist Satellite” early adopters and tinkerers once ESPHome gets I2S audio DSP support for the XMOS xCORE audio AI accelerator and other “Assist Satellite” audio, voice and media player components are added to the ESPHome firmware (and matching “Assist Satellite” voice assistant and media player features are added to Home Assistant core).

Seeed Studio currently sells three different kits containing different components/parts in combination with the main board. Note that you can also add a speaker and a basic enclosure to each kit to get a stand-alone setup:

All of these “ReSpeaker Lite” kits are designed around an XMOS XU316 (xCORE XU316 AI), a dedicated DSP (Digital Signal Processor) audio processor microcontroller chip that can act as an audio co-processor (i.e. an onboard sound card) for all types of advanced on-device sound and voice processing, combined with an ESP32-S3 that can run ESPHome firmware for Voice and Assist integration with Home Assistant. Seeed Studio has a wiki with instructions on how to set it up (though it is unclear if and how that setup utilizes the XMOS chip for advanced audio processing):

The XMOS xCORE DSP chip acts like a sound-card co-processor, off-loading in-line noise removal (voice clean-up) of the microphone audio, such as Interference Cancellation (IC), Acoustic Echo Cancellation (AEC), Noise Suppression (NS) and Automatic Gain Control (AGC), as well as other audio post-processing algorithms that improve the solution’s voice recognition capabilities. (Depending on which XMOS chip is used, the XCORE-VOICE framework could technically also allow up to 16 PDM microphones to be connected to a single xCORE device with a different PCB design.)

UPDATE: I’m paraphrasing, but one of the representatives working on that “voice-kit” project more or less wrote that the ESPHome and Home Assistant voice developers from Nabu Casa are now focusing on voice assistant features but are still figuring out how to make use of external audio processors. They are currently only testing the XMOS xCORE chip as a candidate for an ESPHome-based voice-kit reference hardware design for the official Home Assistant Voice Assistant development kit. However, they also plan to work on an “audio processor” component for ESPHome with a hardware-independent architecture that will not rely on specific hardware configurations or depend specifically on the XMOS xCORE DSP chip, and will instead allow others to add support for additional DSPs as audio processors (i.e. sound co-processors) in the future. In addition, all the I2S settings and pins will remain configurable in YAML, so it should at least be possible to add support for other DSP types to the “audio processor” component if they work similarly to XMOS xCORE DSP chips, as well as for different board designs that use other I2S settings and pins. That representative also wrote: “we will add all the code to the base ESPHome project once things are stable and working well”, and noted that the ESPHome and Home Assistant / Nabu Casa developers are currently moving very fast and breaking things as they go, so they are working on experimental code for the new voice-kit related components for ESPHome in a separate repository on GitHub here:

Note that these “ReSpeaker Lite” kits (all three are available via the same webpage) seem to specifically target ESPHome voice-assistant developers and early-adopter tinkerers. They currently look to be available as a full development kit that includes an ESP32-S3 based board (Seeed Studio’s XIAO ESP32S3) with a simple acrylic enclosure and a speaker, as a smaller DIY kit that only includes the ReSpeaker Lite board, where the user needs to add their own MCU board, SBC or other computer using I2S and/or USB connections, and as a larger third kit that includes both of those kits. Both boards have an on-board dual-microphone array (2-Mic Array) for far-field voice control interaction. The product page says that these kits support custom firmware updates via DFU-Util (though it is unclear if that includes firmware and/or configuration for the XMOS chip as well), and they also feature a stereo audio output jack for connecting external (Hi-Fi) speakers.

Seeed Studio published these videos showing the stand-alone ESP32 model and the USB model:

CNX Software has a nice summary blog article with details on the technical hardware specification:

Even though XMOS is proprietary hardware, their chips are very popular and have open-source compatible libraries:

As far as I can tell, the complete source code for XMOS’s xcore-voice firmware is in the sln_voice repo:

More information about that is in the user guide for their XK-VOICE-L71, which is based on the same chip:

These use XMOS’s “XCORE-VOICE” platform / software development kit:

Now, the hardware specification of these sounds very similar to that of the upcoming voice-kit hardware platform that Nabu Casa developers have mentioned they are working on, doesn’t it? …though it sounds as if these ReSpeaker Lite kits are missing Nabu Casa’s expansion ports for third-party hardware add-ons?

Not sure if all of you have followed the news about Nabu Casa’s upcoming voice-assistant hardware project, but Paulus Schoutsen revealed during Home Assistant’s ESPHome Summer Release Party on YouTube that Nabu Casa’s ESPHome and hardware developers are in fact working on an upcoming open-source voice-kit hardware platform for voice-assistant development that will be based on an ESP32-S3 in combination with an XMOS xCORE chip (which is very powerful for all kinds of audio processing).

Nabu Casa’s upcoming “Assist Satellite” voice-kit hardware platform was also mentioned as a new development framework under the Voice Assistants section of Home Assistant’s Roadmap 2024 Midyear Update post. @synesthesiam and kahrendt then also briefly talked about it during their “Voice Chapter 7” livestream video, where they confirmed that it will be based on an XMOS chip in combination with an ESP32-S3 chip on a single board running ESPHome, and that it will become available as an “official” Home Assistant voice-kit reference platform made by Nabu Casa for ESPHome developers:

By the way, I believe similar xCORE chips from XMOS are used in Amazon Alexa Voice Service (AVS) Development Kit solutions and even in some Amazon Echo products?

…and while I am unsure of it, I suspect that the Google Nest / Google Home smart speaker series also contains an XMOS xCORE chip?

Anyway, regarding this, I posted my hardware/software wishlist for ESPHome and Home Assistant here:

First of all, I wish that there could be a native Hi-Fi quality music player inside ESPHome (or Home Assistant) that is fully integrated with Music Assistant, so that you could simply set these speakers as a “Player Provider” inside Music Assistant and group speakers for synchronized multi-room playback.

Thus I am wondering if the combined ESPHome voice assistant hardware and firmware platform will also be great for music playback if a high-quality amplifier and speakers are used?

Is any work being done to also make ESPHome-based voice-assistant devices better media player receivers, with native support for features such as multi-room and synchronized Hi-Fi quality playback?

Since Nabu Casa’s design is said to be open-source hardware, and XMOS integration will probably be added to ESPHome’s Media Player Components (and Microphone Components), I for one am hoping that it could and will be extended to different types of speakerless appliance solutions with an AUX-output/audio-output and AUX-input/audio-input port, and not only used for voice assistants.

Personally I would also love to see inexpensive speakerless network-streamer player/receiver hardware without microphones and with only an AUX-out that can connect to any of your existing amplifiers or speakers with built-in amplifiers, in order to replace products like Chromecast Audio and Amazon Echo Input / Echo Link Amp (i.e. devices with no on-board speakers that must be connected to external speakers for audio output).

That is, I am sure that not everyone wants only “smart speakers” with a voice assistant, and that many would instead be happy to have network streamers/players without a microphone whose only purpose is to receive the highest quality audio possible from Music Assistant and output it to your “dumb” speakers.

I for one still have loads of Chromecast Audio audio-only receivers connected to various models and brands of speaker/receiver systems in each room, used to achieve multi-room music playback on a budget (because I could not afford Sonos speakers in all rooms).

So even though Nabu Casa’s hardware will initially be designed primarily for “Home Assistant Satellite” (also known as “Wyoming Satellite”) voice-assistant appliances, such open-source hardware, just like the ESPHome firmware, has a lot of potential for different use cases.

Also on my wishlist is network streamer receiver hardware with an AUX input and ADC to get music from an analog audio source, as an easy way to achieve a remote AUX input into Music Assistant from an external analog audio source like a vinyl record player (LP turntable) or cassette player.

What I want to achieve is a solution that is easy to install, maintain and use, and that allows my wife to stream music from a vinyl record player (LP turntable) to any speaker or group of speakers in our home. Her vinyl record player (turntable) setup has a pre-amp with phono (RCA) output ports for analog stereo audio.

  • Architecture example: Analog audio source with preamp → ADC network appliance → music stream → Music Assistant → Any speakers

I would therefore prefer if we could buy some kind of networked (Wi-Fi enabled) appliance, like a music streamer with a stereo AUX input port, that performs on-the-fly analog-to-digital conversion (ADC) and encoding for streaming to a Music Provider inside Music Assistant.

I do however think that such a solution would need its own non-proprietary audio-only streaming protocol for high-quality music streams?

Thanks for sharing. Did you get a chance to use/test the ReSpeaker Lite with ESPHome and the HA voice assistant? I’m so impatient to receive mine, which is still in transit!

I actually did not order one as I am on a tight budget right now. I will personally wait for Nabu Casa to release their own official Home Assistant “Assist Satellite” hardware kit, which should hopefully be before the end of this year.

WOOT! Stumbled on this experimental “voice-kit” GitHub repository fork of ESPHome, where Nabu Casa developers are developing new or improved components to better support voice control and I2S audio (with XMOS xCORE as an audio accelerator DSP), as well as a new ESP32-native Nabu media player with proper high-quality music playback support for MP3, WAV, FLAC, etc. (to be matched and combined with Music Assistant) for use in the upcoming official Home Assistant “Assist Satellite” voice-kit hardware platform from Nabu Casa:

They already added features and functions or improvements/enhancements to ESPHome, such as:

  • New: Nabu Media Player - new “nabu” media player from Nabu Casa running natively on ESP32
    • Music Assistant streams work (both mp3 and flac), but since it requires resampling, the audio quality isn’t great
  • New: Added support for FLAC files
  • New: Added a proper WAV decoder (that parses WAV headers with LIST, INFO, etc. chunks.)
  • New: Initial support for playing back local files
  • New: Playback Control for the VoiceKit
  • New: Added an is_paused condition for media players.
  • New: Add Click to Converse to button
  • New: LED animation
  • New: Scripts for controlling LEDs
  • New: Update Button Behaviour for the Voice kit
  • New: Dial Volume Control
  • New: Timer basic implementation
  • New: Added HTTP(s) OTA updates
  • New: Added Buttons for force ota update.
  • New: Software Mute Switch
  • Improvement: A basic resampler adjusts sample rates
  • Improvement: Configurable output sample rate (for experimental 48kHz XMOS firmware)
  • Improvement: The DAC mute state is read on boot
  • Improvement: volume/mute control via the DAC (the wheel works for increasing/decreasing volume)
  • Improvement: Logs what element failed if the pipeline breaks
  • Improvement: Fails gracefully if the incoming stream can’t be processed
  • Improvement: Differentiate between user facing LED Ring and Internal LED ring
  • Point external component to dev branch
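For anyone wondering what “a proper WAV decoder that parses WAV headers with LIST, INFO, etc. chunks” means in practice, here is a minimal Python sketch of the general idea. It is not the code from the voice-kit repository (which is C++ inside ESPHome); the function names `iter_riff_chunks`, `parse_wav_header` and `build_demo_wav` are made up for illustration. The point is that a WAV file is a sequence of RIFF chunks, so a decoder has to walk the chunks and skip metadata such as LIST instead of assuming the audio starts at a fixed byte offset.

```python
import io
import struct


def iter_riff_chunks(stream):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF/WAVE stream."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            return  # end of stream (or a truncated chunk header)
        chunk_id, size = struct.unpack("<4sI", chunk_header)
        payload = stream.read(size)
        if size % 2:
            stream.read(1)  # chunks are padded to an even number of bytes
        yield chunk_id, payload


def parse_wav_header(stream):
    """Return the format fields and raw PCM bytes, ignoring chunks such as LIST."""
    info = {}
    for chunk_id, payload in iter_riff_chunks(stream):
        if chunk_id == b"fmt ":
            fmt, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", payload[:16]
            )
            info.update(format=fmt, channels=channels,
                        sample_rate=rate, bits_per_sample=bits)
        elif chunk_id == b"data":
            info["data"] = payload
        # Any other chunk (LIST/INFO metadata, etc.) is simply skipped.
    if "sample_rate" not in info or "data" not in info:
        raise ValueError("missing fmt or data chunk")
    return info


def _chunk(chunk_id, payload):
    """Serialize one RIFF chunk, adding the pad byte for odd-sized payloads."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def build_demo_wav(pcm, sample_rate=48000, channels=2, bits=16):
    """Build a small in-memory WAV with a LIST/INFO chunk before the audio data."""
    block_align = channels * bits // 8
    fmt = struct.pack("<HHIIHH", 1, channels, sample_rate,
                      sample_rate * block_align, block_align, bits)
    list_chunk = _chunk(b"LIST", b"INFO" + _chunk(b"INAM", b"demo\x00"))
    body = b"WAVE" + _chunk(b"fmt ", fmt) + list_chunk + _chunk(b"data", pcm)
    return b"RIFF" + struct.pack("<I", len(body)) + body


if __name__ == "__main__":
    wav = build_demo_wav(b"\x01\x00\x02\x00" * 4)
    parsed = parse_wav_header(io.BytesIO(wav))
    print(parsed["sample_rate"], parsed["channels"], len(parsed["data"]))
```

A decoder that hard-codes a 44-byte header would read the LIST metadata in this demo file as audio, which is presumably the kind of problem the header-parsing improvement avoids.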

They also have many inline TODO comments in the code there, if anyone is interested in helping them:

Note! Be aware that many comments there say that most of the new stuff is not yet stable.
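To illustrate why the list above says that Music Assistant streams need resampling and that quality suffers with a basic resampler, here is a small Python sketch of the simplest possible approach, linear interpolation between neighbouring samples. This is my own illustration and an assumption about what “basic” means, not the algorithm used in the voice-kit code; `resample_linear` is a hypothetical name, and the 44.1 kHz source rate is just a typical example, with 48 kHz taken from the list item about the experimental XMOS firmware.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence of samples by linear interpolation.

    Illustrative only: there is no low-pass filtering here, so downsampling
    can alias and upsampling can leave imaging artifacts.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # distance between output samples, in input samples
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)  # clamp at the end of the buffer
        frac = pos - left
        out.append(samples[left] + (samples[right] - samples[left]) * frac)
    return out


if __name__ == "__main__":
    # 10 ms of audio at an assumed 44.1 kHz source rate becomes 480 samples at 48 kHz.
    block = [0.0] * 441
    print(len(resample_linear(block, 44100, 48000)))
```

Better resamplers filter the signal around the interpolation step, which costs CPU time on an ESP32; that trade-off may be part of why the audio quality “isn’t great” for now.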

I’m interested but won’t be the first to purchase. It feels kind of rushed; even the full kit with the acrylic case looks “odd”, just having the board on top of the speaker. It looks like a ReSpeaker 2-mic HAT with no pins (at least like the ones used for a Wyoming satellite) with an XMOS chip and a XIAO ESP32-S3 slot slapped on. Considering you can get a 2-mic ReSpeaker HAT for 7 dollars and a Pi Zero 2 W for roughly the same price, minus the speaker, it’s a wait and see.

While I doubt it does, it would be great if this took advantage of the new OTA feature that lets new users install ESPHome devices without having to install ESPHome at all. That would probably make it sell better, but I have no idea what the process is for doing that. 4 PDM mics probably would have been better, and while Nabu’s version will also have XMOS, it doesn’t sound like it’s quite there yet on the software side for ESP32/ESP-IDF. While I’m sure Nabu’s version will cost more, they are also hinting at more powerful hardware with better capabilities.

What’s odd is I just got my M5Stack CoreS3E yesterday, which works great besides sound output. It’s essentially a stackable S3 Box, so I got a 500 mAh battery, but there’s an RCA unit you can add that has left/right RCA outputs.

Also, and I’m speculating the answer is no, but is ESP_ADF a requirement for microWakeWord or not? I don’t believe it is, as the Atom Echo doesn’t have anything in it about esp_adf, but the S3 Box versions do. I also have a Korvo-1, and it takes way longer to compile when ESP_ADF is specified. I guess my question is what esp_adf adds over not using it, as you have to specify the board, and I just use the s3box because nothing else seems to work. I was searching last night and didn’t find any clear answers or anyone even discussing it. I think it takes around 300 seconds to do a clean build on my Korvo-1 and maybe 90 seconds on my CoreS3E, on an x86 mini PC. Most of the time on the Korvo-1 is spent on TensorFlow and TLITE (or something similar), which is part of TensorFlow. I really don’t see any difference in performance. If anything, the CoreS3E seems to be better, outside the audio output issue, which seems to be a pretty common problem. Anyone who has an S3 Box and, say, an Atom Echo should notice the difference in the amount of time it takes to do a clean build.

I was looking at these. I already have two esp32-s2-box2 but would like to try more things.

Does anyone know the exact SKU of the XMOS chip? There are 14 different versions on the XMOS product page. I don’t know if the different numbers mean better performance, as a lot of them match, like cores, but others don’t. On another thread someone was saying the XMOS code is proprietary for this device. I’m just going to wait and see what Nabu comes out with because it will just work. Some SKUs can support external LPDDR1 RAM, which may or may not be useful, but I imagine it would help with being a media player for music at a minimum. In that Summer video, when they mentioned the hardware, one of them said the XMOS chip had 16 cores, so it’s one of the 14 below, looking at XMOS’s product page. There is not really anything more powerful on there unless you go to 24/32 cores, and this appears to be their newest SKU lineup currently available.

The pic of the XMOS chip on the ReSpeaker board definitely is not a 60-pin package. So that leaves the 265 FBGA or 128 TQFP package chips, and since it has pins, it is not one of the FBGA packages. So, from your list, by process of elimination, it should be one of the bottom two 128 TQFP package chips.

If so, there is no external memory support :(. The only difference between the two is the quality of the components used (industrial vs. commercial).

I may have misunderstood though; it sounds like you are wondering which chip the hass hardware will have in it?

What picture are you looking at? I’m looking at the picture on their wiki, and it looks sharper than the others:

As far as I can tell, the picture shows a 60-pin package, or at least I counted 15 pins on each visible side.

As far as I can tell, the marking on the chip in the picture reads:

XMOS
V16A0
G12342P2
TF1148.00

And if I do a search for “V16A0 AND XMOS” I only find the datasheet for the “XU316-1024-QF60A” SKU in a 60-pin package. From a cost-effectiveness perspective I guess it makes more sense to use the “XU316-1024-QF60A-C24” (offering 2400 MIPS) over the faster “XU316-1024-QF60A-C32” (offering 3200 MIPS), even though from a developer and end-user perspective we probably want the faster variant:

And I believe it would also make sense from a hardware developer’s point of view to use either the XU316-1024-QF60A-C24 (or XU316-1024-QF60A-C32), since the XU316-1024-QF60A-C24 is what is used by XMOS’s “XK-VOICE-L71 Voice Reference Design Evaluation Kit”, so it is very well documented and tested:

https://www.xmos.com/file/xk-voice-l71-pcb-design-files/?version=latest
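Since the part numbers are easy to mix up, here is a small Python sketch that splits an XU316 SKU into the fields discussed in this thread. The field meanings are only how posters here read them, not something taken from the XMOS datasheet: the digits after the package letters are treated as the pin count (QF60 as a 60-pin package), the trailing C24/C32 as 2400/3200 MIPS, and the A/B suffix as 1.8 V versus 3.3 V IO (as reported further down the thread). `XmosSku` and `parse_sku` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern for SKUs as they are written in this thread, e.g. "XU316-1024-QF60A-C24".
_SKU_RE = re.compile(
    r"^(?P<family>XU316)-(?P<memory>\d+)-(?P<package>[A-Z]{2})(?P<pins>\d+)"
    r"(?P<io>[AB])(?:-[A-Z](?P<speed>\d{2}))?$"
)

# Assumption based on posts in this thread, not on the datasheet.
_IO_VOLTAGE = {"A": 1.8, "B": 3.3}


@dataclass(frozen=True)
class XmosSku:
    family: str
    memory: int
    package: str
    pins: int
    io_voltage: float
    mips: Optional[int]  # None when the SKU has no speed suffix


def parse_sku(sku: str) -> XmosSku:
    """Split a SKU string into the fields discussed in the thread."""
    match = _SKU_RE.match(sku.strip())
    if match is None:
        raise ValueError(f"unrecognised SKU format: {sku!r}")
    speed = match.group("speed")
    return XmosSku(
        family=match.group("family"),
        memory=int(match.group("memory")),
        package=match.group("package"),
        pins=int(match.group("pins")),
        io_voltage=_IO_VOLTAGE[match.group("io")],
        # Thread figures: C24 is quoted as 2400 MIPS and C32 as 3200 MIPS.
        mips=int(speed) * 100 if speed else None,
    )


if __name__ == "__main__":
    for sku in ("XU316-1024-QF60A-C24", "XU316-1024-QF60B-C32"):
        print(parse_sku(sku))
```

This only covers the QF60-style names quoted here; the FBGA and TQFP variants mentioned above may follow a different naming pattern, so check the datasheet before relying on it.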

You are correct, I was looking at the picture above on mobile, which is quite blurry.

Edit: check this out, just saw it on YouTube.

Wow! So apparently FutureProofHomes has also announced his “Satellite1 PCB Dev Kit”, which also sounds like it uses the same or a similar XMOS chip in combination with an ESP32-S3 module, but he has designed it as a two-PCB voice satellite hardware development kit:

GitHub repository:

Website:

Satellite1 PCB Dev Kit

The Satellite1 PCB Dev Kit contains the two PCBs necessary to build your own completely private voice assistant & multi-sensor with XMOS advanced audio processing & music playback. Add your own speaker and power supplies.

Satellite1 HAT Board:

This board features 4 PDM microphones, 12 NeoPixel LEDs, humidity/temp/lux sensors, 4 buttons (volume up/down, action button & hardware mute), plus the XMOS audio processing chip and a power DAC for an amplified speaker-out connection or a 3.5mm headphone connection. All remaining GPIOs are also exposed.

The Satellite1 Hat connects easily to the Sat1 Core Board but can also be paired with a Raspberry Pi or a PC/Mac via USB! Perfect for all your voice assistant and audio projects!

Satellite1 Core Board:

The Satellite1 Core Board contains the ESP32-S3 n16r8, USB-C Power Delivery and 40-pin connection. This board attaches to the companion Sat1 HAT Board.

Looks like he also posted a future roadmap showing that he is working on a nice enclosure (as well as mentioning an optional recessed enclosure for in-ceiling / in-wall mounting of this smart speaker):

And yeah, I noticed now that FutureProofHomes had posted a preview video on YouTube 4 months ago showing off an early prototype version of that devkit, when he called the project “HomeX” (but at that time he had based the prototype on the wyoming-satellite platform running on a Raspberry Pi, instead of on Nabu Casa’s upcoming ESPHome-based voice-kit hardware platform that runs on an ESP32-S3 and uses an XMOS xCORE chip for audio processing):

PS: The new design reminds me of the “Onju Voice” PCB replacement for the Google Nest Mini (2nd gen), which is an open-source hardware project that I hope someone else will pick up and update now:

@FutureProofHomes Can you tell us which exact SKU of XMOS chip your product will be using?

And I am also wondering if your PCB(s) will be open-source hardware and/or use an OSH/OSHW design?

Here’s the actual XMOS chip we’re using:

I just updated the repo to clarify our open-source strategy a bit. In a nutshell, upon launch all the firmware (ESP & XMOS) will be open source and all our hardware schematics will be published too. The KiCad project files will follow a delayed open-source model (I’ll publish those dates for us), at which point we will put out the project files too. Open to folks’ thoughts on this! And again, if you want to work closely with the core team then please do ping me!

@FutureProofHomes Any input on the media playback capabilities and audio quality of these types of products for music playback when used with better speakers?

I read that Nabu Casa’s will have an audio output jack (3.5mm headphone jack) for connecting external speakers.

For reference, I currently have ALL the various Google Nest (and Google Home) speakers in different rooms and am using them for multi-room music playback.

Nice! Hope that you get more hardware developers onboard!

Cool! So you are using the slightly faster “XU316-1024-QF60B-C32” (3200 MIPS) SKU and not the “XU316-1024-QF60B-C24” (2400 MIPS) that both the ReSpeaker Lite and XK-VOICE-L71 Voice Reference Design Evaluation Kit are using → Processor Catalogue | XMOS

That sounds awesome! Pun intended ;)

I’ll track the thread over here if that’s okay. :)

Open source!

Hell yeah.

FYI, @alextrical also wrote that the FutureProofHomes Satellite1 will be using a “B” model in the “XU316-1024-QF60B” chip series, which has native 3.3V IO, as opposed to the “A” models in the “XU316-1024-QF60A” chip series, which would need logic level shifters to convert their 1.8V IO for the ESP32. He also mentioned that they will go with a C32 (3200 MIPS) variant if the economy of scale looks to allow it without adding too much extra cost to the BOM.

FYI, FutureProofHomes has posted a new video on their YouTube channel showing off the current design of the ESP32-based hardware prototype of their upcoming FutureProofHomes Satellite1 voice control development board, which now looks to be using an XU316-1024-QF60A-C24 based XK-VOICE-L71 (XMOS Voice Reference Design Evaluation Kit) connected externally. The XK-VOICE-L71 features a 3.5mm line-out jack for audio output to external speakers, and @FutureProofHomes mentioned that their final dev-kit product will also feature a 3.5mm jack for audio output. Check out their latest prototype introduction video here:

https://www.youtube.com/watch?v=Vp5q4RIwCX4

I’ve just set up my ReSpeaker.
Is there a way to control the speaker volume?

Additionally, it would be nice to keep the discussion in this thread on topic; it’s confusing to follow with the inclusion of other similar but unrelated developments.

Are you using the components from the ESPHome “voice-kit” repository? If so, then it may be best to post that question as a new issue there to get the attention of the developers working on that project. See → Issues · esphome/voice-kit · GitHub

Note that even if they are not using the new ReSpeaker Lite kit themselves, the question is perhaps better addressed there, as they mentioned that one goal for that repo is to not be reliant on specific hardware configurations.

Quoting kahrendt from another issue there in that repo:

"We are moving fast and breaking things as we figure out the best way for these components to interact, but we will add all the code to the base ESPHome project once things are stable and working well. One goal for this repo is to not be reliant on specific hardware configurations. These other boards you linked are quite exciting, and I believe they should be compatible (or relatively easily made compatible) with our changes. For example, all the I2S settings and pins are still configurable in yaml, so it should be straightforward to add support for similar boards.

Very little of the code is reliant specifically on the XMOS chip (and the few lines that are should be adaptable or won’t even be there in the final version as we clean up the code), so it should be possible to add support for other DSPs in the future."
