FYI, Seeed Studio has released the “ReSpeaker Lite” development board and the “ReSpeaker Lite Voice Assistant Kit”. They look like they would be perfect as an unofficial development kit for Home Assistant’s upcoming ESPHome-based “Assist Satellite” voice assistant dev-kit framework, which brings together ESPHome and Home Assistant developers for both voice control and media playback via Music Assistant (as an ESP32-based media player and smart speaker platform). They should also suit DIY “Assist Satellite” early adopters and tinkerers once ESPHome gets I2S audio DSP support for the XMOS xCORE audio AI accelerator and the other “Assist Satellite” audio, voice, and media player components are added to the ESPHome firmware (along with the matching “Assist Satellite” voice assistant and media player features in Home Assistant core).
Seeed Studio currently sells three different kits, each combining the main board with different components. Note that you can also add a speaker and a basic enclosure to each kit to get a stand-alone setup:
- Kit #1 (full developers kit) = “ReSpeaker Lite Voice Assistant Kit” with 2-Mic Array and pre-soldered XIAO ESP32S3 + a “ReSpeaker Lite 2-Mic Array” (USB board without ESP32) + Mono Speaker + Speaker Enclosure with mounting for the board
- Kit #2 (small developers kit) = “ReSpeaker Lite Voice Assistant Kit” with 2-Mic Array and pre-soldered XIAO ESP32S3 + external Mono Speaker
- Kit #3 (DIY developers kit) = “ReSpeaker Lite 2-Mic Array” (USB board without ESP32, but compatible with the Seeed Studio XIAO ESP32S3 board)
All these “ReSpeaker Lite” kits are designed around an XMOS XU-316 AI (xCORE XU316), a dedicated audio DSP (Digital Signal Processor) microcontroller that can act as an audio co-processor (i.e. an onboard sound card) for all kinds of advanced on-device sound and voice processing. It is paired with an ESP32-S3 that can run ESPHome firmware for voice and Assist integration with Home Assistant. Seeed Studio has a wiki with instructions on how to set it up (though it is unclear if and how that setup uses the XMOS chip for advanced audio processing):
- Voice Assistant System for Home Assistant | Seeed Studio Wiki
- ReSpeaker Lite Voice Assistant Kit | Seeed Studio Wiki
The XMOS xCORE DSP chip acts like a sound-card co-processor, offloading in-line clean-up of the audio from the microphone(s): Interference Cancellation (IC), Acoustic Echo Cancellation (AEC), Noise Suppression (NS), Automatic Gain Control (AGC), and other audio post-processing algorithms that improve the solution’s voice recognition capabilities. (Depending on which XMOS chip is used, their XCORE-VOICE framework could technically also allow up to 16 PDM microphones to be connected to a single xCORE device with a different PCB design.)
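To give a rough idea of what one of these clean-up stages does, here is a minimal Python sketch of block-based Automatic Gain Control. This is only an illustration and not XMOS’s actual algorithm: the function names and the `target_rms` / `max_gain` parameters are made up for the example, and a real DSP would smooth the gain over time instead of recomputing it per block.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_agc(samples, target_rms=0.1, max_gain=10.0):
    """Scale one block so its RMS level moves toward target_rms.

    target_rms and max_gain are hypothetical tuning values, not taken
    from any XMOS or ESPHome documentation.
    """
    level = rms(samples)
    if level == 0.0:
        # Pure silence: nothing to amplify, return the block unchanged.
        return list(samples)
    # Limit the gain so very quiet background noise is not blown up.
    gain = min(target_rms / level, max_gain)
    # Clamp each sample to the valid range to avoid overflow after scaling.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# A loud and a quiet test tone (160 samples each).
loud = [0.5 * math.sin(2 * math.pi * i / 16) for i in range(160)]
quiet = [0.01 * math.sin(2 * math.pi * i / 16) for i in range(160)]

print(round(rms(apply_agc(loud)), 3))   # loud speech is pulled down to the target
print(round(rms(apply_agc(quiet)), 3))  # quiet speech is boosted, limited by max_gain
```

The point is simply that someone speaking far from the microphones and someone speaking right next to it end up at a more similar level before the audio reaches wake word detection and speech-to-text.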
UPDATE: I’m paraphrasing, but one of the representatives working on that “voice-kit” project more or less wrote that while the ESPHome and Home Assistant voice developers from Nabu Casa are now focusing on voice assistant features, they are still figuring out how to make use of external audio processors. They are currently only testing the XMOS xCORE chip as a candidate for an ESPHome-based voice-kit reference hardware design for the official Home Assistant voice assistant development kit. However, they also plan to work on an “audio processor” component for ESPHome with a hardware-independent architecture, one that does not rely on specific hardware configurations or on the XMOS xCORE DSP chip in particular, so that others can add support for additional DSPs as audio processors (i.e. sound co-processors) in the future. All the I2S settings and pins will also remain configurable in YAML, meaning that it should at least be possible to add support for other DSP types to the “audio processor” component if they work similarly to XMOS xCORE DSP chips, as well as for different board designs that use other I2S settings and pins. That representative also wrote: “we will add all the code to the base ESPHome project once things are stable and working well”, and noted that the ESPHome and Home Assistant / Nabu Casa developers are moving very fast right now and breaking things as they go, so they are working on experimental code for the new voice-kit related ESPHome components in a separate repository on GitHub here:
- GitHub - esphome/home-assistant-voice-pe: Home Assistant Voice PE
- (plus patches merged into upstream ESPHome → Issues · esphome/esphome · GitHub)
Note that these “ReSpeaker Lite” kits (all three are available via the same webpage) seem to specifically target ESPHome voice-assistant developers and early-adopter tinkerers. They are currently available as a full development kit that includes an ESP32-S3 based board (Seeed Studio’s XIAO ESP32S3) with a simple acrylic enclosure and a speaker, as a smaller DIY kit that only includes the ReSpeaker Lite board, where the user needs to add their own MCU board, SBC, or other computer using I2S and/or USB connections, and as a larger third kit that includes both of those. Both boards have an on-board dual-microphone array (2-Mic Array) for far-field voice control, and the page says that these kits support custom firmware updates via DFU-Util (though it is unclear whether that includes firmware and/or configuration for the XMOS chip as well). The board also features a stereo audio output jack for connecting external (Hi-Fi) speakers.
Seeed Studio published these videos showing the stand-alone ESP32 model and the USB model:
CNX Software has a nice summary blog article with details on the technical hardware specification:
Even though XMOS chips are proprietary hardware, they are very popular and have open-source compatible libraries:
As far as I can tell, the complete source code for XMOS’s xcore-voice firmware is in the sln_voice repo:
There is more information about that in the user guide for their XK-VOICE-L71, which is based on the same chip:
- Voice Reference Design Evaluation Kit | XMOS
- https://www.xmos.com/download/XVF3610-User-Guide(v5_7_3).pdf
These use XMOS’s “XCORE-VOICE” platform / software development kit:
Now, the hardware specification of these sounds very similar to that of the upcoming voice-kit hardware platform that Nabu Casa developers have mentioned they are working on, doesn’t it? …though it sounds as if these ReSpeaker Lite kits are missing Nabu Casa’s expansion ports that allow for third-party hardware add-ons?
Not sure if all of you have followed the news about Nabu Casa’s upcoming voice-assistant hardware project, but Paulus Schoutsen revealed during Home Assistant’s ESPHome Summer Release Party on YouTube that Nabu Casa’s ESPHome developers and hardware developers are in fact working on an open-source voice-kit hardware platform for voice-assistant development. It will be based on an ESP32-S3 in combination with an XMOS xCORE chip (which is very powerful for all kinds of audio processing).
- GitHub - esphome/home-assistant-voice-pe: Home Assistant Voice PE
- Voice Assistant — ESPHome
- Assist - Talking to Home Assistant - Home Assistant
- GitHub - espressif/esp-dsp: DSP library for ESP-IDF
Nabu Casa’s upcoming “Assist Satellite” voice-kit hardware platform was also mentioned as a new development framework under the Voice Assistants section of Home Assistant’s Roadmap 2024 Midyear Update post. @synesthesiam and kahrendt then also talked briefly about it during their “Voice Chapter 7” livestream video, where they confirmed that it will be based on an XMOS chip in combination with an ESP32-S3 chip on a single board running ESPHome, and that it will become available as an “official” Home Assistant voice-kit reference platform made by Nabu Casa for ESPHome developers:
By the way, I believe similar xCORE chips from XMOS are used in Amazon Alexa Voice Service (AVS) Development Kit solutions and even in some Amazon Echo products?
- XCORE-VOICE | XMOS
- USB & Multichannel Audio | XMOS
- https://www.xmos.com/xmos-delivers-first-amazon-alexa-voice-service-development-kit-with-linear-mic-array-for-far-field-voice-capture/
- New XMOS Dev Kit for AVS Brings Far-Field Voice Capture to a Linear Mic Array : Alexa Blogs
- Alexa Voice Service (AVS) Device SDK
- GitHub - xmos/vocalfusion-avs-setup: Repository containing scripts/helpers for configuring a Raspberry Pi to work with XMOS mic frontend
…and while I am unsure of it, I would suspect that the Google Nest / Google Home smart speaker series also contains an XMOS xCORE chip?
- https://www.xmos.com/fully-offloaded-giving-smart-tvs-the-voice-power-they-deserve/
- https://www.xmos.com/making-smart-speakers-feel-at-home/
Anyway, regarding this, I posted my hardware/software wishlist for ESPHome and Home Assistant here:
First of all, I wish that there could be a native Hi-Fi quality music player inside ESPHome (or Home Assistant) that would be fully integrated with Music Assistant, so that you could simply set these speakers as a “Player Provider” inside Music Assistant, as well as group speakers for synchronized multi-room playback.
Thus I am wondering if the combined ESPHome voice assistant hardware and firmware platform will also be great for music playback if a high-quality amplifier and speakers are used?
Is any work being done to also make ESPHome-based voice-assistant devices better media player receivers, with native support for features such as multi-room and synchronized Hi-Fi quality playback?
Since Nabu Casa’s design is said to be open-source hardware, and XMOS integration will probably be added to ESPHome’s Media Player Components (and Microphone Components), I for one am hoping that it could and will be extended beyond voice assistants to different types of speakerless appliances with AUX-output/audio-output and AUX-input/audio-input ports.
Personally, I would also love to see inexpensive speakerless network-streamer player/receiver hardware without microphones, with only an AUX-out that can connect to any of your existing amplifiers or speakers with built-in amplifiers, in order to replace products like Chromecast Audio and Amazon Echo Input / Echo Link Amp (i.e. devices with no on-board speakers that must be connected to external speakers for audio output).
That is, I am sure that not everyone only wants “smart speakers” with a voice assistant, and that many would also be happy to have network streamers/players without a microphone, whose only purpose is to receive audio from Music Assistant and output it at the highest possible quality to your “dumb” speakers.
I for one still have loads of Chromecast Audio audio-only receivers connected to various models and brands of speaker/receiver systems in each room, used to achieve multi-room music playback on a budget (because I could not afford Sonos speakers in all rooms).
So even though Nabu Casa’s hardware will initially be designed primarily for “Home Assistant Satellite” (also known as “Wyoming Satellite”) voice-assistant appliances, such open-source hardware, just like the ESPHome firmware, has a lot of potential for different use cases.
Also on my wishlist is network streamer receiver hardware with an AUX input and an ADC to get music from an analog audio source, as an easy way to achieve a remote AUX input into Music Assistant from an external analog audio source like a vinyl record player (LP turntable) or cassette player.
What I want to achieve is a solution that is easy to install, maintain, and use, and that allows my wife to stream music from a vinyl record player (LP turntable) to any speaker or group of speakers in our home. Her turntable setup has a pre-amp with phono (RCA) output ports for analog stereo audio.
- Architecture example: Analog audio source with preamp → ADC network appliance → music stream → Music Assistant → Any speakers
I would therefore prefer if we could buy some kind of networked (Wi-Fi) appliance, like a music streamer with a stereo AUX input port, that performs on-the-fly analog-to-digital conversion (ADC) and encoding for streaming to a Music Provider inside Music Assistant.
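To make the “ADC + encoding” step a bit more concrete, here is a minimal Python sketch of what such an appliance would conceptually do with the stereo AUX signal before streaming it. The function names are hypothetical and only for illustration. On real hardware the sampling is done by an ADC chip (typically delivering samples over I2S), and the result would probably be handed to an audio encoder rather than sent as raw PCM.

```python
import struct


def quantize_16bit(sample):
    """Map an analog-style float sample in [-1.0, 1.0] to a signed 16-bit integer."""
    # Anything outside the valid range is clipped, like an overdriven input.
    clipped = max(-1.0, min(1.0, sample))
    return int(round(clipped * 32767))


def encode_pcm16_stereo(left, right):
    """Interleave left/right samples as little-endian 16-bit PCM frames."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    frames = bytearray()
    for l_sample, r_sample in zip(left, right):
        # One stereo frame = left sample followed by right sample.
        frames += struct.pack("<hh", quantize_16bit(l_sample), quantize_16bit(r_sample))
    return bytes(frames)


# Two stereo frames from the (simulated) turntable pre-amp output.
payload = encode_pcm16_stereo([0.0, 1.0], [-1.0, 0.0])
print(len(payload))  # 2 frames * 2 channels * 2 bytes each
```

Byte chunks like `payload` are what would then be encoded and sent over the network to Music Assistant.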
I do however think that both of these solutions would need their own non-proprietary audio-only streaming protocol for high-quality music streams?