"ReSpeaker Lite" - new Seeed Studio Voice Assistant Development Kit hardware combine ESP32 with XMOS XU316 DSP chip for advanced audio processing as a ESPHome-based Home Assistant Assist Satellite voice devkit

Where do they mention full duplex i2s? They might have useful information in that doc.

I see they mention full duplex AEC, so the xmos is getting both send and recv from the mic/speaker. Just need to figure out how the xmos communicates with the esp32. It would be strange if they only have half duplex from xmos to esp32.

I asked in their Discord. The answer is here: Discord
Here’s a link to the illustration image, if you don’t have access to Discord:

Hi!
Just saw this thread and that you are talking about I2S.

I am using the XMOS eval kit XK-VOICE-L71 here together with a RPi 3A+.
Here the I2S connection between the RPi and the XMOS chip is full duplex using DIN/DOUT and only one(!) set of clock signals (BCLK, LRCLK).
The RPi is the I2S master and the XVF3610-INT firmware is configured for I2S slave.
Fixed sampling freq. of 48kHz and 32-bit samples.
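
For reference, the bit clock such a link needs follows directly from those numbers. This is only a minimal sketch, assuming standard stereo I2S framing (two slots per LRCLK period, one bit per BCLK cycle); the function name is made up for illustration and is not part of any XMOS or RPi API.

```python
def i2s_bclk_hz(sample_rate_hz: int, bits_per_slot: int, slots: int = 2) -> int:
    """Bit clock for one I2S bus: one bit is transferred per BCLK cycle.

    Assumes plain stereo I2S framing (2 slots per LRCLK period).
    """
    return sample_rate_hz * bits_per_slot * slots


# The setup described above: 48 kHz, 32-bit samples.
print(i2s_bclk_hz(48_000, 32))
# The same framing at the 16 kHz rate used for voice data.
print(i2s_bclk_hz(16_000, 32))
```

With these assumptions the 48 kHz link runs BCLK at 3.072 MHz, and a 16 kHz link with the same framing would run at 1.024 MHz, which is why two different sample rates cannot share one set of clock lines.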

Recently I wanted to try out the new Nabu Casa XMOS firmware on my XK-VOICE-L71 that can be found here:

However, after studying this file it became clear that the firmware cannot be used on my eval kit and thus probably any other board that is similar to this.

Why?
The esphome voice-kit XMOS firmware uses two distinct I2S connections (each in half duplex). I presume that they do this to prevent unnecessary sample rate conversions for the voice audio data.
The voice DSP audio pipeline only works with 16kHz mic data and so does the microWakeWord stuff.
However, for the audio playback (music) they want to use 48kHz to have a good quality.
To achieve this they have implemented a second I2S interface in the XMOS firmware which has completely separate connections to the ESP32-S3, including distinct clock signals (bclk, lrclk). See the GPIO definitions in their yaml file.
Also they modified the XMOS firmware to be I2S master on both peripherals, so both sets of clock signals are sent to the ESP32-S3, which is the I2S slave.


Oh hell yeah, so esp32 is full duplexed. I’ll need to take time and research a bit before I can say anything definitive. That’s great news though, I was honestly expecting a let down :joy:


Yes, ESP32-S3 is full-duplex.
See paragraph “full-duplex” here on this page: Inter-IC Sound (I2S) - ESP32-S3 - ESP-IDF Programming Guide v5.2.3 documentation

Yeah, that’s good news. I expected it to be duplex, 'cause otherwise it would be exclusive sharing, which is a bummer.
Please share your findings if you can. :slight_smile: I guess you know waaay more about the topic than I do.
I’m just trying to use Gnumpi esp-adf lib with it, to have media_player and micro-wakeword working. If i can help in any way - I’m here.

@gnumpi slightly off-topic but very curious to hear if you are planning on trying to push your work on Custom Audio Components to upstream ESPHome for mainlining (including what others like @nielsnl68 contributed to your esphome_audio repository?)?

With that question asked I am assuming that you guys are already aware of the synergies between your audio improvements and the related ESPHome audio and media player features for remote Assist Satellites in ESPHome in the “home-assistant-voice-pe” repository (formerly named ESPHome voice-kit)? See:

As well as the new “Assist Satellite” that is now in beta and will come in the upcoming HA 2024.10 release as a building block for ESPHome-based remote satellite devices that use Assist(?), all in preparation for Nabu Casa’s upcoming Assist Satellite hardware:

That means their firmware will be using distinct i2s connections, but Respeaker is using duplex.
I guess it could be fine, probably? As all the pre-processing for microphone is happening on XMOS itself, we can just connect that i2s mic and speaker as usual i2s duplex devices. Right? So the code on ESP side will differ in part of i2s configuration, but current i2s firmware from Seeed should work.

Please correct me if i’m wrong. This board is a black box to me, so i might be over-simplifying.

@gnumpi has been silent for several months, as far as i know.

New work in ESPHome for PE is really huge. There’s also a lot of new things for esp-adf AFAIK. And all of that will be conflicting with adf-pipeline i’m using now. :slight_smile:
Should i probably wait for release and try to use that code instead of gnumpi’s?

@formatBCE
The most interesting part of your posted block diagram is missing: the clock signals!


Could you ask them again to provide info about any bclk and lrclk that might exist?

My bet: they only use ONE set of clock signals (as used in the reference HW design).
So only ONE bclk and only ONE lrclk.
If this is the case, then I am afraid: you won’t be able to use the esphome XMOS firmware as it is NOT compatible because of incompatible physical connections!
(Current dev branch of XMOS firmware)

The XMOS reference design assumes a single full-duplex I2S connection with the same sample rate in both directions. Either 16kHz or 48kHz, but then for both directions.
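
To make the incompatibility argument concrete, here is a rough sketch of the reasoning in Python. Everything in it is hypothetical modelling based only on what is said in this thread (two half-duplex buses at 16 kHz and 48 kHz for the esphome XMOS firmware, one shared full-duplex bus for the reference design); the class and function names are invented, and the 48 kHz figure for the reference design is just one of the two rates mentioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class I2SBus:
    """One I2S bus, i.e. one bclk/lrclk pair plus its data line(s)."""
    sample_rate_hz: int
    directions: frozenset  # subset of {"mic", "speaker"}


# Hypothetical description of the esphome XMOS firmware, as described above:
# two half-duplex buses, each with its own clocks.
PE_FIRMWARE_BUSES = [
    I2SBus(16_000, frozenset({"mic"})),
    I2SBus(48_000, frozenset({"speaker"})),
]

# Hypothetical description of the XMOS reference design:
# one full-duplex bus, same rate in both directions.
REFERENCE_DESIGN_BUSES = [
    I2SBus(48_000, frozenset({"mic", "speaker"})),
]


def clock_sets_needed(buses) -> int:
    """Each bus needs its own physical bclk/lrclk pair."""
    return len(buses)


def firmware_fits_board(firmware_buses, board_clock_sets: int) -> bool:
    """True if the board wires up enough clock pairs for the firmware."""
    return clock_sets_needed(firmware_buses) <= board_clock_sets


print(firmware_fits_board(PE_FIRMWARE_BUSES, board_clock_sets=1))
print(firmware_fits_board(REFERENCE_DESIGN_BUSES, board_clock_sets=1))
```

On a board that routes only one bclk/lrclk pair, the first check fails and the second passes, which is the whole point: the limitation is in the physical wiring, not in the ESP32-S3.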

@gnumpi Seems to be pretty busy:

:wink:


The i2s full-duplex is nice to know but seems like that will be for future models only as this doesn’t appear to support it.

I’m almost tempted to order a Jetson now, use it, and return it to get my money back, now that they have ha-core and all the voice stuff ported to GPU (the voice stuff at least; ARM CPU), but I would never keep it. I just want to test it to see how well Seeed, Nvidia and HA did (Seeed is a Jetson reseller and they started the Nvidia dev thread) until they got it all working. Nvidia is just too expensive and they keep increasing the Jetson prices. I don’t even really care about LLMs that much, although they would be tested. Just 100 percent local voice, as I still use Nabu cloud due to better accuracy and I trust them with my data.

I’m just about positive any Seeed demos you have seen are hooked up to a Jetson model. All demo videos I have seen are using a PC/SBC of some kind off camera. All the demos I have seen are.

Yes, for sure. Duplex means they use a single i2s slot. If you’re curious, it’s these pins:

i2s_lrclk_pin: GPIO7
i2s_bclk_pin: GPIO8
i2s_mclk_pin: GPIO9

But i don’t think we should use their firmware for XMOS. Seeed firmware is working fine with i2s, reportedly. So all we need is to use that duplex to initialize player and mic, basically the same way it’s done on S3-Box firmware (that device is using duplex too).

Sorry, just saw the posted YAML from above:

i2s_audio_xiao:
  i2s_lrclk_pin: GPIO7
  i2s_bclk_pin: GPIO8
  i2s_mclk_pin: GPIO9

microphone:
  - platform: i2s_audio_xiao
    i2s_din_pin: GPIO44
    [...]
speaker:
  - platform: i2s_audio_xiao
    i2s_dout_pin: GPIO43
    [...]

So no need to ask. Single set of clocks. So using the firmware from here is NOT possible at the moment:

How does Jetson relate to Respeaker Lite? :frowning:
Full duplex for i2s is indeed working. In case of Respeaker, we want to know the i2c addresses and types of dac/adc, AFAIK.

Stop for a second, and read what i wrote. :slight_smile:
Do we actually need XMOS firmware from PE project? It’s two different devices all in all.
Respeaker Lite has its own i2s XMOS firmware. So it does expose i2s already. No need to re-flash.
What we need is ESPHome part (ESP32-S3 firmware, not XMOS).

@formatBCE
No, you do not need it, but it would have been very nice to use it.
My fear is the following: there is a reason why the PE project devs have chosen to use two I2S connections with different sample rates.
The question is: why?
Might it be that the ESP32-S3 with local wakeword is too overloaded if it also has to do sample rate conversion?

For example, in my use-case I definitely want HiFi-like music playback. So I have to use 48kHz to satisfy this requirement. That means that the mic data is also sent to the ESP32-S3 at 48kHz. For microWakeWord, we have to resample in good quality to 16kHz again.
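
To give an idea of the extra work involved: going from 48kHz to 16kHz is a decimation by 3, which needs a low-pass filter in front so that content above 8kHz does not alias into the voice band. Below is a rough Python sketch of that idea using a Hamming-windowed sinc FIR. It is not what ESPHome or microWakeWord actually do, the tap count and 7kHz cutoff are arbitrary choices of mine, and on the ESP32-S3 this would be written in C/C++; it only illustrates the per-sample cost.

```python
import math


def lowpass_taps(num_taps: int = 63, cutoff_hz: float = 7_000.0,
                 sample_rate_hz: float = 48_000.0) -> list:
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * fc
        else:
            sinc = math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def decimate_by_3(samples: list, taps: list) -> list:
    """Low-pass filter, then keep every third sample (48 kHz -> 16 kHz)."""
    out = []
    for i in range(0, len(samples), 3):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i - k
            if j >= 0:  # treat samples before the start as silence
                acc += t * samples[j]
        out.append(acc)
    return out


# 10 ms of a 1 kHz tone sampled at 48 kHz becomes 10 ms at 16 kHz.
tone = [math.sin(2 * math.pi * 1_000 * n / 48_000) for n in range(480)]
voice = decimate_by_3(tone, lowpass_taps())
print(len(tone), "->", len(voice))
```

With 63 taps this is 63 multiply-accumulates per output sample, so about one million per second for a single 16kHz channel. That is not much in absolute terms, but it is extra load next to the wake word model, which may be one reason to avoid it by using a dedicated 16kHz bus.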

Well, i’d say it’s way more simple to use different i2s slots code-wise.
I doubt it comes to big trouble with duplex. For now, ESP32-S3-BOX-3 is one of the best regarding the wake word recognition - and it’s using duplex i2s.
I’d say HA devs decided to future-proof their device, and probably make a way to use high-quality music output.
On Respeaker, using default YAML, i heard answers from my TTS - and quality is acceptable. I can live with it. So i guess it should be fine. Not the best - but fine.

For ESP32-S3-BOX-3: which wake word engine are they using?

“it’s way more simple to use different i2s slots code-wise.”

But definitely not in the XMOS firmware. Here they have chosen the harder way. :wink: