Far-field microphone array HAT with some extra features

spchouse · October 3, 2024, 10:48am

Hi All,
Here is our approach to the subject.
The hardware is based on an XMOS far-field voice processor and takes the form of a HAT for Radxa/RPi/OrangePi boards.
At the moment we are mainly focusing on the Radxa Zero 3E because it has built-in Ethernet and camera support.
Since the board is meant to work both as a far-field microphone and as an advanced alarm sensor, we dropped the ESP32 in favor of more powerful Linux-based processors.
One of the design goals is to support high-resolution cameras for motion/face detection.
Main product features:

Based on an XMOS far-field voice processor with support for AEC and beam-forming
Best-in-class 4x PDM microphone array
6 W high-end DAC that connects directly to a speaker.
Three power supply options:
- built-in wide-range DC/DC converter that supplies the voice HAT as well as the Radxa/RPi/OrangePi
- can be powered over USB-C (due to current limitations, not all options may be available). It can also work as a standalone USB device without a Radxa/RPi/OrangePi underneath; Windows and Linux detect it as a far-field microphone
- optional PoE; at the moment, popular high-power PoE modules of the RT5400 type from AliExpress are supported
Built-in speaker so it can work as a Media Player Device
Socket for mounting popular microwave sensor HLK-LD2450.
Security and stability are the priority, so we gave up WiFi in favor of a wired LAN connection. It still works over WiFi, though, after adding a WiFi card to the Radxa Zero 3E or when using the built-in WiFi of an RPi/Orange Pi Zero.
Camera cutout for full RPi.
XMOS programming connector.
Support for popular standards such as local openwakeword and wyoming satellite, or anything else that can be launched on a Radxa/RPi/OrangePi. We use I2S, and the board is recognized in Linux as an ALSA playback/record device.
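
Since the HAT shows up as an ordinary ALSA play/record device, a quick way to confirm the host sees it is to scan the output of `arecord -l`. Below is a minimal Python sketch, assuming the usual `card N: name [long name], device M: ...` line layout printed by alsa-utils. The function names are illustrative, and the card name `xvf3800hat` in the usage note is a made-up placeholder, not the name the real overlay registers.

```python
import re
import subprocess

# Matches lines such as:
#   card 1: sndrpisimplecar [snd_rpi_simple_card], device 0: ...
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[(.*?)\], device (\d+):")


def parse_capture_devices(listing):
    """Return (card, device, short_name, long_name) tuples from `arecord -l` text."""
    devices = []
    for line in listing.splitlines():
        match = CARD_LINE.match(line.strip())
        if match:
            card, short_name, long_name, device = match.groups()
            devices.append((int(card), int(device), short_name, long_name))
    return devices


def find_hw_id(listing, wanted):
    """Return an ALSA id like 'hw:1,0' for the first card whose name contains `wanted`."""
    for card, device, short_name, long_name in parse_capture_devices(listing):
        if wanted.lower() in (short_name + " " + long_name).lower():
            return f"hw:{card},{device}"
    return None


def list_capture_devices():
    """Run `arecord -l` on the host (requires alsa-utils) and parse its output."""
    result = subprocess.run(
        ["arecord", "-l"], capture_output=True, text=True, check=True
    )
    return parse_capture_devices(result.stdout)
```

On a board with the HAT attached, `list_capture_devices()` should return one entry per capture device, and `find_hw_id(text, "xvf3800hat")` (placeholder name) gives a `hw:X,Y` string that can be passed to any ALSA-based recorder.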

[Image: 20240924_094004, 1920×1495, 176 KB]

nanosonde · October 3, 2024, 11:00am

Great News!

Which XMOS chip exactly are you using?
Do you plan to use the code from here or do you want to use the XVF3800 firmware binaries? In the latter case I would assume that you would have to use this variant.

spchouse · October 3, 2024, 11:07am

Hi @nanosonde, yes, there is an XVF3800 onboard, and we run a modified XMOS firmware because we use a different DAC IC.

nanosonde · October 3, 2024, 11:20am

Just curious: is it the XVF3800 production firmware source code, which one can only access after providing proof of purchase of matching silicon or a dev kit? VocalFusion Software Request Form | XMOS

spchouse · October 3, 2024, 11:22am

Yes, that's correct.

nanosonde · October 3, 2024, 11:26am

I assume it is quite similar to this one here, right?

Do you plan to sell this? Any price info yet? When?

spchouse · October 3, 2024, 11:40am

Yes, but to keep some compatibility, for now we do not use an MCLK generated by the host. Switching between the three popular host platforms is done by soldering bridge resistors.

And yes, we plan to sell it. We have a few prototypes that we have been testing over the last few weeks, and so far we are happy with the results.

The price would be around 89 euros.

nanosonde · October 3, 2024, 12:29pm

Ok. So MCLK is generated by the XMOS chip.
Is the XMOS I2S master or slave with respect to the host I2S interface?

What about the I2C? Is the XMOS I2C slave so that the host can control it?

nanosonde · October 3, 2024, 12:31pm

And what about the XMOS USB?
Do you use a USB Multiplexer to switch between external USB connector and the USB host interface from the SBC?

spchouse · October 3, 2024, 12:51pm

As for I2S, the XMOS is a slave to the SBC. MCLK is generated by the XMOS. We tried making the XMOS the master, but it didn't work properly with ALSA: it only worked as two separate record and playback devices, and that in turn did not work correctly with wyoming satellite and openwakeword.

On I2C the XMOS is also a slave to the SBC; the bus can be used to control the XMOS as well as for firmware updates.
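
For reference, the relationship between the I2S clocks in a setup like this can be sketched in a few lines of Python. The numbers used (48 kHz sample rate, 32-bit slots, 2 channels, MCLK at 256 x fs) are common I2S values taken as assumptions here, not confirmed settings of this board, and `i2s_clocks` is just an illustrative helper.

```python
def i2s_clocks(sample_rate_hz, slot_bits, channels, mclk_multiplier=256):
    """Return (lrclk, bclk, mclk) in Hz for a plain I2S link."""
    lrclk = sample_rate_hz                        # one frame per sample period
    bclk = sample_rate_hz * slot_bits * channels  # one bit-clock cycle per bit sent
    mclk = sample_rate_hz * mclk_multiplier       # master clock as a multiple of fs
    return lrclk, bclk, mclk


lrclk, bclk, mclk = i2s_clocks(48_000, 32, 2)
print(f"LRCLK={lrclk} Hz, BCLK={bclk} Hz, MCLK={mclk} Hz")
# prints: LRCLK=48000 Hz, BCLK=3072000 Hz, MCLK=12288000 Hz
```

With the SBC as I2S master, it drives LRCLK and BCLK, while MCLK comes from the XMOS side; the two only line up if both ends agree on the sample rate and slot width.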

spchouse · October 3, 2024, 12:56pm

USB-C is connected directly to the XMOS, with no multiplexer to the SBC. So the HAT on its own, without an SBC, can serve as a standalone far-field mic and speaker for other devices such as Windows/Linux PCs.

nanosonde · October 3, 2024, 1:38pm

Do you use a device tree overlay to create the simple I2S sound card as I2S master with external MCLK from XMOS?

I use this one here:

However, MCLK is generated by the RPi.
This needs manual clock setup as can be seen in the XMOS vocalfusion rpi setup repo.

So no special kernel modules required.

spchouse · October 3, 2024, 2:03pm

Yes, there is a custom DTS for the RPi, and a similar one for the Radxa Zero 3E. I will shortly publish a GitHub repo with all the patches needed.

spchouse · October 3, 2024, 4:23pm

@nanosonde, what HW platform are you using for the XMOS?

nanosonde · October 3, 2024, 4:50pm

Currently the XK-VOICE-L71 attached to a Raspberry Pi 3A+ which I still had lying around. I am running raspios-bookworm.
Initially, I had some plans to build my own PCBs with KiCad 8, probably with separate mic-array and XMOS PCBs, possibly manufactured by JLCPCB.
But since a few boards by other people have already been spotted in the wild (incl. yours), I will wait a bit and see if somebody comes up with such boards/solutions.
Long-term plan would be to build some simple replacement PCB(s) which could replace the already existing PCB(s) in commercial active/smart speakers.

Maybe it is time to design a general-purpose XMOS breakout PCB with a separate mic-array board so that it could be fitted on other PCBs easily.

nanosonde · October 4, 2024, 2:37pm

@spchouse
Any reason why you have chosen XVF3800 over XVF3610?
XMOS themselves recommend XVF3800 with 4 mics and beam-forming for products in the area of conference speakers with beam-forming, etc.
The XVF3610 would be sufficient for voice assistants.
I still wonder though how the XVF3610 firmware differs from the “open-source” XCORE-VOICE (sln_voice repo) with respect to performance related to the ASR stream (cleaned of echoes and the reference audio).

Have you evaluated those?

spchouse · October 4, 2024, 5:02pm

Mainly because of beam-forming and the far-field solution. If the board is mounted somewhere in a corner of the room or in the ceiling as a security sensor, that is a must-have.