Using an XMOS xcore.ai with an ESP32-S3

Post by **Alextrical** » Sun Mar 24, 2024 9:59 pm

Hi all

I'm currently investigating the viability of using an XMOS chip for the microphone array and audio reproduction in a smart speaker for self-hosted voice assistant projects such as Rhasspy, Willow or ESPHome.
The XU316 looks like a fantastic chip that could solve a few of the issues with the hardware these projects use; however, I'm wondering what options there are for connecting the XU316 to the ESP32-S3 module.
Ideally, I intend for this to become an open source project that will benefit the communities above.

Kind regards

Post by **fabriceo** » Mon Mar 25, 2024 10:48 am

Hi
A simple answer to start: the ESP32 can exchange information (settings, state, status) with the XMOS over I2C or SPI without issue.
If you want to transfer filter/tap coefficients, SPI will definitely be better.
Libraries for both exist for XMOS.
The XMOS requires its firmware in a local flash device.
If you want to update this firmware OTA, you will have to go with SPI.
For exchanging audio data, I2S is the way forward; the ESP32 can be master or slave.
The audio master clock can be provided by the XU316 PLL in both cases.
Hope this helps

Post by **Alextrical** » Mon Apr 08, 2024 6:22 pm

Hi Fabriceo

Sorry for the delayed response.
Thank you, I will start drafting up a schematic for the device.
SPI definitely sounds like the way to go.

I believe the initial prototype will include an SPI flash, but do I understand the datasheet (https://www.xmos.com/download/XU316-102 ... et(26).pdf) correctly, in section 9.3, that it may be possible to omit the SPI flash and clock the boot image in directly from the ESP32?
In short, this would mean we wouldn't have to worry about updating the XMOS device's flash on software updates; we would just update the ESP32 firmware and reboot both chips?

The other worry I have heard voiced by the community I'm working with is the available algorithms, and whether they are closed source or locked behind paywalls. Specifically, the intent would be to use the XU316 as a DSP for a microphone array, with the hope of running AEC, AGC, beamforming and, hopefully, keyword spotting.

Post by **fabriceo** » Tue Apr 09, 2024 9:09 am

Hello
Yes, regarding your first point about booting, you are right: it is possible to configure the XMOS so that the boot sequence does not come from an external flash device with the XMOS as SPI master; instead, the boot code can be provided by the ESP to the XMOS by sending SPI sequences. This is documented, and you can find some code on the forum, as some users experimented with this recently.

But unless you plan to produce millions of devices, I would not recommend this procedure, especially given the constraints during the product development phase.
Developing the XMOS application, even if you leverage a standard one from XMOS, will be an effort requiring trial and error, and you will certainly want this part working well before mixing it with your ESP application.

My suggestion is that you keep the XU316 with its own local flash device (requires 6 pins in total) and connect the ESP to the XMOS via a dedicated SPI bus (4 wires). Then you develop your XMOS app independently, using the XTAG and the XMOS tools and libraries. Then you develop your ESP app, and as part of the communication protocol between the two chips you include special sequences to "download" an XMOS image from the ESP; you then add code to the XMOS application to write it to flash (using the XMOS library, which contains standard procedures for writing a boot image to flash). Also, if you expose the XMOS USB bus somewhere on the PCB, you can update the XMOS firmware independently via the USB DFU protocol.
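The "download an XMOS image from the ESP" step needs some framing so the XMOS side can reject corrupted chunks before they reach flash. The sketch below shows one way to do that in C, assuming a made-up frame layout (command byte, offset, length, payload, CRC-32). `build_chunk_frame`, `parse_chunk_frame` and `CMD_IMAGE_CHUNK` are hypothetical names, not part of any XMOS or Espressif library, and the actual SPI transfer and flash write are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout (illustration only):
 *   byte 0       : command (CMD_IMAGE_CHUNK)
 *   bytes 1..4   : offset of this chunk within the image (little-endian)
 *   bytes 5..6   : payload length (little-endian)
 *   bytes 7..    : payload
 *   last 4 bytes : CRC-32 of everything before it (little-endian) */
#define CMD_IMAGE_CHUNK 0x10u
#define CHUNK_MAX       256u
#define HDR_LEN         7u
#define CRC_LEN         4u

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise version. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

static void put_le(uint8_t *p, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static uint32_t get_le(const uint8_t *p, int nbytes)
{
    uint32_t v = 0;
    for (int i = 0; i < nbytes; i++) {
        v |= (uint32_t)p[i] << (8 * i);
    }
    return v;
}

/* ESP32 side: build one frame into out[]; returns the frame length. */
static size_t build_chunk_frame(uint32_t offset, const uint8_t *payload,
                                uint16_t len, uint8_t *out)
{
    assert(len <= CHUNK_MAX);
    out[0] = CMD_IMAGE_CHUNK;
    put_le(&out[1], offset, 4);
    put_le(&out[5], len, 2);
    memcpy(&out[HDR_LEN], payload, len);
    uint32_t crc = crc32_ieee(out, HDR_LEN + len);
    put_le(&out[HDR_LEN + len], crc, CRC_LEN);
    return HDR_LEN + len + CRC_LEN;
}

/* XMOS side: validate a received frame before it is written to flash.
 * Returns the payload length on success, -1 on any error. */
static int parse_chunk_frame(const uint8_t *frame, size_t frame_len,
                             uint32_t *offset, const uint8_t **payload)
{
    if (frame_len < HDR_LEN + CRC_LEN || frame[0] != CMD_IMAGE_CHUNK) {
        return -1;
    }
    uint32_t len = get_le(&frame[5], 2);
    if (len > CHUNK_MAX || frame_len != HDR_LEN + len + CRC_LEN) {
        return -1;
    }
    uint32_t expected = get_le(&frame[HDR_LEN + len], CRC_LEN);
    if (crc32_ieee(frame, HDR_LEN + len) != expected) {
        return -1;
    }
    *offset = get_le(&frame[1], 4);
    *payload = &frame[HDR_LEN];
    return (int)len;
}
```

A frame that passes `parse_chunk_frame` would then be handed to the flash-writing routines in the XMOS flash library. A real protocol would probably also want a per-chunk acknowledgement and a whole-image check before switching boot images.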

Also, be careful with the size of each image (XMOS and ESP). The provided compression tools do not work well with big files...

Regarding IP and licensing for the voice/mic solution, I have no direct experience, but my understanding is that the XVF3800 is essentially a version of the XU316 that enables use of the voice solution. Here you should raise a support ticket with XMOS to better understand the limitations and possibilities.

Hope this helps

Post by **Ross** » Wed Apr 10, 2024 12:21 pm

the ESP32-S3 has I2S, no?

Post by **Alextrical** » Wed Apr 10, 2024 8:05 pm

fabriceo wrote (Tue Apr 09, 2024 9:09 am): Yes, regarding your first point about booting, you are right: it is possible to configure the XMOS so that the boot sequence does not come from an external flash device with the XMOS as SPI master; instead, the boot code can be provided by the ESP to the XMOS by sending SPI sequences. This is documented, and you can find some code on the forum, as some users experimented with this recently.

Thank you for confirming. I believe I've seen the thread you are referencing; it looks promising for trimming the BOM in future if the device is successful.

fabriceo wrote (Tue Apr 09, 2024 9:09 am): I would not recommend this procedure, especially given the constraints during the product development phase... you will certainly want this part working well before mixing it with your ESP application.

That makes sense. The local flash will understandably help speed up development and remove extra unknowns for the initial tests. It will also allow a board to be used as a standalone USB mic array, which may or may not be something the target audience is interested in.

fabriceo wrote (Tue Apr 09, 2024 9:09 am): you should raise a support ticket with XMOS to better understand the limitations and possibilities.

Good advice, I will track down the contact details and fire a support ticket over to them. It will help ease the worries of some of the developers, though the hope would likely be to develop some open source algorithms in the future. Just having a free-to-use proprietary blob would make initial onboarding easier.

Ross wrote (Wed Apr 10, 2024 12:21 pm): the ESP32-S3 has I2S, no?

Yes, it does. Initial testing used the onboard vector processing to do some DSP with an external ADC, though it's looking like we are reaching the limit of what the ESP32-S3 can achieve while still leaving room for additional functions.

Thank you both, this has been really useful.

My next question is regarding 3.3 V logic levels. I'm currently basing the design on the reference files for the XK-VOICE-L71, which use the XU316-1024-QF60A and level shifters. If I swap to the XU316-1024-QF60B and supply VDDIO L, R and T with 3.3 V, can I omit the level shifters when connecting to a host such as an ESP32-S3 or RPi?
(Sorry for the somewhat lazy question; I've not had time this evening to check that all the comms lines are connected only to the top, left and right pins.)

Post by **Joe** » Thu Apr 11, 2024 8:38 pm

"My next question is regarding 3.3 V logic levels. I'm currently basing the design on the reference files for the XK-VOICE-L71, which use the XU316-1024-QF60A and level shifters. If I swap to the XU316-1024-QF60B and supply VDDIO L, R and T with 3.3 V, can I omit the level shifters when connecting to a host such as an ESP32-S3 or RPi?"

Yes, correct. QF60B = 3V3 IO, QF60A = 1V8 IO. On TQ128 and BGA parts you can select which supply you want for each bank individually. For QF60 all banks have to be the same, either 1V8 or 3V3, based on the part number.
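As a quick design check, the rule above can be captured in a small lookup. This is a hypothetical C helper that only encodes what is stated in this thread (QF60A = 1V8 IO, QF60B = 3V3 IO, all banks equal on QF60); it deliberately returns "unknown" for TQ128/BGA parts, whose banks are configured individually, and it assumes the host (an ESP32-S3 or Raspberry Pi here) uses 3.3 V GPIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* IO voltage in millivolts; IO_UNKNOWN for parts this helper can't judge. */
typedef enum { IO_UNKNOWN = 0, IO_1V8 = 1800, IO_3V3 = 3300 } io_mv_t;

/* Hypothetical helper: on QF60 parts the suffix fixes the IO voltage of
 * every bank. TQ128/BGA parts set the supply per bank, so they return
 * IO_UNKNOWN here. */
static io_mv_t qf60_io_voltage(const char *part)
{
    assert(part != NULL);
    if (strstr(part, "QF60B") != NULL) return IO_3V3;
    if (strstr(part, "QF60A") != NULL) return IO_1V8;
    return IO_UNKNOWN;  /* not a QF60 part, or unrecognised suffix */
}

/* Level shifters are needed when the XMOS IO voltage differs from the
 * host's GPIO voltage (3.3 V for an ESP32-S3 or Raspberry Pi). */
static bool needs_level_shifter(const char *part, io_mv_t host_io)
{
    io_mv_t xmos_io = qf60_io_voltage(part);
    assert(xmos_io != IO_UNKNOWN);
    return xmos_io != host_io;
}
```

With a 3.3 V host, `needs_level_shifter("XU316-1024-QF60B", IO_3V3)` is false and the QF60A variant returns true, matching the answer above.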

Post by **Alextrical** » Sun Apr 14, 2024 7:41 am

Joe wrote (Thu Apr 11, 2024 8:38 pm): Yes, correct. QF60B = 3V3 IO, QF60A = 1V8 IO... For QF60 all banks have to be the same, either 1V8 or 3V3, based on the part number.

Thank you, that makes my life easier. It means that I can route the board without the need for level shifters, as all communication logic can run at 3.3 V.

Post by **Alextrical** » Sun Apr 14, 2024 8:09 am

I'm looking to populate two different array layouts for testing, both on the circumference of a circle:
a 3-mic array with 120° spacing
and a 4-mic array with 90° spacing.
Is the radius the same for both topologies? Would a radius of 31.589375 mm be correct?

Effectively this would be 6 microphones.
My issue is that the design I'm working on uses the XK-VOICE-L71 as its reference design, but I need to connect two additional mic inputs.
IO Tile 1 is full, but there is capacity on IO Tile 0. Do I need to move MIC_CLK and the existing MIC_DATA over to IO Tile 0, or can they be split across tiles?
Does MCLK also have to be moved to IO Tile 0?

Can the clock for all 6 mics be driven from a single output of a 74LVC125A, or do I need to drive each pair from its own output (as I've seen in the xCORE Microphone Array https://www.xmos.com/download/xCORE-Mic ... l(2V0).pdf, U16 on page 19)?
