XS3 inline assembly tricks

Technical questions regarding the XTC tools and programming with XMOS.
susnak
Member
Posts: 13
Joined: Fri Apr 12, 2019 1:01 pm


Post by susnak »

Dear all,

I hope someone can point me in the right direction.

I've been using XS2 inline assembly in some of my own DSP code, with lib_dsp as a reference. There are some neat tricks to be found there.
It looks like lib_dsp does not use any of the new XS3 instructions. Is lib_xcore_math perhaps intended to replace lib_dsp on the XS3 architecture?
lib_xcore_math seems to contain only *.S assembly files, which are a bit hard for me to understand.

Is there anything like lib_dsp using inline assembly for XS3?


fabriceo
XCore Addict
Posts: 186
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

Hi there, my 2 cents: when it comes to DSP applications, there is no difference between XS2 and XS3 apart from the 32-bit floating-point capabilities and, of course, the vector processing unit (VPU).
Using FMUL and FADD in inline assembly is not a problem, but as far as I can see the XC compiler is good enough for floating-point routines.
Using the VPU is another story, mainly because the data model is different: the exponent is managed separately from the mantissa, a bit like floating point (block floating point).
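To make the "exponent managed separately from the mantissa" idea concrete, here is a minimal plain-C sketch of a block floating-point vector: all elements share one exponent, and the headroom (redundant sign bits) of the whole block decides how far the mantissas can be shifted up. The bfp_vec_t type and the function names are hypothetical illustrations, not lib_xcore_math's API, and nothing here uses the VPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block floating-point vector (illustration only, not the
 * lib_xcore_math types): element i represents mant[i] * 2^exp. */
typedef struct {
    int32_t *mant;  /* mantissas sharing one exponent */
    int exp;        /* single exponent for the whole block */
    int len;
} bfp_vec_t;

/* Redundant leading sign bits of x, 0..31: how far x can be shifted left
 * while still fitting in an int32_t. */
static int cls32(int32_t x) {
    uint32_t u = (uint32_t)(x < 0 ? ~x : x);
    int n = 0;
    while (n < 31 && !(u & (0x40000000u >> n)))
        n++;
    return n;
}

/* Headroom of the block = smallest headroom of any element. */
static int bfp_headroom(const bfp_vec_t *v) {
    assert(v->len > 0);
    int hr = 31;
    for (int i = 0; i < v->len; i++) {
        int h = cls32(v->mant[i]);
        if (h < hr)
            hr = h;
    }
    return hr;
}

/* Shift every mantissa up by the common headroom and compensate in the
 * exponent, so the block uses the full 32-bit range without changing
 * the values it represents. */
static void bfp_normalize(bfp_vec_t *v) {
    int hr = bfp_headroom(v);
    for (int i = 0; i < v->len; i++)
        v->mant[i] = (int32_t)((uint32_t)v->mant[i] << hr);
    v->exp -= hr;
}
```

For example, mantissas {3, -2, 1} with exponent 0 have 29 bits of common headroom, so normalizing shifts them left by 29 and sets the exponent to -29.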
Also, using the VPU means completely rewriting the library to compute multiple values in parallel instead of sequentially.
The very best example is the biquad: for XS2 it is iterative and easy to understand, but lib_xcore_math/src/arch/xs3/filter/filter_biquad_s32.S is a real brain-teaser. The result is a very smart piece of code that runs 8 biquads in a row on a single sample in an incredibly low number of CPU cycles.
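For comparison, the iterative XS2-style approach for a cascade of 8 sections looks roughly like the plain-C sketch below: each section is a Direct Form I biquad, and section k+1 cannot start until section k has produced its output. The Q30 coefficient format, the rounding and saturation choices, and all names are assumptions for illustration; this is neither lib_dsp's nor lib_xcore_math's code.

```c
#include <stdint.h>

#define BQ_Q 30         /* assumed coefficient format: 1.0 == 1 << 30 */
#define NUM_SECTIONS 8  /* cascade length, echoing "8 biquads in a row" */

/* One Direct Form I section, usual sign convention:
 * y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2 */
typedef struct {
    int32_t b0, b1, b2, a1, a2;
    int32_t x1, x2, y1, y2;  /* previous inputs and outputs */
} biquad_t;

static int32_t sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

static int32_t biquad_step(biquad_t *s, int32_t x) {
    /* 64-bit accumulation; no guard against int64 overflow for extreme
     * coefficient/sample combinations */
    int64_t acc = (int64_t)s->b0 * x + (int64_t)s->b1 * s->x1
                + (int64_t)s->b2 * s->x2
                - (int64_t)s->a1 * s->y1 - (int64_t)s->a2 * s->y2;
    /* round to nearest, then drop the fractional bits (arithmetic shift) */
    int32_t y = sat32((acc + ((int64_t)1 << (BQ_Q - 1))) >> BQ_Q);
    s->x2 = s->x1;
    s->x1 = x;
    s->y2 = s->y1;
    s->y1 = y;
    return y;
}

/* Iterative cascade: each section's output is the next section's input,
 * so the sections are processed strictly one after another. */
static int32_t biquad_cascade(biquad_t sec[NUM_SECTIONS], int32_t x) {
    for (int k = 0; k < NUM_SECTIONS; k++)
        x = biquad_step(&sec[k], x);
    return x;
}
```

Setting b0 = 1 << 30 and everything else to zero in every section gives a pass-through, which is a handy sanity check when comparing against a vectorized version.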

It is difficult to find a balance between lib_dsp and lib_xcore_math... My idea was to augment lib_dsp with some of the lib_xcore_math features, but the conversion takes time.
Let us know your findings.

Post by susnak »

Hi fabriceo, thank you. In the meantime, I had a look at a few of the pure-assembly files in lib_xcore_math that use the VPU, and I think I am starting to understand them. I also tried using the VPU instructions in inline assembly, but it is not clear to me whether the compiler can generate dual-issue code from it.
For context, I am now looking into sample rate conversion using the VPU instructions. Currently, lib_src doesn't use the VPU for SSRC. Are there any plans for that?

Post by fabriceo »

Hi susnak

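Regarding dual issue, it is as simple as writing your two instructions inside curly braces in the inline asm statement:
asm volatile ("{ ldc r1, 0 ; ldc r2, 1 }" ::: "r1", "r2");
Of course, you have to take care of the M+R and M&R rules when combining your instructions. Because this statement writes r1 and r2, list them as clobbers; otherwise the compiler may keep live values in those registers.
The compiler doesn't care what you write, but the assembler will verify it.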

Let me give you a trick: you can see what is generated in the assembly .s file by adding asm volatile("#mycomment"); to your code, compiling, and then searching for "#mycomment" across the whole workspace. This shows the list of files in the .build folder that contain it; just click on the .s file and you'll see what the compiler generated for its C code and for your asm statement.
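A minimal sketch of that trick, assuming a GNU-style assembler where # starts a comment (as the post implies for xcore; it is also the case on x86). Two markers bracket the code of interest so it is easy to locate in the generated .s file; q31_mul and the marker names are hypothetical examples.

```c
#include <stdint.h>

/* Hypothetical example function: Q31 x Q31 -> Q31 multiply (truncating).
 * The asm statements only emit assembler comments, no instructions; the
 * optimizer may still move surrounding code across them, so treat the
 * bracketed region in the .s file as approximate. */
int32_t q31_mul(int32_t a, int32_t b) {
    __asm__ volatile("# MARKER_q31_mul_begin");
    int32_t r = (int32_t)(((int64_t)a * b) >> 31);
    __asm__ volatile("# MARKER_q31_mul_end");
    return r;
}
```

Searching for MARKER_q31_mul_begin then lands directly on the instructions generated for the multiply.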

When it comes to SRC, most of the CPU time is spent FIR-filtering the signal, so you could replace the FIR routines with the optimized ones in lib_xcore_math, which are nearly 8 times faster. There are probably some difficulties due to the optimizations for the circular buffer.
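One common way around the circular-buffer problem, offered as an assumption about how it could be done rather than how lib_src or lib_xcore_math do it, is to store the history twice: every new sample is written at pos and at pos + NTAPS, so the NTAPS most recent samples are always contiguous from hist[pos]. A plain forward dot product, the kind an optimized or vectorized routine expects, then needs no wrap-around handling. NTAPS, the Q31 coefficient format and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NTAPS 16  /* hypothetical filter length */

/* History stored twice so the newest NTAPS samples are always
 * contiguous starting at hist[pos]. */
typedef struct {
    int32_t hist[2 * NTAPS];
    int pos;  /* index of the newest sample, 0..NTAPS-1 */
} fir_state_t;

static void fir_init(fir_state_t *s) {
    memset(s->hist, 0, sizeof s->hist);
    s->pos = 0;
}

/* coef[0] multiplies the newest sample, coef[NTAPS-1] the oldest.
 * Coefficients are Q31; the result is rounded back to the input format. */
static int32_t fir_step(fir_state_t *s, const int32_t coef[NTAPS], int32_t x) {
    assert(s->pos >= 0 && s->pos < NTAPS);
    s->pos = (s->pos == 0) ? NTAPS - 1 : s->pos - 1;  /* move backwards */
    s->hist[s->pos] = x;
    s->hist[s->pos + NTAPS] = x;                      /* mirror copy */
    const int32_t *h = &s->hist[s->pos];  /* h[0] newest .. h[NTAPS-1] oldest */
    int64_t acc = 0;
    for (int i = 0; i < NTAPS; i++)       /* contiguous dot product */
        acc += (int64_t)coef[i] * h[i];
    return (int32_t)((acc + ((int64_t)1 << 30)) >> 31);
}
```

The cost is twice the state memory and one extra store per input sample, in exchange for a dot product over a contiguous array.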
For the SRC coefficient computation, I'm not sure whether the VPU will be of any help.

Post by susnak »

Thank you. This is very helpful.
I wrote my own SSRC routines, in which vector dot products are the most demanding part. This is what I want to rewrite using the VPU. The dot product in lib_xcore_math seems to do more than is needed for FIR filtering.
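A stripped-down reference that does only what a FIR needs might look like the plain-C sketch below: int32 samples times Q31 coefficients, 64-bit accumulation, one rounding shift at the end, and no exponent or headroom bookkeeping. The work is split into 8 interleaved partial sums to mirror the 8 x 32-bit lanes of the XS3 VPU's 256-bit vectors. It is not VPU code, results may not match a VPU version bit-exactly because the VPU's internal accumulator precision and shifts can differ from plain 64-bit arithmetic, and the function name and the pad-to-a-multiple-of-8 requirement are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* 8 x 32-bit lanes, matching a 256-bit vector width */

/* Dot product for FIR use: int32 data x Q31 coefficients, accumulated in
 * 64 bits per lane, rounded once at the end. No saturation: assumes the
 * coefficient set keeps the sum in range. */
static int32_t dot_q31_8lane(const int32_t *x, const int32_t *c, size_t n) {
    assert(n % LANES == 0);  /* assumption: caller pads to a multiple of 8 */
    int64_t lane[LANES] = {0};
    for (size_t i = 0; i < n; i += LANES)
        for (int k = 0; k < LANES; k++)  /* one "vector" step */
            lane[k] += (int64_t)x[i + k] * c[i + k];
    int64_t acc = 0;
    for (int k = 0; k < LANES; k++)      /* final horizontal add */
        acc += lane[k];
    return (int32_t)((acc + ((int64_t)1 << 30)) >> 31);
}
```

Because the lanes are independent until the final sum, this loop structure maps naturally onto a vector multiply-accumulate over 8 elements at a time followed by one horizontal add.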
Ross
XCore Expert
Posts: 968
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

Please do let us know if you publish your work anywhere :)