XS3 inline assembly tricks

susnak · Post by **susnak** » Thu Nov 23, 2023 1:21 pm

Dear all,

I hope someone can point me in the right direction.

I've been using XS2 inline assembly in some of my own DSP code, following lib_dsp as a reference. There are some neat tricks to be found there.
It looks like lib_dsp is not using any of the new XS3 instructions. Is lib_xcore_math perhaps intended to replace lib_dsp for the XS3 architecture?
lib_xcore_math seems to contain only *.S assembly files, which are a bit hard for me to understand.

Is there anything like lib_dsp using inline assembly for XS3?

fabriceo · Post by **fabriceo** » Mon Nov 27, 2023 3:02 pm

Hi there, my 2 cents: when it comes to DSP applications, there is no difference between XS2 and XS3 apart from the 32-bit floating-point capabilities and the vector processing unit (VPU), of course.
Using FMUL and FADD in inline assembly is not a problem, but as far as I can see the XC compiler is good enough for floating-point routines.
Using the VPU is another story, mainly because the data model is different, with the exponent being managed separately from the mantissa, a bit like floating point.
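To make the "exponent managed separately from the mantissa" idea concrete, here is a minimal sketch in plain C. It assumes a vector of integer mantissas that all share one exponent, so element i stands for mant[i] * 2^exp. The names `block_vec_t` and `block_vec_dot` are made up for this illustration and are not the lib_xcore_math API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration: element i has the value mant[i] * 2^exp. */
typedef struct {
    const int32_t *mant; /* integer mantissas */
    int exp;             /* one exponent shared by the whole vector */
    unsigned len;        /* number of elements */
} block_vec_t;

/* Dot product of two such vectors.
 * The mantissa work is pure integer multiply-accumulate; the exponents
 * are handled once, outside the loop, by simply adding them.
 * The result is acc * 2^(*exp_out). No overflow handling is done here. */
int64_t block_vec_dot(const block_vec_t *a, const block_vec_t *b, int *exp_out)
{
    assert(a->len == b->len);
    int64_t acc = 0;
    for (unsigned i = 0; i < a->len; i++) {
        acc += (int64_t)a->mant[i] * b->mant[i];
    }
    *exp_out = a->exp + b->exp;
    return acc;
}
```

The inner loop never touches an exponent, which is what lets a vector unit process many mantissas per instruction.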
Also, using the VPU means completely rewriting the library to compute multiple values in parallel instead of in sequence.
The very best example is the biquad, which for XS2 is iterative and easy to understand, but lib_xcore_math/src/arch/xs3/filter/filter_biquad_s32.S is a real brain-teaser. The result is a very smart piece of code that runs 8 biquads in a row for a single sample with an incredibly low number of CPU cycles.
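For comparison, the "iterative and easy to understand" form looks roughly like the sketch below: one direct-form-I biquad processed one sample at a time in plain C. This is not the lib_dsp source. The Q4.28 coefficient format, the pre-negated feedback coefficients and the names `biquad_t` and `biquad_step` are assumptions made for the example.

```c
#include <stdint.h>

#define Q_FORMAT 28 /* assumed coefficient format: Q4.28 */

/* Hypothetical state for one direct-form-I section. */
typedef struct {
    int32_t b0, b1, b2; /* feed-forward coefficients */
    int32_t a1, a2;     /* feedback coefficients, stored already negated */
    int32_t x1, x2;     /* previous two inputs */
    int32_t y1, y2;     /* previous two outputs */
} biquad_t;

/* Process one sample. Each output depends on the two previous outputs,
 * which is why this form is naturally sequential.
 * No saturation or rounding is applied in this sketch. */
int32_t biquad_step(biquad_t *f, int32_t x)
{
    int64_t acc = (int64_t)f->b0 * x
                + (int64_t)f->b1 * f->x1
                + (int64_t)f->b2 * f->x2
                + (int64_t)f->a1 * f->y1
                + (int64_t)f->a2 * f->y2;
    int32_t y = (int32_t)(acc >> Q_FORMAT);

    /* Shift the delay lines. */
    f->x2 = f->x1;
    f->x1 = x;
    f->y2 = f->y1;
    f->y1 = y;
    return y;
}
```

A cascade of 8 sections would call this 8 times per sample, each section feeding the next.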

It is difficult to find a balance between lib_dsp and lib_xcore_math... My idea was to augment lib_dsp with some of the lib_xcore_math features, but the conversion takes time.
Let us know your findings.

susnak · Post by **susnak** » Tue Nov 28, 2023 12:57 pm

Hi fabriceo. Thank you. In the meantime, I had a look at a few of the pure-assembly files in lib_xcore_math that use the VPU, and I think I am starting to understand them. I also tried to use the VPU instructions in inline assembly, but it is not clear to me whether the compiler can generate dual-issue code from it.
For context, I am now looking into sample rate conversion using the VPU instructions. Currently, lib_src doesn't use the VPU for SSRC. Are there any plans for it?

fabriceo · Post by **fabriceo** » Wed Nov 29, 2023 2:59 pm

Hi susnak

Regarding dual issue, it is as simple as writing your 2 instructions in braces inside the inline statement:
asm volatile (" { ldc r1, 0 ; ldc r2, 1 } " ::: "r1", "r2"); /* clobber list tells the compiler r1 and r2 are overwritten */
Of course, you have to take care of the M+R and M&R rules when combining your instructions.
The compiler doesn't care what you write, but the assembler will verify it.

Let me give you a trick: you can see what is generated in the assembly .s file by adding an asm volatile("#mycomment"); statement, compiling, and then searching for "#mycomment" in the whole workspace. This will show you the list of files containing it in the .build folder. Just click on the .s file and you'll see what the compiler generated for its C and for your asm statement.
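As a small sketch of that trick, the function below brackets the code of interest with two marker comments, so the generated instructions sit between two easy-to-find lines in the .s file. The function `scale_q31` and the marker texts are invented for the example. `__asm__` is used as the spelling of `asm` that GCC and Clang also accept in strict ISO modes.

```c
#include <stdint.h>

/* Hypothetical example function: multiply a sample by a Q31 gain. */
int32_t scale_q31(int32_t x, int32_t gain)
{
    /* Marker only: emits an assembler comment, no instruction. */
    __asm__ volatile("# BEGIN scale_q31");

    int64_t product = (int64_t)x * gain;   /* 32 x 32 -> 64-bit multiply */
    int32_t result = (int32_t)(product >> 31);

    __asm__ volatile("# END scale_q31");
    return result;
}
```

Searching the build output for "BEGIN scale_q31" then leads straight to the relevant part of the listing. The compiler is still free to move some instructions across the markers, so treat them as a rough guide.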

When it comes to SRC, most of the CPU time is used for FIR filtering of the signal. So you could replace the FIR routines with the optimized ones in lib_xcore_math, which are nearly 8 times faster. There will probably be difficulties due to the optimization for the circular buffer.
For the SRC coefficient computation, I'm not sure the VPU will be of any help.
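To illustrate the circular-buffer point, here is a hedged sketch in plain C. It is not lib_src or lib_xcore_math code. It keeps the delay line circular but splits each FIR evaluation into two contiguous dot products, so the inner routine `dot` only ever sees straight runs of memory and is the piece a vectorized routine could stand in for. `TAPS`, `fir_t`, `dot`, `fir_step` and the Q31 coefficient format are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 8 /* hypothetical filter length */

typedef struct {
    int32_t state[TAPS]; /* circular delay line */
    unsigned head;       /* index of the most recent sample */
} fir_t;

/* Plain contiguous dot product: no wrap-around logic inside the loop. */
static int64_t dot(const int32_t *x, const int32_t *h, unsigned n)
{
    int64_t acc = 0;
    for (unsigned i = 0; i < n; i++) {
        acc += (int64_t)x[i] * h[i];
    }
    return acc;
}

/* Push one sample and compute one output, coefficients in Q31.
 * The head moves backwards, so newest-to-oldest samples run forwards
 * in memory from state[head], wrapping at most once. */
int32_t fir_step(fir_t *f, const int32_t coef[TAPS], int32_t x)
{
    assert(f->head < TAPS);
    f->head = (f->head + TAPS - 1) % TAPS;
    f->state[f->head] = x;

    unsigned first = TAPS - f->head; /* samples from head to end of buffer */
    int64_t acc = dot(&f->state[f->head], &coef[0], first)
                + dot(&f->state[0], &coef[first], f->head);
    return (int32_t)(acc >> 31);
}
```

The cost of this layout is that the two segment lengths change on every sample, which may be part of why dropping in a fixed-length vector routine is not straightforward.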

susnak · Post by **susnak** » Wed Nov 29, 2023 4:07 pm

Thank you. This is very helpful.
I wrote my own SSRC routines, with vector dot products being the most demanding part. This is what I want to rewrite using the VPU. The dot product in lib_xcore_math seems to be doing more than is needed for FIR filtering.

Post by **Ross** » Fri Dec 01, 2023 11:15 am

Please do let us know if you publish your work anywhere :)
