Inline assembler - making sense of operands

CousinItt
XCore Addict
Posts: 255
Joined: Wed May 31, 2017 6:55 pm

Post by CousinItt »

Thanks akp, I'm doing that now and it's much simpler.


akp
XCore Expert
Posts: 544
Joined: Thu Nov 26, 2015 11:47 pm

Post by akp »

How did you make out? Did you achieve the performance you wanted? I have found that even with a good algorithm the compiler is not amazing at efficiently dual issuing. So it's often possible to achieve almost double performance vs the compiler by optimizing the inner loop in assembly.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

CousinItt wrote: Tue Jun 01, 2021 11:04 am
According to the xCore200 architecture document, maccs should take four operands, two of which are both inputs and outputs. However, in the snippet below, two constants appear as additional input parameters. What's going on here? Is this indicating to the compiler to insert additional instructions to load the constants into the registers? Is this explained in any documentation?

int32_t dsp_filters_biquad
(
    int32_t        input_sample,
    const int32_t* filter_coeffs,
    int32_t*       state_data,
    const int32_t q_format
) {
    uint32_t al; int32_t ah, c1,c2, s1,s2;
    ...
    asm("maccs %0,%1,%2,%3":"=r"(ah),"=r"(al):"r"(input_sample),"r"(c1),"0"(0),"1"(1<<(q_format-1)));
    ...
    return ah;
}
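The constraint "0" means that this input uses the same register as operand 0; likewise, "1" ties its input to operand 1. So the two constants are not extra operands of maccs: they are the initial values loaded into the registers for "ah" and "al" before the instruction runs.
Since your asm writes to "ah" and "al" anyway, you can write this more clearly as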

    ah = 0;
    al = 1 << (q_format-1);
    asm("maccs %0,%1,%2,%3" : "+r"(ah), "+r"(al) : "r"(input_sample), "r"(c1));
("+" means "both input and output").
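To make the equivalence of the two spellings concrete, here is a minimal sketch that can be compiled on any host with a GCC-compatible compiler, not just for xCORE. The helper names tied_input, read_write and maccs_model are hypothetical and used only for illustration. The first two use an empty asm template, so no instruction is emitted and only the constraint behaviour is visible. maccs_model is a plain-C model of what I understand maccs to do (a signed multiply accumulated into the 64-bit value held in the ah:al pair); treat that as an assumption and check it against the architecture document.

```c
#include <stdint.h>

/* Hypothetical helper: the "0" constraint ties the input to output operand 0,
   so `value` is placed in the register that becomes `out`. The template is
   empty, so nothing modifies that register and `out` ends up equal to `value`.
   Assumes a GCC-compatible compiler (extended asm syntax). */
static int32_t tied_input(int32_t value)
{
    int32_t out;
    __asm__("" : "=r"(out) : "0"(value));
    return out;
}

/* Hypothetical helper: the same effect written with "+r" (read-write operand).
   The variable is initialised in C, then passed as both input and output. */
static int32_t read_write(int32_t value)
{
    int32_t out = value;
    __asm__("" : "+r"(out));
    return out;
}

/* Plain-C model (an assumption, not taken from the thread) of a signed
   multiply-accumulate into a 64-bit accumulator split into a high word `ah`
   and a low word `al`:  ah:al = ah:al + a * b.
   Unsigned 64-bit arithmetic is used so that wrap-around is well defined. */
static void maccs_model(int32_t *ah, uint32_t *al, int32_t a, int32_t b)
{
    uint64_t acc = ((uint64_t)(uint32_t)*ah << 32) | (uint64_t)*al;
    acc += (uint64_t)((int64_t)a * (int64_t)b);
    *ah = (int32_t)(uint32_t)(acc >> 32); /* high 32 bits, reinterpreted as signed */
    *al = (uint32_t)acc;                  /* low 32 bits */
}
```

Under that model, starting with ah = 0 and al = 1 << (q_format-1) simply preloads the accumulator with half an LSB of the final Q-format result, so that a later shift by q_format rounds rather than truncates. Either asm spelling gets those initial values into the two registers; the "+r" form just makes the initialisation visible in ordinary C.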
CousinItt
XCore Addict
Posts: 255
Joined: Wed May 31, 2017 6:55 pm

Post by CousinItt »

Sorry akp, just saw your comment. Performance is better than the compiled code, but I wouldn't say it was great, in the sense that there is still a lot of overhead with shuffling of data and other intermediate calculations between maccs. It's just the nature of the algorithm. For now I can live with the performance, but I may revisit it if I have a flash of inspiration.

Thanks also segher for the clarification.