Re: Inline assembler - making sense of operands

Posted: Fri Jun 04, 2021 9:44 pm
by CousinItt
Thanks akp, I'm doing that now and it's much simpler.

Re: Inline assembler - making sense of operands

Posted: Thu Jul 15, 2021 4:27 pm
by akp
How did you make out? Did you achieve the performance you wanted? I have found that even with a good algorithm the compiler is not amazing at efficiently dual issuing. So it's often possible to achieve almost double performance vs the compiler by optimizing the inner loop in assembly.

Re: Inline assembler - making sense of operands

Posted: Fri Aug 06, 2021 3:52 pm
by segher
CousinItt wrote: Tue Jun 01, 2021 11:04 am
According to the xCore200 architecture document, maccs should take four operands, two of which are both inputs and outputs. However, in the snippet below, two constants appear as additional input parameters. What's going on here? Is this indicating to the compiler to insert additional instructions to load the constants into the registers? Is this explained in any documentation?

Code:

int32_t dsp_filters_biquad
(
    int32_t        input_sample,
    const int32_t* filter_coeffs,
    int32_t*       state_data,
    const int32_t q_format
) {
    uint32_t al; int32_t ah, c1,c2, s1,s2;
    ...
    asm("maccs %0,%1,%2,%3":"=r"(ah),"=r"(al):"r"(input_sample),"r"(c1),"0"(0),"1"(1<<(q_format-1)));
    ...
    return ah;
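}

The constraint "0" is a matching constraint: it means this input uses the same register as operand 0. Similarly, "1" ties its input to operand 1. So the two constants are simply the initial values of "ah" and "al", and the compiler loads them into those registers before the maccs.
Since your asm writes to "ah" and "al" anyway, you can write this more clearly as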

Code:

    ah = 0;
    al = 1 << (q_format-1);
    asm("maccs %0,%1,%2,%3" : "+r"(ah), "+r"(al) : "r"(input_sample), "r"(c1));
("+" means "both input and output").
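
To see why the accumulator is seeded with 0 and 1 << (q_format-1), here is a rough portable C model of the same step. It is only a sketch: it assumes maccs adds the signed 32x32 product to the 64-bit value held in ah:al, and the names maccs_model and mul_round_q are made up for illustration, not taken from the DSP library. The final shift is also an assumption about how the result is extracted afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of the accumulate step (assumption: maccs treats ah:al as
   one 64-bit accumulator and adds the signed 32x32 product to it). */
static void maccs_model(int32_t *ah, uint32_t *al, int32_t a, int32_t b)
{
    /* Join the two halves into one 64-bit accumulator. */
    uint64_t acc = ((uint64_t)(uint32_t)*ah << 32) | *al;

    /* Signed multiply, then a wrapping 64-bit add. */
    acc += (uint64_t)((int64_t)a * (int64_t)b);

    /* Split back into high and low words, mirroring the "+r" operands. */
    *ah = (int32_t)(uint32_t)(acc >> 32);
    *al = (uint32_t)acc;
}

/* Hypothetical helper: one rounded Q-format product, with ah and al seeded
   the same way as in the snippet above. */
static int32_t mul_round_q(int32_t x, int32_t c, int q_format)
{
    assert(q_format >= 1 && q_format <= 31);

    int32_t  ah = 0;                              /* high word starts at 0 */
    uint32_t al = UINT32_C(1) << (q_format - 1);  /* half an LSB, for rounding */

    maccs_model(&ah, &al, x, c);

    /* Drop the fractional bits. This relies on arithmetic right shift of a
       negative value, which common compilers provide. */
    uint64_t acc = ((uint64_t)(uint32_t)ah << 32) | al;
    return (int32_t)((int64_t)acc >> q_format);
}
```

With q_format = 28, for example, mul_round_q(3, 1 << 27, 28) multiplies 3 by 0.5 and the preloaded half-LSB rounds the 1.5 up to 2. Without that initial value in al the shift would simply truncate.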

Re: Inline assembler - making sense of operands

Posted: Sat Aug 07, 2021 3:56 pm
by CousinItt
Sorry akp, just saw your comment. Performance is better than the compiled code, but I wouldn't say it was great, in the sense that there is still a lot of overhead with shuffling of data and other intermediate calculations between maccs. It's just the nature of the algorithm. For now I can live with the performance, but I may revisit it if I have a flash of inspiration.

Thanks also segher for the clarification.