Inline assembler - making sense of operands

Post by CousinItt »

For performance reasons I need to implement a filter in assembly language, targeting xCore-200.

I've been looking at the lib_dsp biquad implementation, and I can't see how the inline assembler syntax maps some instructions onto the assembly that is generated.

According to the xCore-200 architecture document, maccs should take four operands, two of which are both inputs and outputs. However, in the snippet below, two constants appear as additional input operands. What's going on here? Does this tell the compiler to insert additional instructions that load the constants into the registers? Is this explained in any documentation?

Thanks.

Code:

int32_t dsp_filters_biquad
(
    int32_t        input_sample,
    const int32_t* filter_coeffs,
    int32_t*       state_data,
    const int32_t q_format
) {
    uint32_t al; int32_t ah, c1,c2, s1,s2;
    ...
    asm("maccs %0,%1,%2,%3"
        : "=r"(ah), "=r"(al)                 /* outputs: %0, %1 */
        : "r"(input_sample), "r"(c1),        /* inputs:  %2, %3 */
          "0"(0), "1"(1<<(q_format-1)));     /* initial values tied to %0 and %1 */
    ...
    return ah;
}
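
For reference, the two extra operands are GCC-style matching constraints: "0"(0) and "1"(1<<(q_format-1)) are inputs that must share a register with outputs %0 and %1, so the compiler has to place those values in ah and al before the instruction executes. The portable C sketch below models that effect on a host machine. The names maccs_model and first_mac are hypothetical, and the model assumes maccs behaves as a signed 32x32 multiply accumulated into the 64-bit ah:al pair.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (an assumption, not XMOS code) of a signed multiply-
   accumulate into a 64-bit accumulator split across two 32-bit words.
   *ah and *al are both read and written, like the first two operands. */
static void maccs_model(int32_t *ah, uint32_t *al, int32_t x, int32_t y)
{
    uint64_t acc = ((uint64_t)((uint32_t)*ah) << 32) | *al;
    acc += (uint64_t)((int64_t)x * (int64_t)y);   /* wraps modulo 2^64 */
    /* Converting back to int32_t relies on two's-complement conversion,
       which mainstream compilers provide. */
    *ah = (int32_t)(uint32_t)(acc >> 32);
    *al = (uint32_t)acc;
}

/* What "0"(0) and "1"(1<<(q_format-1)) amount to: initialise the two
   accumulator words first, then perform the multiply-accumulate. */
static int32_t first_mac(int32_t input_sample, int32_t c1,
                         int32_t q_format, uint32_t *al_out)
{
    assert(q_format >= 1 && q_format <= 31);
    int32_t  ah = 0;                               /* from "0"(0) */
    uint32_t al = (uint32_t)1 << (q_format - 1);   /* from "1"(1<<(q_format-1)) */
    maccs_model(&ah, &al, input_sample, c1);
    *al_out = al;
    return ah;
}
```

Preloading al with 1<<(q_format-1) adds half an output LSB to the sum, so a later shift right by q_format rounds to nearest instead of truncating.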


Post by CousinItt »

Partly answered my own question. It does look like the compiler is setting up the accumulator registers - see below. I'd still like to know where this is described, if anywhere.

Code:

<dsp_filters_biquad>:
        ...
        0x00040114: bd 98:       sub (2rus)      r11, r3, 0x1
        0x00040116: 11 a7:       mkmsk (rus)     r4, 0x1
        0x00040118: f3 25:       shl (3r)        r11, r4, r11
        0x0004011a: 00 69:       ldc (ru6)       r4, 0x0
        0x0004011c: 81 fa eb 0f: maccs (l4r)     r4, r11, r0, r5
        ...

Post by bearcat »

You could just put the accumulator setup in a prior instruction, without needing to add operands to the MACCS. Seems more readable to me to separate things out.

For an IIR on the X200 in assembly, you want to use indexed addressing to access the parameters and feedback. So you need a structure (an array of them for multiple IIRs) to preload the parameters and hold the feedback. The key with the X200 is to order the parameters in the structure, and the MACCS instructions, so that you can use a single LDD instruction between your MACCS instructions to load two parameters at once, or save feedback. I recommend using first-order correction by saving the lower accumulator word and reloading it; this gives a significant improvement.
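
As a rough illustration of the layout idea, the C sketch below declares one possible per-filter block in which values that are used together sit in adjacent, 8-byte-aligned pairs, so each pair could be fetched with one double-word load. The type name biquad_block_t, the field names and the particular pairing are all assumptions made for illustration; the pairing that actually suits a given MACCS sequence may differ. It also assumes that double-word loads need 8-byte-aligned addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: five coefficients, the saved low accumulator word,
   and four state words, grouped into five 8-byte pairs. */
typedef struct {
    _Alignas(8) int32_t b0;   /* pair 0: b0, b1 */
    int32_t  b1;
    int32_t  b2;              /* pair 1: b2, a1 */
    int32_t  a1;
    int32_t  a2;              /* pair 2: a2, saved remainder */
    uint32_t rem;
    int32_t  x1;              /* pair 3: input history */
    int32_t  x2;
    int32_t  y1;              /* pair 4: output history (feedback) */
    int32_t  y2;
} biquad_block_t;

/* Compile-time checks that every pair starts on an 8-byte boundary and
   that consecutive blocks in an array keep that alignment. */
_Static_assert(sizeof(biquad_block_t) == 40, "five 8-byte pairs expected");
_Static_assert(offsetof(biquad_block_t, b2) % 8 == 0, "pair 1 misaligned");
_Static_assert(offsetof(biquad_block_t, a2) % 8 == 0, "pair 2 misaligned");
_Static_assert(offsetof(biquad_block_t, x1) % 8 == 0, "pair 3 misaligned");
_Static_assert(offsetof(biquad_block_t, y1) % 8 == 0, "pair 4 misaligned");
```

An array of biquad_block_t then gives one block per cascaded section, and an assembly routine can step through it with a single base pointer and small constant indices.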

Post by CousinItt »

I think I'll write the whole function in assembler to avoid any surprises. XMOS seems to excel in providing incomplete information.

Thanks for the tips bearcat. Just to make sure I understand correctly, are you recommending saving the lower half of the accumulator with the upper half between runs? Otherwise I'm not sure what you mean by reloading.

Post by bearcat »

Yes, save the lower 32 bits of the accumulator, then on the next pass, as the first step, reload them into the lower 32 bits with a zero in the upper 32. You can research using the "remainder" in an IIR.

You will also probably need to use A1/2 and/or B1/2, and then use a double MACCS as the most efficient approach.

You will also need a ".align 4" in the proper spot, found by counting operand lengths. It would be best to use a .S assembler file, but I have also done a full biquad using inline assembly, which is much less work.

Edit: added A1/2
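
The remainder idea can be tried out on a host machine before committing to assembly. The sketch below is one reading of it in portable C, not bearcat's code: the bits discarded when the accumulator is scaled down are kept and used as the starting value of the accumulator for the next sample. The names biquad_ef_t and biquad_ef_step are hypothetical. The sketch keeps the low q bits for a general Q format, whereas the version described above presumably takes its output from the upper word and therefore keeps the whole lower 32 bits. Feedback coefficients are assumed to be stored already negated so that every term is added, and the output is not saturated.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t  b0, b1, b2;   /* feed-forward coefficients, Q(q) fixed point */
    int32_t  a1, a2;       /* feedback coefficients, stored negated */
    int32_t  x1, x2;       /* previous inputs */
    int32_t  y1, y2;       /* previous outputs */
    uint32_t rem;          /* bits discarded by the previous output scaling */
} biquad_ef_t;

static int32_t biquad_ef_step(biquad_ef_t *f, int32_t x, int q)
{
    assert(q >= 1 && q <= 31);

    /* Reload: upper word zero, lower word holds the saved remainder. */
    uint64_t acc = f->rem;

    /* Five multiply-accumulates; unsigned arithmetic wraps modulo 2^64. */
    acc += (uint64_t)((int64_t)f->b0 * x);
    acc += (uint64_t)((int64_t)f->b1 * f->x1);
    acc += (uint64_t)((int64_t)f->b2 * f->x2);
    acc += (uint64_t)((int64_t)f->a1 * f->y1);
    acc += (uint64_t)((int64_t)f->a2 * f->y2);

    /* Scale down (assumes two's-complement conversion and an arithmetic
       right shift) and save the bits that the shift throws away. */
    int32_t y = (int32_t)((int64_t)acc >> q);
    f->rem = (uint32_t)acc & (((uint32_t)1 << q) - 1u);

    f->x2 = f->x1;  f->x1 = x;
    f->y2 = f->y1;  f->y1 = y;
    return y;
}
```

With q = 4 and b0 = 8 (a gain of 0.5), a constant input of 1 produces 0, 1, 0, 1, ... so the average output is 0.5. Without the remainder, every output would be truncated to 0.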

Post by CousinItt »

Thanks.

I've just been playing with an assembler version and there seems to be more to the way the compiler treats .S files. I thought that .S would just allow use of the preprocessor, but it also appears to allow the compiler to 'optimise' the assembler code. Maybe I can stop that by using specific ASM_FLAGS but for now it seems simpler to use .s.

Post by akp »

Just a note on .S files. To my knowledge, the code in them will always be called as actual functions, so you'll have function-call overhead. If you use inline assembler, the containing function can be inlined, and that can be faster in some cases.

Are you writing your assembly in dual issue? You probably know that this is possible with inline assembly, but you have to ensure the containing XC or C function is compiled with dual issue enabled.

Post by CousinItt »

Thanks akp, I hadn't considered inlining. I will give dual issue a try once I'm comfortable a single issue version is working fine. It's fairly noddy code but I'm not that comfortable with xcore assembler yet.

On the topic of dual issue, I found this link, which is quite helpful. I hadn't realised that there were two versions of entsp, and couldn't work out why I was getting an exception.

Post by CousinItt »

For future reference, the entsp problem is covered here: https://xcore.com/viewtopic.php?f=47&t=5060

Post by akp »

Sorry, I should have noted that you need to use the 32-bit version of ENTSP. When I write assembly I typically use dual issue, so it's always DUALENTSP. I would suggest you just write dual issue: if you write the code as though it were single issue, the assembler will automatically insert NOPs to generate 32-bit instructions, and the timing will be exactly the same (except that it will take twice as much program memory, of course). You don't have to be careful about the memory or resource lane then (believe me, the XS2 ISA will be your friend). It is then easy to start changing your instructions to use the dual-issue braces {} wherever that makes sense.