Fastest possible bit mapping/manipulation/compression

segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

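#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;

// Returns a 4-bit mask: bit i is set iff a[i] > thres.
// Only valid if every a[i] <= 127 and thres <= 127, because the guard
// bit overwrites bit 7 of each byte. a must be word-aligned for ldw.
u32 get4bits(u8 *a, u8 thres)
{
  u32 x;
  // get four values (XS1 syntax: ldw dest, base[word_index])
/*1*/  asm("ldw %0, %1[0]" : "=r"(x) : "r"(a));

  // put guard bits in
/*2*/  x |= 0x80808080;

  // replicate thres+1 into all four bytes (loop-invariant, hoist it out of any loop)
  u32 thres4 = thres + 1;
  thres4 |= thres4 << 8;
  thres4 |= thres4 << 16;

  // subtract; the guard bits will stay 1 iff the value was >thres
/*3*/  x -= thres4;

  // now move the bits into place
/*4*/  x &= 0x80808080;
/*5*/  x |= x >> 7;
/*6*/  x |= x >> 14;

  // the four result bits now sit in bits 7..10
/*7*/  return (x >> 7) & 15;
}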
(You can do steps 5 and 6 in a single cycle using LMUL.)
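The remark about folding steps 5 and 6 into one instruction relies on a multiply acting as several shift-and-OR operations at once. Below is a minimal portable C sketch of that idea, under stated assumptions: it is not LMUL itself (an ordinary 32-bit unsigned multiply stands in for it), and the name `get4bits_mul` and the constant 0x00204081 are my own illustration. After step 4 only bits 7, 15, 23 and 31 can be set; multiplying by 1 + 2^7 + 2^14 + 2^21 places the guard bit of byte j in bit 28 + j, and since no two shifted copies share a bit position there are no carries.

```c
#include <stdint.h>
#include <assert.h>

// Hypothetical portable variant of get4bits(): same guard-bit compare,
// but steps 5-7 are replaced by a single multiply and shift.
// Assumes each a[j] <= 127 and thres <= 127 (the guard bit overwrites bit 7).
// Byte j of the word is taken from a[j], i.e. little-endian order.
static uint32_t get4bits_mul(const uint8_t a[4], uint8_t thres)
{
    assert(thres <= 127);

    // Build the word explicitly instead of using ldw, so the sketch is portable.
    uint32_t x = (uint32_t)a[0] | (uint32_t)a[1] << 8 |
                 (uint32_t)a[2] << 16 | (uint32_t)a[3] << 24;

    uint32_t thres4 = (uint32_t)thres + 1;  // thres+1 replicated into every byte
    thres4 |= thres4 << 8;
    thres4 |= thres4 << 16;

    x |= 0x80808080u;   // step 2: guard bits
    x -= thres4;        // step 3: guard bit j survives iff a[j] > thres
    x &= 0x80808080u;   // step 4: keep only bits 7, 15, 23, 31

    // 0x00204081 = 1 + 2^7 + 2^14 + 2^21: the shifted copies never overlap,
    // so no carries occur, and the guard bit of byte j lands in bit 28 + j.
    return (x * 0x00204081u) >> 28;
}
```

For example, with a = {0, 5, 10, 127} and thres = 4, only the last three bytes exceed the threshold, so the result is 0b1110.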

If you can change the input order, you can shave off some more cycles, indeed.


mculibrk
Active Member
Posts: 38
Joined: Tue Jul 13, 2010 2:57 pm

Post by mculibrk »

You're right, sorry.
I just "translated" your C code into a count of asm instructions, but didn't bother to move the "initialization" (loading 0x80808080 and preparing thres4) out of the "loop".
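To make the "initialization vs. loop" distinction concrete, here is a minimal portable C sketch of processing a whole buffer with the loop-invariant constants computed once. The function name `threshold_nibbles` and its output layout (one 4-bit mask per output byte) are hypothetical, and the explicit byte assembly stands in for the XMOS-specific ldw. It assumes every input byte is <= 127, thres <= 127 and n is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

// Hypothetical batch version: the guard mask and the replicated thres+1 are
// set up once, so each iteration is only load, OR, subtract, AND and gather.
// out[k] receives the 4-bit mask for in[4k .. 4k+3] (bit j <-> in[4k+j]).
static void threshold_nibbles(const uint8_t *in, size_t n, uint8_t thres,
                              uint8_t *out)
{
    assert(thres <= 127 && n % 4 == 0);

    // --- initialization, hoisted out of the loop ---
    const uint32_t guard = 0x80808080u;
    uint32_t thres4 = (uint32_t)thres + 1;
    thres4 |= thres4 << 8;
    thres4 |= thres4 << 16;

    // --- per-word work ---
    for (size_t i = 0; i < n; i += 4) {
        // Portable stand-in for the single ldw in the original code.
        uint32_t x = (uint32_t)in[i] | (uint32_t)in[i + 1] << 8 |
                     (uint32_t)in[i + 2] << 16 | (uint32_t)in[i + 3] << 24;
        x = ((x | guard) - thres4) & guard;     // steps 2-4
        x |= x >> 7;                            // step 5
        x |= x >> 14;                           // step 6
        out[i / 4] = (uint8_t)((x >> 7) & 15);  // step 7: results are in bits 7..10
    }
}
```

Whether the compiler actually keeps `guard` and `thres4` in registers across iterations is exactly the question raised below; the sketch only shows where the work belongs.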

I'm (or was) a little sceptical about GCC's "smart register usage"... If I understood correctly, the "register" and "volatile" modifiers are ignored in XC, right?
...and from some disassembly I saw that the compiler tends to reload immediates/variables instead of keeping them in registers. But maybe (very likely) I'm totally wrong.

regards,
mculibrk
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

Neither the XC compiler nor any other XMOS compiler is GCC.

The compiler will quite likely not optimise this fully, at least not without some
hand-holding. You should write sensitive code like this in straight assembler anyway:
firstly to get the last drop of performance out of it (a single cycle matters here!),
and secondly to be sure that it will still result in the same fast code with e.g.
newer compiler versions or slightly different compiler options, or when you
rearrange some other code.

I don't know if XC supports register and/or volatile; it would be quite surprising if
it just ignored either, though.