Finding number of zeros from MSB and down to first non-zero

Technical questions regarding the XTC tools and programming with XMOS.
User avatar
aclassifier
Respected Member
Posts: 487
Joined: Wed Apr 25, 2012 8:52 pm

Finding number of zeros from MSB and down to first non-zero

Post by aclassifier »

I need to find number of zeros from MSB and down to first non-zero bit.

Ideally I'd like to do it without looping. In the code below I certainly loop.

However, is there an assembly instruction that could do this faster? Since arithmetic shifts are done in a barrel shifter (I think), in one cycle, I thought - maybe.

// #                                   sign bit is BIT31, not interesting
// #1111111 11111111 11111111 11111111
//                                       0 returned
// #0000000 11111111 11111111 11111111
//  1234567                              7 returned
// #0000000 00000000 00000000 00000000
//  1234567 89012345 67890123 45678901  31 returned

unsigned Get_Num_High_ZeroBits (const signed unsigned_31bits) { // Only [BIT30..BIT0] are valid
    unsigned ibit;
    xassert (unsigned_31bits >= 0);
    if (unsigned_31bits == 0) { // separate test to avoid looping forever below
        ibit = 0; // To return 31 as max
    } else {
        ibit = 30; // To return 0 as min after the final inc
        while ((unsigned_31bits bitand (1<<ibit)) == 0) { // 1 << 30 sets BIT30
            ibit--;
        } 
        ibit++; // final inc to avoid doing an (ibit-1) inside the loop
    }
    return (31-ibit);
}

void Test_Get_Num_High_ZeroBits (void) {
    /* Test passed:
    zb 0
    zb 7
    zb 31 
    */ 
    signed val;
    val = 0x7FFFFFFF; // Sign bit only is zero, so it's still positive
    debug_print_special ("zb %u\n", Get_Num_High_ZeroBits (val)); // 0
    val = 0x00FFFFFF;
    debug_print_special ("zb %u\n", Get_Num_High_ZeroBits (val)); // 7
    val = 0x00000000;
    debug_print_special ("zb %u\n", Get_Num_High_ZeroBits (val)); // 31
}


--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
Ross
XCore Expert
Posts: 968
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

Count Leading Zeros - CLZ instruction - accessed via clz() from xclib.h.

XS3 (xcore.ai) also has Count Leading Sign (CLS). This isnt actually in xclib.h yet for some reason. Here’s a function for cls() that uses CLS instruction on xs3 and CLZ twice on xs2 to provide the same result.

Code: Select all

#include <xs1.h>
#include <xclib.h>

static inline int cls(int idata)
{
    int x;
#if __XS3A__
    asm volatile("cls %0, %1" : "=r"(x)  : "r"(idata));
#else
    x = (clz(idata) + clz(~idata));
#endif
    return x;
}
Note, neither of these instructions are in XS1.
User avatar
aclassifier
Respected Member
Posts: 487
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

Ross, THANK YOU! I will certainly test this tomorrow! Thank you again! Øyvind
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
aclassifier
Respected Member
Posts: 487
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

It worked of course! For my function I had to subtract by 1, since I don't want the sign bit to count.

In the "Assembly Programming Manual" it says that "clz | d,s | Count leading zeros" and nothing more, so doing this twice I do not understand.

// clz(00000000 11111111 11111111 11111111) is 8 +
// clz(11111111 00000000 00000000 00000000) is 0 =
// some number + zero = some number always?

So I removed the second clz(compl) and it seems to work as explected. What's the catch?
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
Ross
XCore Expert
Posts: 968
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

x = (clz(idata) + clz(~idata)); results in the same result as Count Leading Sign - not something you asked for but provided for completeness.
User avatar
aclassifier
Respected Member
Posts: 487
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

Ok, I see in https://www.xmos.com/download/The-XMOS- ... ecture.pdf I read (with some text of mine added, related to the code example above), from pp 17, 90 and 92:

bwp = (is the number of bits in a word = 32)
s = (source = idata)
d = (destination = num_zero_bits)

CLS count leading signbits
Counts the number of leading sign bits in s If the s is zero, then bpw is produced.
The instruction can never produce 0 as there is always at least one sign-bit.
This instruction can be used to efficiently compute the headroom of signed integers.

count leading signbits - assuming that s is negative so that we see the sign (=1) bits
CLS | 32,                                 if s=0 (all 32 bits are zero)
    | 32,                                 if s=-1 (all bits seem to be sign bits, error return I assume)
    | d:lowestd:s[bit bpw−1−d]!=s[bpw−1], (??)
                                          otherwise leading signbits (number of ones down to first 0)

CLZ Count leading zeros
Counts the number of leading zero bits in its operand. If the operand is zero, then bpw is produced. If the operand starts with a ’1’ bit (ie, a negative signed integer, or a large un- signed integer), then 0 is produced. This instruction can be used to efficiently normalise integers.

count leading - assuming that s is positive so that we "see" the zeros
CLZ | 32,                                    if s=0 (all 32 bits are zero)
    | d : lowest d : s[bit bpw − 1 − d] = 1, (??)
                                             otherwise zeros (number of zeros down to first 1)

I assume that this has to do with the fact that if a negative int32_t with one sign bit (1), if arithmetic shifted >> down 8 positions, would have nine sign bits, where it could also be treated as an int24_t with the sign bit in its 24 bits leftmost position? This also goes for the number of sign bits of a positive, with one vs. nine sign bits (0) - but these sign bits aren't "visible" out of context.

So aren't these instructions "othognal" meaning one cannot be derived from the other? I mean number of zeros vs. number of ones!? Like why CLS(idata) is the same as CLZ(idata)+CLZ(compl idata). Of course the compl makes zeros visible!?

But I must admit, I would certainly like to have this fed to me with a teaspoon.. There must be quite some facts that I have twisted my brain wrongly with here!
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
Ross
XCore Expert
Posts: 968
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

Sign bits are not always 1, they can be 0 - indicating a positive number. Does that help? I.e. CLS will return the length of the first run of 0's or 1's from the MSB.

This is why cls(0) and cls(-1) are both 32.
User avatar
aclassifier
Respected Member
Posts: 487
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

As also indicated in my rather too wordly text (sorry) - yes I have known for some years that sign bits can be zero(s), sir :-)

But the second part certainly helped! That CLS counts the number of bits downward depending on the value of BIT31:
  1. If BIT31 is 1 it counts the number of 1's to the first 0
  2. If BIT31 is 0 it counts the number of 0's to the first 1
So point 1 is the same as CLZ(compl idata) and point 2 the same as CLZ(idata). So when you do the add of them, one of the parts will always be zero.

??
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
Ross
XCore Expert
Posts: 968
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

aclassifier wrote: Wed Oct 25, 2023 3:18 pm yes I have known for some years that sign bits can be zero(s), sir :-)
Sorry! ;)
aclassifier wrote: Wed Oct 25, 2023 3:18 pm So point 1 is the same as CLZ(compl idata) and point 2 the same as CLZ(idata). So when you do the add of them, one of the parts will always be zero.

??
Yes, one side of this addition will always be 0.
User avatar
aclassifier
Respected Member
Posts: 487
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

Thank you Ross! I am so grateful for the fact that you did feed me with teaspoons, as I wanted! I do understand your small overfeeding!-)

But I haven't really done much assembly coding since well back in the last millennium (https://www.teigfam.net/oyvind/home/tec ... ault-0x05/) so this was really great!

And since I do this bit banging https://www.xcore.com/viewtopic.php?t=8595 it's so good to not waste cycles in a loop when not needed. I try to do some AGC here.
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/