Thanks for the reply. I probably should have been a little more descriptive in my issue here.
Fixed-point math shifts the decimal point over by a constant number of places, similar to an accountant's ledger. Instead of thinking of $137.48, you could represent this value as 13748 cents; instead of 0.05, represent it as 5 percent; pi would be represented as 314, and so on. This shift is accomplished by multiplying every value by a constant scale factor s, here 100.
Adding two of these numbers together works the same way as adding the decimal numbers: 0.05 + 1.32 = 1.37 and 5 + 132 = 137, because (a*s) + (b*s) = (a+b)*s.
Multiplication becomes a problem because the scale factor gets multiplied in twice: (a*s) * (b*s) = (a*b)*(s*s). In the example above, 13748 cents ($137.48) * 5 percent (0.05) gives 68740, which read as cents would be $687.40. The result needs to be divided by s once more to become 687 cents ($6.87). The remaining 0.4 of a cent can't be represented as a whole number and is lost.
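A minimal C sketch of the scale-100 arithmetic above. `fx100_add` and `fx100_mul` are hypothetical names, and the product is formed in 64 bits only so the intermediate can't overflow:

```c
#include <stdint.h>

/* Hypothetical helpers for the "cents" example: every value is stored
   as an integer scaled by S = 100. */
#define S 100

/* (a*S) + (b*S) = (a+b)*S, so addition needs no correction. */
int32_t fx100_add(int32_t a, int32_t b)
{
    return a + b;
}

/* (a*S) * (b*S) = (a*b)*(S*S), so divide by S once to get back to
   scale S. C integer division truncates toward zero, which drops the
   leftover fraction of a cent. */
int32_t fx100_mul(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;  /* scaled by S*S */
    return (int32_t)(product / S);              /* back to scale S */
}
```

For example, `fx100_mul(13748, 5)` gives 687, i.e. $6.87, with the 0.4 cent truncated away.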
Since computers deal with powers of 2 better than powers of 10, I'm using a bitwise shift instead. In the 32-bit integer, I'm using a 16:16 split of whole number:fractional part (the scale factor s is 2^16 = 65536). Because the value is signed, the whole part covers [-2^15, 2^15-1] and the resolution is 1/65536. The bits are laid out like this:
```
Bit number   32            31     30    ...  18    17   :  16     15     ...  2       1
Weight       sign (-2^15)  2^14   2^13  ...  2^1   2^0  :  2^-1   2^-2   ...  2^-15   2^-16
```

Here 2^-1 is 1/2, 2^-2 is 1/4, 2^-3 is 1/8, and so on.
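To check the layout, a value can be decoded by summing the weight of each set bit. This is a sketch with a hypothetical function name; it treats bit 32 as a two's complement sign bit worth -2^15, which is an assumption about how the signed value is stored:

```c
#include <stdint.h>

/* Decode a 16:16 bit pattern by adding up the weights from the table.
   Bit k (counting from 0, so the table's bit k+1) weighs 2^(k-16) for
   k = 0..30; bit 31 (the table's bit 32) is the sign bit, worth -2^15. */
double q16_16_from_bits(uint32_t bits)
{
    double value = 0.0;
    int k;
    for (k = 0; k < 31; k++) {
        if (bits & ((uint32_t)1 << k))
            value += (double)((uint32_t)1 << k) / 65536.0;  /* 2^(k-16) */
    }
    if (bits & 0x80000000u)
        value -= 32768.0;  /* sign bit: -2^15 */
    return value;
}
```

So 0x00010000 decodes to 1.0, 0x00008000 to 0.5, and 0xFFFF0000 to -1.0.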
I could get finer resolution by changing the split to something like 12:20, but that would reduce the range of the whole portion.
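The trade-off between resolution and range can be computed for any split of a signed 32-bit word. These helpers are hypothetical and assume two's complement storage:

```c
#include <stdint.h>

/* For a signed 32-bit value with frac_bits fractional bits
   (16 for 16:16, 20 for 12:20), valid for 0 <= frac_bits <= 31. */

/* Smallest step: 2^-frac_bits. */
double q_resolution(int frac_bits)
{
    return 1.0 / (double)((uint32_t)1 << frac_bits);
}

/* Most negative value: -2^(31 - frac_bits). */
double q_min(int frac_bits)
{
    return -(double)((uint32_t)1 << (31 - frac_bits));
}

/* Most positive value: 2^(31 - frac_bits) minus one step. */
double q_max(int frac_bits)
{
    return (double)((uint32_t)1 << (31 - frac_bits)) - q_resolution(frac_bits);
}
```

For 12:20, `q_min(20)` is -2048 and `q_resolution(20)` is 1/1048576, compared with -32768 and 1/65536 for 16:16.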
Pi would be represented as 0x0003:243F (the colon just visually separates the whole and fractional parts).
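Converting a double into 16:16 is a multiply by 65536 followed by truncation. A sketch with hypothetical names, assuming the input is already within range (no rounding or saturation is done):

```c
#include <stdint.h>

/* Scale by 2^16 and truncate toward zero. The caller must keep
   -32768 <= x < 32768, otherwise the cast is undefined. */
int32_t q16_16_from_double(double x)
{
    return (int32_t)(x * 65536.0);
}

/* Undo the scaling to get an approximate double back. */
double q16_16_to_double(int32_t q)
{
    return (double)q / 65536.0;
}
```

`q16_16_from_double(3.14159265358979)` gives 205887, which is 0x0003243F; converting back gives about 3.1415863, off from pi by less than the 1/65536 resolution.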
So I have the multiply working with my 32-bit numbers, but the 64-bit product (high word H, low word L) has to be shifted right by 16 bits as a whole. If I multiply pi from above by 1.0 with the macs function, it looks like this:
0003:243F * 0001:0000 = 0000:0003 243F:0000
If I just shift each word right by 16 independently, the 3 rolls off the H value and L becomes 0000:243F, when the correct result is 0003:243F.
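Instead of shifting H and L independently, the two words can be shifted as one unit in a single step: the low 16 bits of H become the top half of the result and the top 16 bits of L become the bottom half. A sketch with a hypothetical name; the overflow check is left to the caller:

```c
#include <stdint.h>

/* Shift the 64-bit product H:L right by 16 and keep the low 32 bits.
   Whatever remains in the top 16 bits of H (H >> 16) is discarded here;
   if it is nonzero the result overflowed and the caller should handle it. */
uint32_t shift_pair_right16(uint32_t H, uint32_t L)
{
    return (H << 16) | (L >> 16);
}
```

With H = 0x00000003 and L = 0x243F0000 this gives 0x0003243F, whereas shifting L alone gives only 0x0000243F.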
In pure brute-force C, I would do something like this:
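```c
#include <stdint.h>

/* Shift the 64-bit product H:L right by 16 bits, one bit at a time. */
uint32_t shift_product(uint32_t H, uint32_t L)
{
    int i;
    for (i = 0; i < 16; i++)
    {
        uint32_t carry = H & 0x00000001u; /* low bit of H moves into the top of L */
        H >>= 1;
        L >>= 1;
        if (carry)
            L |= 0x80000000u;
    }
    /* check for overflow: anything left in H doesn't fit in L */
    if (H > 0)
        L = 0xFFFFFFFFu;
    return L;
}
```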
My final result would be in L. Anything that gets shifted off the bottom of L is smaller than 1/65536 and is lost. Anything left in H represents an overflow. In assembly, I should be able to take advantage of the carry flag to make this a lot more streamlined.
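If the compiler provides a 64-bit integer type, plain C can do the whole multiply without the bit loop or hand-written assembly; on targets with a 32x32 to 64-bit multiply instruction, compilers typically use it for this pattern. This is a swapped-in technique rather than the macs function, and the name and saturating behavior are assumptions of this sketch:

```c
#include <stdint.h>

/* Multiply two signed 16:16 values through a 64-bit intermediate.
   The product carries the scale twice (2^16 * 2^16), so dividing by
   2^16 restores 16:16. Division truncates toward zero (an arithmetic
   right shift would round toward negative infinity instead).
   Results outside the int32_t range saturate rather than wrap. */
int32_t q16_16_mul(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;  /* scaled by 2^32 */
    int64_t scaled = product / 65536;           /* back to 2^16 */

    if (scaled > INT32_MAX)
        return INT32_MAX;
    if (scaled < INT32_MIN)
        return INT32_MIN;
    return (int32_t)scaled;
}
```

`q16_16_mul(0x0003243F, 0x00010000)` returns 0x0003243F (pi times 1.0), and 256.0 * 256.0 saturates to INT32_MAX because 65536 doesn't fit in the 16-bit whole part.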
Sorry for the long post; I hope it answers your questions about what I'm trying to do here.
Thanks again!
Mark