Special floating-point formats?

lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

First, an example of a Butterworth low-pass filter with a 100 Hz corner at a 48 kHz sample rate.

Code: Select all

[B,A]=butter(2,100/24000)

B =
  1.0e-004 *

   0.424433681401881   0.848867362803762   0.424433681401881

A =

   1.000000000000000  -1.981488509144573   0.981658282617134

By using this very "float-lookalike" decomposition

Code: Select all

// X_double = X_radix*2^(X_exp-31) where X_exp=ceil(log(abs(X_double))/log(2))
I should be able to write the decomposed coefficients like this in XC:

Code: Select all

	int Aradix[]={-2127607086,2108095110};       // A1, A2
	int Bradix[]={1493343257,1493343257,1493343257};
	short Aexp[]={1,0};                          // A1, A2 (entry for the removed A0 dropped)
	short Bexp[]={-14,-13,-14};
making maximum use of resolution. (The original A0=1 is removed)
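
The decomposition can be checked offline before the tables are committed to XC. The following is a minimal Python sketch, not code from the thread: `decompose` and `recompose` are hypothetical helper names, and the step that bumps the exponent when the radix would reach 2^31 (an exact power of two such as A0 = 1) is an assumption added here, since the formula in the post does not cover that case.

```python
import math

FRAC_BITS = 31  # sign + 31 fractional bits, as in the post


def decompose(x, frac_bits=FRAC_BITS):
    """Split x into (radix, exp) with x ~= radix * 2**(exp - frac_bits)."""
    if x == 0.0:
        return 0, 0
    exp = math.ceil(math.log2(abs(x)))          # formula from the post
    radix = round(x * 2.0 ** (frac_bits - exp))
    if abs(radix) >= 1 << frac_bits:
        # Assumption: for exact powers of two (or rounding up to 1.0) the
        # radix would not fit in sign + 31 bits, so use the next exponent.
        exp += 1
        radix = round(x * 2.0 ** (frac_bits - exp))
    return radix, exp


def recompose(radix, exp, frac_bits=FRAC_BITS):
    """Inverse of decompose, back to a Python float."""
    return radix * 2.0 ** (exp - frac_bits)


A = [-1.981488509144573, 0.981658282617134]   # A1, A2 from butter()
B = [0.424433681401881e-4, 0.848867362803762e-4, 0.424433681401881e-4]

for name, coeffs in (("A", A), ("B", B)):
    for c in coeffs:
        r, e = decompose(c)
        print(name, r, e, recompose(r, e) - c)  # last column: rounding error
```

With these inputs the sketch gives exponents 1 and 0 for A1 and A2, so once A0 is dropped the exponent table needs only two entries.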

One nice thing is that the input signal X (int32) doesn't need any float conversion besides adding an exponent Xexp=0.
No resolution at all is lost in the conversion of the input signal from int32 to float48.

Correct me if I'm wrong!


Probably not the most confused programmer anymore on the XCORE forum.
JohnR
Experienced Member
Posts: 93
Joined: Fri Dec 11, 2009 1:39 pm

Post by JohnR »

Lilltroll: Yes I was intending to make available an XCORE project of the code once I have got the basic math operations working, starting with doubles.

Basically I am compiling sections of the SoftFloat C code, examining the emitted assembler and then seeing where I can optimise. For instance, as I mentioned before, XMOS has an instruction for counting leading zeros which should replace several lines of SoftFloat C code.
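
To illustrate the kind of saving meant here: without a hardware instruction, counting leading zeros takes a short sequence of compares and shifts, which a single `clz` replaces. The Python sketch below only illustrates that idea and is not SoftFloat's actual code; `clz32_soft` and `normalize` are hypothetical names.

```python
def clz32_soft(x):
    """Count leading zeros of a 32-bit word with a binary search."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    n = 0
    if x < 1 << 16:
        n += 16
        x <<= 16
    if x < 1 << 24:
        n += 8
        x <<= 8
    if x < 1 << 28:
        n += 4
        x <<= 4
    if x < 1 << 30:
        n += 2
        x <<= 2
    if x < 1 << 31:
        n += 1
    return n


def normalize(mant):
    """Shift a non-zero unsigned 32-bit mantissa left until bit 31 is set."""
    shift = clz32_soft(mant)
    return (mant << shift) & 0xFFFFFFFF, shift


print(normalize(0x00F00000))
```

Every `if` above is at least a compare, a branch and a shift in C, so replacing the whole routine with one instruction shortens the normalization step noticeably.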

John.

Post by lilltroll »

First simulation in MATLAB with a 32-bit radix (sign + 31 fractional bits).
The filter is a Butterworth high-pass filter with a corner frequency of 20 Hz at a 48000 Hz sample rate. Filter coefficients of this type create a large round-off error.
[Attachment: Error1.png, 7.14 KiB]
The functions normalisera, add and mult (also written by me) are supposed to simulate functions that could be implemented in XMOS assembly.

MATLAB "Main"

Code: Select all

global Frac_resolution;
Frac_resolution=31;

%Create test filter
[B,A]=butter(2,20/24000,'High');


%Normalize B,A
[B_r,B_e]=normalisera(B);
[A_r,A_e]=normalisera(A);

%Create testsignal
Fs=48000;
t=0:1/Fs:1;
X=chirp(t,5,1,Fs/2,'logarithmic',-90); 
Z_r=[0,0,0];
Z_e=[1,1,1]*2^-15;
Y=zeros(length(t),1);

%Run filter loop emulation of TDF-II filter
for i=1:length(t)
   [x_r,x_e]=normalisera(X(i));
   %Y=B1*X+Z1;
   [P_r,P_e]=mult(B_r(1),B_e(1),x_r,x_e);
   [y_r,y_e]=add(P_r,P_e,Z_r(1),Z_e(1));
   Y(i)=y_r*2^y_e;
   %Z1=B2*X-A2*Y+Z2;
   [P_r,P_e]=mult(B_r(2),B_e(2),x_r,x_e);
   [Z_r(1),Z_e(1)]=add(P_r,P_e,Z_r(2),Z_e(2));
   [P_r,P_e]=mult(-A_r(2),A_e(2),y_r,y_e);
   [Z_r(1),Z_e(1)]=add(P_r,P_e,Z_r(1),Z_e(1));
   %Z2=B3*X-A3*Y
   [P_r,P_e]=mult(B_r(3),B_e(3),x_r,x_e);
   [T_r,T_e]=mult(-A_r(3),A_e(3),y_r,y_e);
   [Z_r(2),Z_e(2)]=add(P_r,P_e,T_r,T_e);
end
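
For comparison with the emulated arithmetic, the same transposed direct form II recurrence can be run in plain double precision. This is a Python sketch under the assumption that A(1) = 1; `tdf2_biquad` is a hypothetical name, and the coefficients in the usage lines are the 100 Hz low-pass values from the first post rather than the high-pass filter generated above.

```python
def tdf2_biquad(b, a, x):
    """Second-order transposed direct form II filter, a[0] assumed to be 1."""
    z1 = z2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + z1                 # Y  = B1*X + Z1
        z1 = b[1] * xn - a[1] * yn + z2     # Z1 = B2*X - A2*Y + Z2
        z2 = b[2] * xn - a[2] * yn          # Z2 = B3*X - A3*Y
        y.append(yn)
    return y


# 100 Hz low-pass at 48 kHz, coefficients from the first post
B = [0.424433681401881e-4, 0.848867362803762e-4, 0.424433681401881e-4]
A = [1.0, -1.981488509144573, 0.981658282617134]

step = tdf2_biquad(B, A, [1.0] * 5000)
print(step[-1])   # settles close to the DC gain sum(B)/sum(A)
```

Subtracting the output of such a reference from the emulated fixed-radix output gives the round-off error directly, sample by sample.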

Post by lilltroll »

Leon or someone else :?:

With the instructions available on XMOS, how should I implement the softfloat ADD?

My idea for the multiply is to use the MACS instruction: (sign + 31 bits) * (sign + 31 bits) = (sign + 62 bits) before normalization.

But with add I might end up with (sign + 31 bits) + (sign + 31 bits) = (sign + 32 bits) -> overflow before normalization.

I guess I could reduce to (sign + 30 bits) or use the long add instruction, but I'm afraid that the long add will eat a lot of instructions for me!?

If I start to use long add together with MAC/MACS, I could use a 62-bit fraction for the internal filter stages. Furthermore, some of the instructions only support unsigned ints.

Please help me with some tips regarding this.

Post by JohnR »

Add and subtract routines are always the worst of the basic floating point maths functions.
lilltroll wrote: But with add I might end up with (sign + 31 bits) + (sign + 31 bits) = (sign + 32 bits) -> overflow before normalization.
You can avoid that if you handle the sign separately during the ADD function: convert negative values to unsigned magnitudes, then correct the sign on exit.

The slowest part of the add/sub functions is aligning the exponents.
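
The alignment and renormalization steps can be sketched in Python as follows. This is an illustration and not SoftFloat code: instead of separating the sign, it keeps two's-complement radixes and lets the sum grow by one bit in a wider intermediate (the role a long add or a 64-bit accumulator would play on the XCore), then shifts back into sign + 31 bits. The names `f_add` and `to_float`, and the use of the convention value = radix * 2^(exp - 31) from the first post, are assumptions of the sketch.

```python
FRAC_BITS = 31


def to_float(radix, exp):
    """Read a (radix, exp) pair back as a Python float."""
    return radix * 2.0 ** (exp - FRAC_BITS)


def f_add(a_r, a_e, b_r, b_e):
    """Add two (radix, exp) numbers; radixes are signed 32-bit values."""
    # Align: shift the operand with the smaller exponent to the right.
    if a_e < b_e:
        a_r, a_e, b_r, b_e = b_r, b_e, a_r, a_e
    b_r >>= a_e - b_e          # arithmetic shift, low bits are dropped
    s = a_r + b_r              # may need 33 bits: the overflow case
    e = a_e
    if s == 0:
        return 0, 0
    # Renormalize so that 2**30 <= |s| < 2**31.
    while abs(s) >= 1 << FRAC_BITS:
        s >>= 1
        e += 1
    while abs(s) < 1 << (FRAC_BITS - 1):
        s <<= 1
        e -= 1
    return s, e


# 1.0 + 0.25 with different exponents
print(to_float(*f_add(1 << 30, 1, 1 << 30, -1)))
```

The first loop runs at most once, because two 32-bit operands can overflow by only one bit; the second loop is where a count-leading-zeros instruction would replace the iteration.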

lilltroll wrote: but I'm afraid that the long add will eat a lot of instructions for me!?
I think all instructions execute in a single cycle except for the divide and remainder functions.
See this useful thread on the XMOSlinkers forum:
http://www.xmoslinkers.org/forum/viewto ... ?f=7&t=232

John.

Post by lilltroll »

JohnR wrote: I think all instructions execute in a single cycle except for the divide and remainder functions.
I'm thinking more of the up to 6 registers that the instruction fetches data from. Moving all that data between each ALU instruction takes a lot of time.

Post by JohnR »

lilltroll wrote: I'm thinking more of the up to 6 registers that the instruction fetches data from. Moving all that data between each ALU instruction takes a lot of time.
Well, floating point calculations _are_ complicated!

One problem I have had with the XMOS compiler is that it normally uses only the 4 scratchpad registers and r11, at least without optimisation turned on. Even then, I don't know whether the higher optimisation settings do more than reorder code. Also, for some reason, the compiler will often save intermediate results held in a register to memory even when the very next instruction uses the data in that register.

So far, once I have got the C code working properly, I look at the corresponding assembly and then see if more registers can be usefully employed. Then the entsp/retsp values given at function entry and exit are adjusted to set the stack pointer correctly to allow the pre-existing contents of the registers to be saved/recalled.

John.

Post by lilltroll »

Well, I am getting nowhere at all with the ASM.

From the XC main:

Code: Select all

 streaming chan c_DAC_L,c_DAC_R,c_ADC_L,c_ADC_R;	
 par
	 {
		 bypass(c_ADC_L,c_DAC_L);
		 bypass(c_ADC_R,c_DAC_R);
		 I2S_slave(c_DAC_L,c_DAC_R,c_ADC_L,c_ADC_R);
	 }
First I tried to write the entire bypass in ASM, but I do not know which instructions besides IN and OUT are needed to get the correct behaviour in ASM.

Second, I tried to avoid the channel communication in ASM by doing this in XC:

Code: Select all

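void bypass(streaming chanend IN,streaming chanend OUT)
{
	int IN_data=0,OUT_data=0;
	while(1){
		OUT<:bitrev(OUT_data);   // send the previous result
		OUT_data=CALC(IN_data);  // CALC is the routine to be written in ASM
		IN:>IN_data;             // fetch the next input sample
	}
}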
(Data coming to channel end IN from I2S_slave has already been bit-reversed in the I2S_slave function.)

As I understand it, the compiler will put IN_data in r0 and expect OUT_data to be returned in r0 as well.

1) First, if I write a CALC.S, can I use all registers r0-r11 without destroying anything in the bypass function?

2) What does the simplest working ASM program look like that just passes the data from IN_data to OUT_data?

Post by lilltroll »

:D I found count leading zeros as a built-in function in xclib.h.

This code works together with signed integers, resulting in arithmetic shifts of the radix.
A number is represented as A_r * 2^(A_e)

Code: Select all

while(1){
	OUT<:bitrev(Y_r>>Y_e);	
	X_e=clz(X_r^0x7FFFFFFF); 	//Normalize X
	// Y=B0*X
	{Y_r,Y_e}=f_mult(B0,0,X_r<<X_e,X_e);  
	IN:>X_r;
	}

And f_mult, including normalization:

Code: Select all

{ int , int } f_mult(int A_r,int A_e,int B_r,int B_e)
{
	int P_r,P_e;
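	unsigned int l;            // low word of the 64-bit product (discarded)
	{P_r,l}=macs(A_r,B_r,0,0); // P_r = high word of A_r*B_r
	P_r+=P_r; //Compensate for frac_31*frac_31 = frac_62
	P_e=clz(P_r^FRAC);         // FRAC is not defined in the post (presumably 0x7FFFFFFF, as in the caller)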
	return{P_r<<P_e,A_e+B_e+P_e};
}
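
The multiply can be emulated on a PC to check the exponent bookkeeping. The Python sketch below is a model built on assumptions rather than a port of the XC code: it takes the high word of the 64-bit product as `macs` would deliver it, doubles it, and then normalizes by counting redundant sign bits with `x ^ (x >> 31)`, which is a swapped-in technique and not the `clz(P_r^FRAC)` expression from the post. As in the XC loop, the exponent counts left shifts, so a value is read back as radix * 2^-(31 + exp). All function names are hypothetical.

```python
def clz32(x):
    """Leading zeros of x viewed as an unsigned 32-bit word."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


def redundant_sign_bits(x):
    """Number of left shifts a signed 32-bit x allows without overflow."""
    return clz32(x ^ (x >> 31)) - 1   # x >> 31 is 0 or -1 (arithmetic shift)


def f_mult(a_r, a_e, b_r, b_e):
    """Multiply two (radix, exp) numbers held as Q31 radix and shift count."""
    hi = (a_r * b_r) >> 32        # high word of the Q62 product (Q30)
    p = hi << 1                   # back to Q31; (-1)*(-1) would overflow
                                  # here and is not handled in this sketch
    n = redundant_sign_bits(p)
    return p << n, a_e + b_e + n


def to_float(radix, exp):
    """Read a (radix, shift count) pair back as a Python float."""
    return radix * 2.0 ** -(31 + exp)


# 0.75 * 0.5
print(to_float(*f_mult(3 << 29, 0, 1 << 30, 0)))
```

Counting redundant sign bits treats positive and negative radixes alike, which is why it is used here; whether it maps to fewer instructions than the XOR-with-constant form on the XCore is not something the sketch can show.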

Post by JohnR »

lilltroll: For a confused programmer, you are certainly doing pretty well! I really should add that I am not a very experienced XMOS ASM developer either; I am more familiar with the 68000 and 8051 processors.
lilltroll wrote: 1) First, if I write a CALC.S, can I use all registers r0-r11 without destroying anything in the bypass function?

2) What does the simplest working ASM program look like that just passes the data from IN_data to OUT_data?
The XMOS document abi97.pdf says on page 3:
"Function calling uses the first four registers to pass parameters. Additional parameters are passed on the stack. Except where otherwise stated, data types with size greater than int and all structures are passed by passing a pointer. The callee must make a copy of the structure if it needs to be modified. Scalar types smaller than 32 bits are passed as zero or sign extended 32-bit values."
In an earlier version of a floating-point divide function I wanted extra registers to hold temporaries, so the pre-existing values first had to be stored to memory, as below:
entsp 0x8
stw r4, sp[0x1]
stw r5, sp[0x2] # r5 holds result exponent
stw r6, sp[0x3] # r6 holds result quotient
stw r7, sp[0x4] # r7 holds main divide loop counter
ldc r6, 0x0 # clear quotient
ldc r7, 0x0 # clear loop counter
add r3, r0, 0x0 # copy return address to r3
add r4, r2, 0x0 # copy op1 address to r4
ldaw r0, sp[0xa]
ldc r2, 0x8
bl __crt_memcpy #copy r1 (op0) to memory at sp[0xa]
At the end of the function, the data are copied to the return address and the extra registers that have been used are restored:
ldaw r1, sp[0x18]
add r0, r3,0x0
ldc r2, 0x8
bl __crt_memcpy

ldw r7, sp[0x4]
ldw r6, sp[0x3]
ldw r5, sp[0x2]
ldw r4, sp[0x1]
retsp 0x8
I would be happier if someone else would jump in here so that I don't have to continue exposing my ignorance.

John