Digital Down Converter

lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

ahenshaw wrote: How wide do you think the transition band with double resampling would be? Every bit would help.
What about 188 kHz?
[Attachment: multistage.png (68.31 KiB)]
I can give you more, since only 3 threads will run at the full rate (5 MHz or higher).
The other filters will only run at 1 MHz or higher, and can thus have a much higher order.
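
To see why the slower stages can afford a higher order, here is a rough per-sample instruction budget. It is only a sketch: the 100 million instructions per second per thread and the two 1:5 stages are assumptions for illustration, and `stage_plan` is a made-up helper name.

```python
# Rough instruction budget per input sample for each decimation stage.
# Assumption: one thread executes about 100 million instructions per second.
THREAD_IPS = 100_000_000

def stage_plan(input_rate_hz, factors):
    """Return ([(stage input rate, instructions per input sample)], output rate)."""
    plan = []
    rate = input_rate_hz
    for factor in factors:
        plan.append((rate, THREAD_IPS / rate))
        rate /= factor          # each stage hands a slower stream to the next
    return plan, rate

# Assumed example: 5 MHz in, two 1:5 stages, 200 kHz out.
plan, output_rate = stage_plan(5_000_000, (5, 5))
for rate, budget in plan:
    print(f"{rate / 1e6:.0f} MHz stage: {budget:.0f} instructions per sample")
print(f"output rate: {output_rate / 1e3:.0f} kHz")
```

With these assumed numbers the full-rate stage has 20 instructions per sample and the second stage has 100, which is where the room for a higher filter order comes from.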


Probably not the most confused programmer anymore on the XCORE forum.
ahenshaw
Experienced Member
Posts: 96
Joined: Mon Mar 22, 2010 8:55 pm

Post by ahenshaw »

188 kHz would be excellent! Many thanks!

Post by lilltroll »

Just to check (EDIT: this assumed it was video information, which it is not):
Is your moving-picture information very phase sensitive?
Are IIR filters OK, or do you actually need linear-phase FIR filters, at least for the second stage?
What maximum phase deviation relative to 0 Hz can you tolerate in the transition band?
(Assuming that absolute phase or small group delays don't matter.)

A 1:5 downsampling minimum-phase filter will have ~5 samples of delay at DC, but already at 150 kHz, for example, the group delay in the stage II filter will be around 10 samples, meaning that information above 150 kHz will lag more than 5 samples compared to DC. Assume you have a "chessboard pattern" that should have a mean value of gray (the DC value) and then alternates between dark and bright on every subsampled pixel. With some row/column-based data format from a monochrome CCD, wouldn't that lag corrupt the rows heavily? :?:

I believe I can easily create simulated visual examples of the artifacts for you, if I just know a little about what type of picture data we are talking about.
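
A frequency-dependent lag like this can be checked numerically. The sketch below computes the group delay of a 2nd-order IIR section directly from its coefficients. The coefficients are an assumed Butterworth-like low-pass (cutoff about 0.01 of Nyquist), picked only for illustration. They are not the stage II filter discussed above.

```python
import cmath
import math

# Assumed illustrative low-pass, not the actual stage II filter.
B = (2.4136e-4, 4.8272e-4, 2.4136e-4)
A = (1.0, -1.955578, 0.956544)

def group_delay(b, a, omega):
    """Group delay in samples of H(z) = B(z)/A(z) at omega (rad/sample).

    Uses gd = Re(B'/B) - Re(A'/A), where C' = sum(k * c_k * exp(-j*omega*k)).
    Do not evaluate exactly at a zero of B or A on the unit circle.
    """
    def ratio(c):
        z = [cmath.exp(-1j * omega * k) for k in range(len(c))]
        num = sum(k * ck * zk for k, (ck, zk) in enumerate(zip(c, z)))
        den = sum(ck * zk for ck, zk in zip(c, z))
        return (num / den).real
    return ratio(b) - ratio(a)

dc_delay = group_delay(B, A, 0.0)
for frac in (0.0, 0.0064, 0.01, 0.02):
    gd = group_delay(B, A, frac * math.pi)
    print(f"{frac:.4f} x Nyquist: {gd:6.1f} samples ({gd - dc_delay:+.1f} vs DC)")
```

The last column is the lag relative to DC, which is the quantity that would smear a pattern like the one described.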

Post by ahenshaw »

We seem to have had a miscommunication along the way. I hope it doesn't change the validity of the approach. The domain of interest is a band-limited radio frequency signal sampled at 5-10 MHz (or higher if possible). Retaining phase integrity will not be a problem for this application.


/Lilltroll
Perfect. I just got hooked on the 5 MHz figure and thought it "could be" the baseband of an analogue video signal. But great, no phase problems. I just wanted to make sure that it will not become a problem later.

Post by lilltroll »

ASM optimized for a G4 or an L2 @ 500 MHz, which do you prefer?
Actually, I think I can give you more throughput on an L2 device.

What is my objective: to lock 200 kHz and run as fast as possible, or to run as fast as possible with a factor of 1:25?

Post by ahenshaw »

Wow, ASM-optimized! Locked to 200 kHz and fast as possible would be best. Interesting that an L2 device can be faster, but I'd need it on a G4.

Post by lilltroll »

ahenshaw wrote: Wow, ASM-optimized! Locked to 200 kHz and fast as possible would be best. Interesting that an L2 device can be faster, but I'd need it on a G4.
Hmm, me guessing again. I'm not used to RF signals that should be multiplied by a carrier.
EDIT: this cannot be used before the DDS.
I guess you will have at least a buffer before the ADC, meaning that you can create a 2nd-, 3rd- or 4th-order passive or active analogue filter with a 200 kHz passband and 90 dB rejection at n MHz.

A 4th-order filter would easily reject > 90 dB at 5 MHz (not considering RF rejection, but you can solve that with ferrite beads and so on).
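
As a quick sanity check of that figure, here is a small sketch that evaluates the attenuation of an ideal Butterworth low-pass. The Butterworth response and the 200 kHz -3 dB cutoff are assumptions for the example. A real analogue filter will differ with component tolerances and layout.

```python
import math

def butterworth_attenuation_db(order, f_hz, cutoff_hz):
    """Attenuation in dB of an ideal Butterworth low-pass at frequency f_hz."""
    return 10.0 * math.log10(1.0 + (f_hz / cutoff_hz) ** (2 * order))

# Assumed example: 200 kHz cutoff, attenuation evaluated at 5 MHz.
for order in (2, 3, 4):
    att = butterworth_attenuation_db(order, 5e6, 200e3)
    print(f"order {order}: {att:.1f} dB at 5 MHz")
```

Under these assumptions only the 4th-order case clears 90 dB at 5 MHz. The 2nd- and 3rd-order cases reach that rejection only at a higher frequency.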


Digital filtering is "cheap in $", but a capacitor is even cheaper. Do you have any constraints, or can I use all 1600 MIPS on the G4? At least I can give you some threads for handling the ADC communication. Give me some limits to work with. Will you need a DDS at x MHz as well, or channel adaptation if it is a "digital" communication signal, maybe LMS equalizers or something?

http://en.wikipedia.org/wiki/File:DDC.svg

Post by lilltroll »

This pre-alpha release uses 16 instructions per sample plus the startup overhead.

Code:

#define NWORDS 10
 //   .cc_top biquadAsm, biquadAsm.func
    
    .globl fastbiquadAsm
    .globl fastbiquadAsm.nstackwords
    .linkset fastbiquadAsm.nstackwords,NWORDS

fastbiquadAsm:
    entsp NWORDS
    stw   r1, sp[0]
    stw   r4, sp[1]
    stw   r5, sp[2]
    stw   r6, sp[3]
    stw   r7, sp[4]
    stw   r8, sp[5]
    stw   r9, sp[6]
    stw   r10, sp[7]
	add r10,r0,0

    // load coefs from struct
#define b0 r0
    ldw b0,r2[0]
#define b1 r1
    ldw b1,r2[1]
#define a1 r4
    ldw a1,r2[3]
#define a2 r5
    ldw a2,r2[4]
#define b2 r2
    ldw b2,r2[2]

	ldc   r6, 0
    ldc   r7, 0
    ldc   r8, 0
	ldc   r9, 0

.align 4
LOOP:

//; R6: X[n-2]  R7: X[n-1]  r8: Y[N-2],  R9: Y[N-1],
//; NOTE: this pre-alpha never writes r8/r9 inside LOOP, so the Y (feedback) terms stay zero.

    ldc   r3,  0
	ldc   r11, 0
    maccs  r3, r11, a2, r8  //out+=-½A2*Y2  scale up
    maccs  r3, r11, b2, r6  //out+=½ B2*X2  frees r6, scale down
    maccs  r3, r11, b1, r7  //out+=½ B1*X1  scale down
    shl	   r6,r9,1		   // Y1*2  reallocates r6
    maccs  r3, r11, a1, r6  //out+=½(-½A1* 2*Y1) = -½ A1*Y1  scale up
    in     r6, res[r10]
    ashr    r6, r6, 7		//X0=x*2^-7
    maccs  r3, r11, b0, r6  //out+=½ B0*X0
    shr  r11, r11, 24
    shl  r3, r3, 8
    or   r3, r11, r3     //frees r11
    ldw  r11,sp[0]
    out  res[r11],r3   //out = 2^8* ½(X0B0+X1B1+X2B2-A1Y1-A2Y2) | X0=x*2^-7

//; R6: X[n-1]  R7: X[n-2]  r8: Y[N-1],  R9: Y[N-2],
.align 4
    ldc   r3,  0
	ldc   r11, 0
    maccs  r3, r11, a2, r9 //out+=-A2*Y2   scale up
    maccs  r3, r11, b2, r7 //out+=½ B2*X2   frees r7, scale down
    maccs  r3, r11, b1, r6 //out+=½ B1*X1  scale down
    shl    r7, r8,1		  // Y1*2  reallocates r7
	maccs  r3, r11, a1, r7 //out+=½ (-½A1* 2*Y1) = -½ A1*Y1  scale up
	in     r7, res[r10]
	ashr    r7, r7, 7		//X0=x*2^-7
    maccs  r3, r11, b0, r7 //out+=½ B0*X0
    shr  r11, r11, 24
    shl  r3, r3, 8
    or   r3, r11, r3     //frees r11
    ldw  r11,sp[0]
    out  res[r11],r3   //out = 2^8* ½(X0B0+X1B1+X2B2-½A1Y1-A2Y2) | X0=x*2^-7

//; R6: X[n-1]  R7: X[n]  (matches the register roles expected at the top of LOOP for the next sample)


    bu    LOOP


allDone:                          // Now just restore all registers.
        
    ldw   r4, sp[1]
    ldw   r5, sp[2]
    ldw   r6, sp[3]
    ldw   r7, sp[4]
    ldw   r8, sp[5]
    ldw   r9, sp[6]
    ldw   r10, sp[7]
    retsp NWORDS

//    .cc_bottom fastbiquadAsm.func
Timing 1000 samples generated in a data-source thread gives:
16019 cycles, i.e. 6.25 Msamples/s @ 100 MHz
(No FNOPs are generated in LOOP, but there may be unnecessarily many shifts inside the loop.)
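
The step from the timing to the throughput figure is just a division. A minimal sketch, assuming the thread issues 100 million instructions per second and reading the measured 16019 as cycles for 1000 samples:

```python
# Assumption: the thread issues 100 million instructions per second.
THREAD_IPS = 100_000_000

def samples_per_second(instructions_per_sample, thread_ips=THREAD_IPS):
    """Sustained sample rate for a loop costing a fixed number of instructions."""
    return thread_ips / instructions_per_sample

measured_cycles, measured_samples = 16019, 1000
per_sample = measured_cycles / measured_samples   # includes the startup overhead
print(f"{per_sample:.3f} cycles per sample measured")
print(f"{samples_per_second(16) / 1e6:.2f} Msamples/s for a 16-instruction loop")
```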

Post by ahenshaw »

I think you understand what I need, but just to be clear, I'll restate it in my terms. This is part of a Software-Defined Radio. There is another SDR XMOS project being worked on, but it delegates the DDC to a TI chip. I'd like to use the XMOS processor to replace the TI chip, with the understanding that we won't be able to run at the higher data rates (and bandwidth) that the TI DDC ASIC can handle.

The RF signal will go through a low-pass filter and a variable-gain amplifier before hitting the ADC (12 or 14 bits per sample). The low-pass filter limits and the ADC sample rate will be based on what the XMOS chip can handle. The XMOS device will need to transform the single data stream into two data streams (I&Q), then low-pass each stream, and then decimate to produce I&Q data at 150-200 Ksamples/s.

I'm really hoping to keep at least one and preferably two cores in reserve for additional narrowband processing and upstream communication.
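
To make the chain described above concrete (mix to I&Q, low-pass, decimate), here is a minimal floating-point sketch. It is only a model of the data flow: the box-car average stands in for the real low-pass filters, and the rates (5 MHz in, 1 MHz local oscillator, 1:25 decimation) are assumptions chosen for the example.

```python
import math

def ddc(samples, fs_hz, lo_hz, decimation):
    """Mix real samples down to I/Q, box-car average, and decimate."""
    i_out, q_out = [], []
    acc_i = acc_q = 0.0
    for n, x in enumerate(samples):
        phase = 2.0 * math.pi * lo_hz * n / fs_hz
        acc_i += x * math.cos(phase)       # in-phase mixer
        acc_q += -x * math.sin(phase)      # quadrature mixer
        if (n + 1) % decimation == 0:      # one output per `decimation` inputs
            i_out.append(acc_i / decimation)
            q_out.append(acc_q / decimation)
            acc_i = acc_q = 0.0
    return i_out, q_out

# Assumed test signal: a unit-amplitude tone exactly at the local oscillator frequency.
fs, lo = 5_000_000, 1_000_000
tone = [math.cos(2.0 * math.pi * lo * n / fs) for n in range(100)]
i_data, q_data = ddc(tone, fs, lo, 25)
print(i_data, q_data)
```

A tone sitting exactly on the local oscillator frequency lands at DC, so I settles at half the input amplitude and Q at zero. The output rate in this example is 5 MHz / 25 = 200 Ksamples/s.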

Post by lilltroll »

This code produces a step-response output that I believe is more or less correct. It uses 13 instructions per sample.

Code:

#define NWORDS 11
 //   .cc_top biquadAsm, biquadAsm.func
    
    .globl fastbiquadAsm
    .globl fastbiquadAsm.nstackwords
    .linkset fastbiquadAsm.nstackwords,NWORDS

fastbiquadAsm:
    entsp NWORDS
#define cin sp[0]
    stw   r0, cin
#define cout sp[1]
    stw   r1, cout
	set dp,r2
    stw   r4, sp[3]
    stw   r5, sp[4]
    stw   r6, sp[5]
    stw   r7, sp[6]
    stw   r8, sp[7]
    stw   r9, sp[8]
    stw   r10, sp[9]

    // load coefs from struct
#define b0 r0
    ldw b0,dp[0]
#define b1 r1
    ldw b1,dp[1]
#define b2 r2
    ldw b2,dp[2]   // not used in LOOP: r2 is overwritten as scratch there, b0 stands in for B2
#define a1 r4
    ldw a1,dp[3]
#define a2 r5
    ldw a2,dp[4]

	ldc   r6, 0
    ldc   r7, 0
    ldc   r8, 0
	ldc   r9, 0
	ldw   r10,cin

.align 4
LOOP:

//; R6: X[n-2]  R7: X[n-1]  r8: Y[N-2],  R9: Y[N-1],

    ldc   r3,  0
	ldw   r11, dp[5]
    maccs  r3, r11, b0, r6  //out+=½ B2*X2 (b0 reused: only valid while B2 == B0)  frees r6
    maccs  r3, r11, b1, r7  //out+=½ B1*X1
    in     r6, res[r10]     //allocates r6
    maccs  r3, r11, b0, r6  //out+=½ B0*X0
    shl	   r2,r9,1		   // Y1*=2
    maccs  r3, r11, a1, r2  //out+=½ (-½A1* 2*Y1) = -½ A1*Y1  scale up
    maccs  r3, r11, a2, r8  //out+=-A2*Y2  scale up frees r8
    shl  r8, r3, 1
    //ldc r3,31
    //shr r11,r11,r3
    //or r8,r8,r11
    ldw  r11,cout
    out  res[r11],r8   //out = 2* ½(X0B0+X1B1+X2B2-A1Y1-A2Y2) 

//; R6: X[n-1]  R7: X[n-2]  r8: Y[N-1],  R9: Y[N-2],
.align 4
    ldc   r3,  0
	ldw   r11, dp[5]
    maccs  r3, r11, b0, r7 //out+=½ B2*X2 (b0 reused: only valid while B2 == B0)  frees r7, scale down
    maccs  r3, r11, b1, r6 //out+=½ B1*X1  scale down
    in     r7, res[r10]
    maccs  r3, r11, b0, r7 //out+=½ B0*X0
	shl    r2, r8,1		  // Y1*2
	maccs  r3, r11, a1, r2 //out+=½ (-½A1* 2*Y1) = -½ A1*Y1  scale up
	maccs  r3, r11, a2, r9 //out+=-A2*Y2   scale up, frees r9
    shl  r9, r3, 1
    //ldc r3,31
    //shr r11,r11,r3
    //or r9,r9,r11
    ldw  r11,cout
    out  res[r11],r9   //out = 2* ½(X0B0+X1B1+X2B2-½A1Y1-A2Y2) 
	//ldw    r10,cin

//; R6: X[n-1]  R7: X[n]  r8: Y[N-1],  R9: Y[N],


    bu    LOOP


allDone:                          // Now just restore all registers.
        
    ldw   r4, sp[3]
    ldw   r5, sp[4]
    ldw   r6, sp[5]
    ldw   r7, sp[6]
    ldw   r8, sp[7]
    ldw   r9, sp[8]
    ldw   r10, sp[9]
    retsp NWORDS

//    .cc_bottom fastbiquadAsm.func
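
The loop comments and the test coefficients in the XC program in this post suggest the following integer coefficient layout: B_k = b_k * 2^31, A1 = -a1/2 * 2^31 and A2 = -a2 * 2^31. This is inferred, not documented anywhere in the thread. A small sketch of that mapping, with hypothetical helper names, for preparing coefficients off-target:

```python
Q31 = 1 << 31

def encode_biquad(b, a):
    """Map float coefficients (a[0] == 1) to the assumed integer layout.

    Assumed layout: B_k = b_k * 2^31, A1 = -a1/2 * 2^31, A2 = -a2 * 2^31.
    Halving a1 keeps it inside the signed 32-bit range as long as |a1| < 2.
    """
    return {
        "B0": round(b[0] * Q31),
        "B1": round(b[1] * Q31),
        "B2": round(b[2] * Q31),
        "A1": round(-a[1] / 2.0 * Q31),
        "A2": round(-a[2] * Q31),
    }

def decode_biquad(c):
    """Inverse mapping: integer layout back to float (b, a) coefficients."""
    b = (c["B0"] / Q31, c["B1"] / Q31, c["B2"] / Q31)
    a = (1.0, -2.0 * c["A1"] / Q31, -c["A2"] / Q31)
    return b, a

# Integer values of the Butterworth test filter used in this post.
test_filter = {"B0": 518315, "B1": 1036629, "B2": 518315,
               "A1": 2099786147, "A2": -2054161904}
b, a = decode_biquad(test_filter)
print(b, a)
```

Decoding the test filter this way gives b0 of roughly 2.4e-4 and a1 of roughly -1.96, which is what one would expect from a low-pass with a very low cutoff.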

Code:

#include <platform.h>
#include <print.h>
//#include "filtercoef.h"

typedef struct {
	int B0;
	int B1;
	int B2;
	int A1;
	int A2;
	unsigned K;
}coef;

extern fastbiquadAsm(streaming chanend cin,streaming chanend cout,coef f);

void feed(streaming chanend cout, streaming chanend cin) {
#define L 3000
	int ans,time, run = 1, counter = 0;
	timer t;
	t:>time;
	cout<: 1<<24;
	cout<: 1<<24;
	cout<: 1<<24;
	cout<: 1<<24;
	cout<: 1<<24;
	cout<: 1<<24;
	time+=10000;
	t when timerafter(time):>time;
	time+=100000000;
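	while(run) {
		select {
		default:
			// steady state: push one sample in, pull one filtered sample out
			cout<: 1<<24;
			counter++;
			cin:>ans;
			break;
		case t when timerafter(time):>void:
			// time is up: drain the six samples that were primed before the loop
			run=0;
			cin:>ans;
			cin:>ans;
			cin:>ans;
			cin:>ans;
			cin:>ans;
			cin:>ans;
			break;
		}
	}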

	printint(counter);
	printstr(", ");
	printintln(ans);

}

int main() {
	streaming chan c0, c1, c2, c3;

	par {
		on stdcore[0]:
		{
			coef f[1];
			//Butterworth test filter wc=0.01
			f[0].A1 = 2099786147;
			f[0].A2 = -2054161904;
			f[0].B0 = 518315;
			f[0].B1 = 1036629;
			f[0].B2 = 518315;
			f[0].K = 1 << 31;
			par {
				fastbiquadAsm(c0, c1, f[0]);
				fastbiquadAsm(c1, c2, f[0]);
				fastbiquadAsm(c2, c3, f[0]);
				feed(c0, c3);
			}
		}
	}
	return 0;
}
On an XC-1A this gives the console output:
7692310, 16774402

This means that it filtered 7.69 Msamples in one second (6th order) and that there is no unstable DC drift.
(The second number is the output of a step response of 1<<24 after 7692310 samples.)
The passband gain doesn't become exactly 0.0000 dB in the filter because of the fixed-point arithmetic.
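
The small loss in DC gain can be explored off-target with an integer model of the loop. The sketch below is my reading of the 13-instruction version (64-bit accumulate preloaded with K, keep the high word, shift left by one). It is an assumption-laden model, not a verified bit-exact copy of the hardware run, so it is not expected to reproduce the console value exactly.

```python
def to_int32(v):
    """Wrap an integer to a signed 32-bit value, as a register would hold it."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def biquad_model(xs, B0, B1, B2, A1, A2, K=1 << 31):
    """Model one biquad section of the assembly loop in integer arithmetic."""
    x1 = x2 = y1 = y2 = 0
    out = []
    for x0 in xs:
        acc = K                            # low word preloaded with K (rounding)
        acc += B2 * x2 + B1 * x1 + B0 * x0 # the asm reuses b0 for B2 (B0 == B2 here)
        acc += A1 * to_int32(y1 << 1)      # A1 holds -a1/2, so Y1 is doubled first
        acc += A2 * y2                     # A2 holds -a2
        y0 = to_int32((acc >> 32) << 1)    # high word, scaled back up by 2
        x2, x1 = x1, x0
        y2, y1 = y1, y0
        out.append(y0)
    return out

COEFS = dict(B0=518315, B1=1036629, B2=518315, A1=2099786147, A2=-2054161904)
signal = [1 << 24] * 5000                  # step input, as in feed()
for _ in range(3):                         # three cascaded sections, as in main()
    signal = biquad_model(signal, **COEFS)
print(signal[-1], 1 << 24)
```

The final value settles close to, but generally not exactly at, 1<<24, because every section truncates its 64-bit accumulator to 32 bits on each sample.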