Trimming Down SPI

rp181
Respected Member
Posts: 395
Joined: Tue May 18, 2010 12:25 am

Post by rp181 »

I have this code:


			//Read Byte 1
			spi_ss <: 0;
			clearbuf(miso);
			sclk <: 0xAA;
			sclk <: 0xAA;
			sync(sclk);
			miso :> inByteTmp;
			data1 = ((bitrev(inByteTmp) >> 16));

			//Read Byte 2
			clearbuf(miso);
			sclk <: 0xAA;
			sclk <: 0xAA;
			sync(sclk);
			miso :> inByteTmp;
			data1 += (bitrev(inByteTmp) >> 24);
			spi_ss <: 1;

			//Read Byte 3
			spi_ss <: 0;
			clearbuf(miso);
			sclk <: 0xAA;
			sclk <: 0xAA;
			sync(sclk);
			miso :> inByteTmp;
			data2 = ((bitrev(inByteTmp) >> 16));

			//Read Byte 4
			clearbuf(miso);
			sclk <: 0xAA;
			sclk <: 0xAA;
			sync(sclk);
			miso :> inByteTmp;
			data2 += (bitrev(inByteTmp) >> 24);
			spi_ss <: 1;
This code reads 32 bits from an ADC as two 16-bit values. This section runs in a loop and needs to run 400,000 times per second (400 kHz), i.e., every instruction counts. While I am reaching this, I want to make it go faster, as I still have more code to add (the code that actually uses the values). I have all of the optimization flags on; what else could I do? I even took out the function calls to the SPI methods, as these proved to be a significant slowdown.
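
The shift arithmetic in the code above can be checked off-target. Here is a minimal sketch in plain C (not XC), assuming that the 8-bit buffered miso port delivers each byte in the low 8 bits of inByteTmp with the first-sampled bit in bit 0, and that bitrev() reverses all 32 bits. The names bitrev32 and assemble_word are hypothetical helpers used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all 32 bits of x, modelling the XC bitrev() built-in. */
uint32_t bitrev32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);  /* move the lowest bit of x onto the end of r */
        x >>= 1;
    }
    return r;
}

/* Hypothetical helper: combine two raw 8-bit port reads into one 16-bit
 * sample, using the same shifts as the posted code (>> 16, then >> 24). */
uint16_t assemble_word(uint32_t first, uint32_t second)
{
    assert(first <= 0xFFu && second <= 0xFFu);  /* assumed: 8 valid bits per read */
    uint32_t hi = bitrev32(first) >> 16;   /* reversed byte lands in bits 15..8 */
    uint32_t lo = bitrev32(second) >> 24;  /* reversed byte lands in bits 7..0  */
    return (uint16_t)(hi + lo);
}
```

Under these assumptions, raw reads of 0x80 and 0x01 assemble to 0x0180, so the two shifts place the first byte in the upper half and the second byte in the lower half of the sample.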

The ports are configured with clock blocks:

configure_clock_rate(blk1, 100, spi_clock_div);
	configure_out_port(sclk, blk1, 0);
	configure_clock_src(blk2, sclk);
	configure_in_port(miso, blk2);
	clearbuf(sclk)
	; //anyone else having the auto-formatting do stupid things like this?
	start_clock(blk1)
	;
	start_clock(blk2)
;	sclk <: 0xFF;
I have never done low-level programming before, but would replacing this with inline assembly offer any speed advantage?
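
As a rough check on the timing budget, here is a sketch in plain C, assuming the sclk port shifts out one bit per tick of blk1, so that each SCLK cycle produced by the 0xAA pattern costs two ticks. The function name min_port_clock_khz is hypothetical, and the result is only a lower bound, since it ignores chip-select toggling and instruction overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound on the clock-block rate (in kHz) needed to shift
 * sclk_cycles SCLK cycles per loop iteration at loop_rate_khz
 * iterations per millisecond. */
uint32_t min_port_clock_khz(uint32_t sclk_cycles, uint32_t loop_rate_khz)
{
    assert(sclk_cycles != 0 && loop_rate_khz != 0);
    /* Assumed: one low bit plus one high bit of the pattern per SCLK cycle,
     * so two clock-block ticks per cycle. */
    return 2u * sclk_cycles * loop_rate_khz;
}
```

For 32 SCLK cycles per sample at 400 kHz this gives 25600 kHz, i.e. 25.6 MHz. If configure_clock_rate(blk1, 100, spi_clock_div) yields 100/spi_clock_div MHz, that bound would only be met with a divider of 3 or less.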


kster59
XCore Addict
Posts: 162
Joined: Thu Dec 31, 2009 8:51 am

Post by kster59 »

sclk <: 0xaa;
sclk <: 0xaa;
pdata :> data;
sclk <: 0xaa;
sclk <: 0xaa;
pdata :> data2;

can be replaced with:

sclk <: 0xaaaaaaaa;
pdata :> data;

using a 32-bit buffered port, for much better performance (probably at least double, if not 4x).

I was able to get 12.5 MHz reliably as an SPI slave, and 25 MHz when I used a 32-bit buffered port. I could probably get 50 MHz in SPI master mode with 32 bits.
rp181
Respected Member
Posts: 395
Joined: Tue May 18, 2010 12:25 am

Post by rp181 »

I will try that. Is sync() not needed? I tried commenting it out and it seemed to work fine, but is this safe?
EDIT: That reads 32 bits, right?
EDIT EDIT: OK, so I tried it, and like before (I tried this way back, as part of trying to get the ADC to work), it just hangs forever. SCLK and MISO are both 32-bit buffered ports. I did:

spi_ss <: 0;
clearbuf(miso);
sclk <: 0xAAAAAAAA;
sync(sclk);
miso :> data; //data is 32 bit integer
spi_ss <: 1;
kster59
XCore Addict
Posts: 162
Joined: Thu Dec 31, 2009 8:51 am

Post by kster59 »

Sorry, I meant:
a <: 0xaaaaaaaa;
a <: 0xaaaaaaaa;
pdata :> data32;

since each write generates 16 clock cycles. Run it in the simulator or with your logic analyzer to see what each line does. You can hook up the input and output in the simulator and run both a master and a slave.
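
The cycle count per write can also be checked with a small model. This is a sketch in plain C, assuming a buffered output port shifts its value out least significant bit first; count_rising_edges is a hypothetical helper, and prev is the level of the line before the write starts.

```c
#include <assert.h>
#include <stdint.h>

/* Count the rising edges seen on the pin when `width` bits of `pattern`
 * are shifted out LSB first, starting from line level `prev` (0 or 1). */
unsigned count_rising_edges(uint32_t pattern, unsigned width, unsigned prev)
{
    assert(width >= 1 && width <= 32 && prev <= 1);
    unsigned edges = 0;
    for (unsigned i = 0; i < width; i++) {
        unsigned bit = (pattern >> i) & 1u;
        if (prev == 0 && bit == 1) {
            edges++;  /* low-to-high transition is one clock cycle */
        }
        prev = bit;
    }
    return edges;
}
```

With the line idling high, 0xAA on an 8-bit port gives 4 rising edges and 0xaaaaaaaa on a 32-bit port gives 16, so two 32-bit writes are needed to clock in 32 bits.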
rp181
Respected Member
Posts: 395
Joined: Tue May 18, 2010 12:25 am

Post by rp181 »

Ah, I keep getting the number of clock edges and the number of cycles mixed up.

I switched it, but it is actually going slower. In total, I need to read 32 bits as two 16-bit frames. In between, however, I need to pulse chip select. As a result, I had to do two 32-bit reads, as buffered 16-bit ports are unsupported. Is it possible to pulse chip select automatically every n clock cycles? I am going to look more into clock blocks to see if I can rig something up.

EDIT: Got it to work! I made the clock port 8-bit buffered and pulsed the CS in between. I got about a 10 kHz improvement (150 to 160 kHz). Code:

spi_ss <: 0;
clearbuf(miso);
sclk <: 0xAA;
sclk <: 0xAA;
sclk <: 0xAA;
sclk <: 0xAA;
sync(sclk);
spi_ss <: 1;
spi_ss <: 0;
sclk <: 0xAA;
sclk <: 0xAA;
sclk <: 0xAA;
sclk <: 0xAA;
sync(sclk);
miso :> data;
spi_ss <: 1;
kster59
XCore Addict
Posts: 162
Joined: Thu Dec 31, 2009 8:51 am

Post by kster59 »

There's a function in the docs that lets you read the lower 16 bits of a 32-bit port.
Last edited by kster59 on Mon Aug 08, 2011 1:35 am, edited 1 time in total.
kster59
XCore Addict
Posts: 162
Joined: Thu Dec 31, 2009 8:51 am

Post by kster59 »

Better check your suggested code. I'm sure your other code won't work as intended and you will get garbage in your read.
rp181
Respected Member
Posts: 395
Joined: Tue May 18, 2010 12:25 am

Post by rp181 »

Why do you say that? My readings show it is working perfectly fine.
Waveforms:

I am reading from an ADC with a sequencer. I am using printf to show me what the address/data is. The address is properly incrementing from 0 to 7, and the shorted ADC pins are showing zero (the others are floating).
kster59
XCore Addict
Posts: 162
Joined: Thu Dec 31, 2009 8:51 am

Post by kster59 »

While I could be wrong, here is my understanding:

sclk clocks miso.

An 8-bit buffered port has a 4-byte FIFO.

After you execute:
sclk <: 0xAA;
sclk <: 0xAA;
sclk <: 0xAA;
sclk <: 0xAA;
sync(sclk);
spi_ss <: 1;

You cannot expect spi_ss <: 1 to execute after all of the preceding 4 statements have finished; instead, it will probably come after the 2nd falling clock edge.

Also, since the FIFO is 4 deep, the processor should block after 4 writes to sclk. I'm not sure how it's possible to trigger pmiso :> data, which expects 32 clocks, with only 16 clock cycles.

Finally, why does your logic analyzer show sclk pulses of varying width? They should definitely all be the same.
ale500
Respected Member
Posts: 259
Joined: Thu Sep 16, 2010 9:15 am

Post by ale500 »

The clock most probably appears to vary because of the low sampling rate (16 Msps).

Couldn't a strobed port be used to toggle SS?