What is the overhead of a for loop?

jarnot
Member++
Posts: 26
Joined: Thu Apr 15, 2010 4:52 pm

Post by jarnot »

I am using an XC-2 for multiple functions, one of which is to generate a simulated 262,144-bit data block, transmitted serially two bits at a time with a 340 ns clock period. The first pair of data bits is accompanied by a strobe (sync) signal. As the code evolved, it ended up with three nested for loops, and the time taken to drop through all three for statements appears to be longer than 170 ns. As a result, the first output statement in the inner loop does not take place 170 ns after the previous one, and a delay corresponding to rollover of the 16-bit I/O timer is introduced. The code fragment below works as expected if the last 'time += TIC' is included, but there is an additional delay of ~655 microseconds if it is deleted.

Code: Select all

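#define BIT_TIME 34        // in 10 ns timer ticks: 340 ns versus ASIC nominal 341.3 ns
#define TIC (BIT_TIME / 2) // 17 ticks = 170 ns; parenthesized so the macro expands safely inside expressions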

  unsigned time, sync, clk = 4, dataword, k, cntr;
  unsigned count64[2];  // contains 64-bit count
  int inc = 6;          // counter increment
  int i, j, n;
  timer t;
  
  t :> time;
  time += 100;

  while(1) {
    sync = 8;
    count64[0] = 0x0;          // chan 0 init
    count64[1] = 0x00030000;   // counts
    cntr = 0x00000040U;   // initial counter value

    for (i=0; i<4096; i++) {         // output dummy data frame
      for (n=0; n<2; n++) {
        k = count64[n];
        for (j=0; j<16; j++) {         // output 32 bit data word
          dataword = sync + clk + (k & 3);
          DD @ time <: dataword;
          dataword = sync + (k & 3);
          time += TIC;                   // TIC = 17 ticks = 170 ns; timer clock is the default 100 MHz
          DD @ time <: dataword;
          k = k >> 2;
          sync = 0;
          time += TIC;
        }
      }
      time += TIC;  // adding this additional 170 ns delay avoids the timer wrap-around
    }
  }
My question is simply whether or not it is reasonable to expect it to take over 170 ns to drop through the 3 for loop entries and make a pair of assignments before reaching/executing the first:

Code: Select all

DD @ time <: dataword;
statement in the innermost for loop? I can see ways of speeding things up, but it would be nice if someone with more XMOS experience than me could comment or advise on this.

Regards,

Robert Jarnot


Woody
XCore Addict
Posts: 165
Joined: Wed Feb 10, 2010 2:32 pm

Post by Woody »

jarnot wrote: My question is simply whether or not it is reasonable to expect it to take over 170 ns to drop through the 3 for loop entries and make a pair of assignments before reaching/executing the first:

Code: Select all

DD @ time <: dataword;
statement in the innermost for loop?
The simple answer is yes. If you have 8 threads running on a 400 MHz device, each thread executes one instruction every 20 ns. Nine instructions will therefore take 180 ns in the best case, and some extra cycles may be required to fetch instructions. I would not be surprised if more than 9 instructions were used in the path you describe.

If you want a quick look at the actual assembly generated by the tools, use xobjdump, e.g.

Code: Select all

xobjdump -d -o a.lst a.xe
In a.lst, within the assembly generated for the

Code: Select all

DD @ time <: dataword;
statement, you should see a 'setpt' (set port time) and an 'out' instruction performing this operation.

The XTA tool is designed for exactly this sort of analysis. You mark the line of code where timing should start and the line where it should stop, and it tells you how long the code takes to run between the two points.

You should be compiling in 'release' rather than 'debug' mode so that optimizations are enabled, and you should ensure that -O3 (maximum optimization) is selected.
jarnot
Member++
Posts: 26
Joined: Thu Apr 15, 2010 4:52 pm

Post by jarnot »

Thank you for your very useful reply. While I do not have processes running on all 8 threads, I had forgotten that more than one thread is unblocked. I will follow all of your suggestions, including trying out XTA (which I have ignored until now), and I suspect that I will learn a lot.