XUF224 channel comms

MyKeys · Post by **MyKeys** » Tue Feb 13, 2018 2:51 pm

Hi,

I'm having some interesting timing results using a streaming channel between 2 cores on different tiles through 2 xSwitches (XUF224 tiles 0 & 3).

I know going through multiple xSwitches increases the channel buffering and latency.
I can't understand why the sending thread would have massive pauses if the receiving thread is draining the channel as fast as possible (see results below).
Also, why does the first iteration of the sending loop sometimes take much longer?
I did add synchronization between the threads to make sure they started at the same time, but this made no difference to the results.

I use the following code to generate the results, built with the -O3 optimization level:

Code:

#include <platform.h>
#include <print.h>

#define CONSECUTIVE_INTS    24

int main()
{
    streaming chan c;

    par
    {
        on tile[0]:
        par
        {
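            // Send task: time a burst of CONSECUTIVE_INTS channel outputs
            {
                timer t;
                unsigned start_time, end_time;

                while(1)
                {
                    t :> start_time;    // timestamp before the burst

                    #pragma loop unroll
                    for (int i = 0; i < CONSECUTIVE_INTS; ++i)
                    {
                        c <: i;         // pauses here when the channel buffering is full
                    }

                    t :> end_time;      // timestamp once the last output has been accepted
                    printuintln(end_time - start_time);
                }
            }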
            }

        }

        on tile[3]:
        par
        {
            // Receive task: drain the channel as fast as possible
            {
                unsigned temp;
                while(1)
                {
                    #pragma loop unroll
                    for (int i = 0; i < CONSECUTIVE_INTS; ++i)
                    {
                        c :> temp;
                    }
                }
            }
        }
    }
    return 0;
}

The results below show the printuintln output from the code above, which should indicate the loop duration in instructions.
Where two values are given in the "Subsequent iterations" column, the result was not consistent from one iteration to the next.

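| CONSECUTIVE_INTS | First iteration | Subsequent iterations |
| --- | --- | --- |
| 6 | 6 | 6 |
| 7 | 7 | 7 |
| 8 | 8 | 8 |
| 9 | 17 | 9 |
| 10 | 26 | 13 |
| 11 | 36 | 23 or 24 |
| 12 | 46 | 33 |
| 13 | 55 | 42 or 43 |
| 14 | 65 | 52 or 53 |
| 15 | 74 | 61 |
| 16 | 84 | 71 or 72 |
| 17 | 94 | 81 |
| 18 | 103 | 90 or 91 |
| 19 | 113 | 100 or 101 |
| 20 | 122 | 109 or 110 |
| 21 | 132 | 118 or 120 |
| 22 | 142 | 129 |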

Thanks for any help,
Mike.
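As a rough cross-check of the table above, here is a minimal Python sketch that finds where the first-iteration timings stop growing by one unit per word and how much each additional word costs after that point. It only does arithmetic on the posted numbers; the function names are made up for illustration, and it says nothing about the cause of the stall.

```python
# Measurements copied from the table above: CONSECUTIVE_INTS -> first-iteration time.
FIRST_ITERATION = {
    6: 6, 7: 7, 8: 8, 9: 17, 10: 26, 11: 36, 12: 46, 13: 55, 14: 65,
    15: 74, 16: 84, 17: 94, 18: 103, 19: 113, 20: 122, 21: 132, 22: 142,
}


def last_unstalled_count(timings):
    """Largest word count whose loop still takes one timer unit per word."""
    return max(n for n, ticks in timings.items() if ticks == n)


def marginal_cost(timings, start, end):
    """Average extra timer units per extra word between two word counts."""
    return (timings[end] - timings[start]) / (end - start)


if __name__ == "__main__":
    knee = last_unstalled_count(FIRST_ITERATION)
    print("last count at one unit per word:", knee)
    print("units per extra word beyond it:",
          round(marginal_cost(FIRST_ITERATION, knee, 22), 2))
```

For the posted data this gives 8 as the last unstalled count and about 9.57 units per additional word beyond it.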

mon2 · Post by **mon2** » Tue Feb 13, 2018 6:10 pm

How are these links mated together?

PCB (copper traces)? Wiring?

Raw point-to-point or through LVDS transceivers? Length of interconnects? On the same PCB or through connectors / headers?

Perhaps review the signal integrity of the links?

MyKeys · Post by **MyKeys** » Tue Feb 13, 2018 6:45 pm

Hi mon2,

I'm using the standard XUF224 XN file, which I don't think configures any external xlinks.
I do have xlink7 wired up to the JTAG, but again I don't see this being specifically mentioned in the XN file.

Code:

      <Links>
        <Link Encoding="5wire" Delays="3clk">
          <LinkEndpoint NodeId="0" Link="7"/>
          <LinkEndpoint NodeId="2" Link="0"/>
        </Link>
        <Link Encoding="5wire" Delays="3clk">
          <LinkEndpoint NodeId="0" Link="4"/>
          <LinkEndpoint NodeId="2" Link="3"/>
        </Link>
        <Link Encoding="5wire" Delays="3clk">
          <LinkEndpoint NodeId="0" Link="6"/>
          <LinkEndpoint NodeId="2" Link="1"/>
        </Link>
        <Link Encoding="5wire" Delays="3clk">
          <LinkEndpoint NodeId="0" Link="5"/>
          <LinkEndpoint NodeId="2" Link="2"/>
        </Link>
        <Link Encoding="5wire">
          <LinkEndpoint NodeId="0" Link="8" Delays="52clk,52clk"/>
          <LinkEndpoint NodeId="1" Link="XL0" Delays="1clk,1clk"/>
        </Link>
        <Link Encoding="5wire">
          <LinkEndpoint NodeId="2" Link="8" Delays="52clk,52clk"/>
          <LinkEndpoint NodeId="3" Link="XL0" Delays="1clk,1clk"/>
        </Link>
      </Links>

Thanks,
Mike.

mon2 · Post by **mon2** » Tue Feb 13, 2018 6:52 pm

Sorry, my bad. I was confusing xlinks with channels. Not enough coffee.

Have you seen this thread and the comments from Bianco? They may help.

http://www.xcore.com/viewtopic.php?t=1787

MyKeys · Post by **MyKeys** » Tue Feb 13, 2018 7:00 pm

Here I'm using a streaming channel (permanent route) and only sending data in one direction.
Sorry, I don't see anything in that thread that relates to this. Is there something I missed?

Thanks

mon2 · Post by **mon2** » Tue Feb 13, 2018 7:21 pm

Only the example posted in that thread was of interest. I am assuming that the first iteration takes longer because of the initial handshake. I have not worked directly with this topic, but it is interesting to know.

How do the results change if the compiler optimization level is changed?
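To put a number on the first-iteration overhead, a minimal Python sketch can subtract the "Subsequent iterations" column from the "First iteration" column of the results table posted earlier. The function name is hypothetical, and the data is simply copied from that table.

```python
# Rows copied from the results table earlier in the thread:
# CONSECUTIVE_INTS -> (first iteration, (lowest, highest) subsequent iteration)
RESULTS = {
    6: (6, (6, 6)),
    7: (7, (7, 7)),
    8: (8, (8, 8)),
    9: (17, (9, 9)),
    10: (26, (13, 13)),
    11: (36, (23, 24)),
    12: (46, (33, 33)),
    13: (55, (42, 43)),
    14: (65, (52, 53)),
    15: (74, (61, 61)),
    16: (84, (71, 72)),
    17: (94, (81, 81)),
    18: (103, (90, 91)),
    19: (113, (100, 101)),
    20: (122, (109, 110)),
    21: (132, (118, 120)),
    22: (142, (129, 129)),
}


def first_iteration_penalty(results):
    """Map each word count to the (smallest, largest) extra time of iteration one."""
    return {
        n: (first - highest, first - lowest)
        for n, (first, (lowest, highest)) in results.items()
    }


if __name__ == "__main__":
    for n, (low, high) in first_iteration_penalty(RESULTS).items():
        print(n, low if low == high else f"{low} to {high}")
```

On the posted data the first iteration costs nothing extra up to 8 words, 8 units extra at 9 words, and 12 to 14 units extra from 10 words upwards, roughly independent of the burst length. That is consistent with a fixed one-off cost, but the numbers alone do not show whether that cost is a handshake.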

MyKeys · Post by **MyKeys** » Wed Feb 14, 2018 10:58 am

Optimisation levels 3 and 2 behave the same as above; level 1 has the same pattern but takes longer.
With no optimisation it takes longer still, but all iterations take the same time, presumably because the loop is slow enough to mask the initial setup.

I had assumed that all channels declared as streaming would be configured up front, but I think you're right that it happens on the first communication.
Whilst I can understand this triggering a slight delay on the first iteration, why would it only happen when sending more than 8 ints?

I wonder what test setup is used to attain the maximum bandwidth possible between these tiles?
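The posted numbers do give a rough figure for the sustained rate in this particular test. The Python sketch below converts the slope of the "Subsequent iterations" column into Mbit/s. It rests on two assumptions that are not confirmed anywhere in this thread: that the timer values are 10 ns ticks of a 100 MHz reference clock, and that each `c <: i` transfers one 32-bit int. The function name is hypothetical.

```python
TICK_NS = 10.0   # ASSUMPTION: timer counts 10 ns ticks (100 MHz reference clock)
WORD_BITS = 32   # ASSUMPTION: each `c <: i` transfers one 32-bit int


def sustained_mbit_per_s(ticks_short, ticks_long, extra_words, tick_ns=TICK_NS):
    """Rate implied by the extra time needed to send `extra_words` more words."""
    ns_per_word = (ticks_long - ticks_short) * tick_ns / extra_words
    return WORD_BITS / ns_per_word * 1000.0  # bits per ns -> Mbit/s


if __name__ == "__main__":
    # Subsequent-iteration column: 33 at 12 words, 129 at 22 words.
    print(round(sustained_mbit_per_s(33, 129, 22 - 12)), "Mbit/s")
```

Under those assumptions the steady-state slope of 9.6 ticks per word works out to roughly 333 Mbit/s. That is the rate observed with this code, not an upper bound on what the route between the tiles can carry.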

johned · Post by **johned** » Wed Feb 14, 2018 12:38 pm

Hi Mike,
One option would be to use the outuint and inuint low-level functions in place of <: and :>.
They are defined in xs1.h.
You do not need to specify streaming for the channel declaration when using outuint and inuint.
Best regards,
John

MyKeys · Post by **MyKeys** » Wed Feb 14, 2018 1:23 pm

Hi John,

Using outuint and inuint with a non-streaming channel performs exactly the same as the above code.

Both produce the same instructions:

inuint or streaming :> operator:
in (2r) r1, res[r0] *

outuint or streaming <: operator:
out (r2r) res[r0], r3 *

Did you expect a difference?

Mike.

johned · Post by **johned** » Wed Feb 14, 2018 2:11 pm

Hi Mike,
Thanks for checking. My anecdotal thought was that they would be different; however, I have just looked at the assembly with a colleague and can confirm that there is no difference.
Best,
John
