Matching Clocks to Data with high speed transfer

Post by **jmg** » Fri Mar 23, 2012 4:32 am

I am doing an exercise on joining XMOS to an FPGA, and need the highest-speed transfers with the fewest wires.
Some of this data is 12-bit samples from an ADC, and a QuadSPI link appealed.

This will likely have two links: one running at close to 100% utilization, and another that will ideally self-pace by simply not sending CLK edges when data is not valid (i.e., rather like SPI).

It seems, however, that XMOS lacks this implicit clock-edge gating, and says cryptic things like:
"The data driven on one edge continues to be driven on subsequent edges."
I also see SPI libraries that actually use transmitted data as a clock (?!), and other threads about extra clocks at high speed when using stop_clock().

I can find mention of partout(), but no examples of its speed, and the description does not even state that the clocks produced match the data.

Q: Can I use partout() to send 24 bits over a 4-wide port, and produce the needed 6 clock edges while doing so?
If yes, what is the speed limit of this? And if I call it multiple times, does the clock merge with no gaps, or is there a ??-cycle delay overhead?

I do not really want to send 32 bits with 8 discarded, as that wastes bandwidth and adds complexity.

Post by **segher** » Fri Mar 23, 2012 1:52 pm

> jmg wrote: I am doing an exercise on joining XMOS to an FPGA, and need the highest-speed transfers with the fewest wires.
> Some of this data is 12-bit samples from an ADC, and a QuadSPI link appealed.

> This will likely have two links: one running at close to 100% utilization, and another that will ideally self-pace by simply not sending CLK edges when data is not valid (i.e., rather like SPI).

> It seems, however, that XMOS lacks this implicit clock-edge gating,

Yes, there is no way to make a port that is outputting a clock
stop outputting that clock automatically when there is no data
being transmitted on another port. This makes implementing SPI
(at high speed) rather hard.

> and says cryptic things like:
> "The data driven on one edge continues to be driven on subsequent edges."

This means that a port that is in output mode will not switch to
input mode (tristate, if you will) when it has no further data
to transmit. If it keeps outputting (and it does), it has to
output _something_; it keeps outputting the last value you asked
it to.

> and I see SPI libraries that actually use transmitted data as a clock (?!),

You can output a clock as data: it's just a signal that switches
from high to low all the time, after all. This is the easiest
way to make a gated clock.

The trick used here is to feed that "data clock" back into a
clock block, and use that clock as the port clock for other
ports. That however will make those port pins' signals delayed
relative to the "data clock": the best I've achieved was a
delay of 6 system clocks (400MHz).
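Worked out, that delay is as follows (the 25 MHz data clock in the second line is a hypothetical figure chosen only for illustration):

```latex
t_{\text{delay}} = \frac{6}{400\,\text{MHz}} = 6 \times 2.5\,\text{ns} = 15\,\text{ns}
\qquad
\frac{t_{\text{delay}}}{T_{25\,\text{MHz}}} = \frac{15\,\text{ns}}{40\,\text{ns}} = 0.375
```

So at that rate the data would lag the fed-back clock by more than a third of a clock period, which the receiving side has to tolerate.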

> and other threads about extra clocks at high speed when using stop_clock()

When you try to gate your clock using stop_clock() (or alternatively,
by doing things to the port outputting that clock), you have to keep
in mind that this will happen synchronously to the instructions
doing this, so (with a fast clock) you have to do this slightly
_before_ you want it to take effect, and you have to know the exact
timing of your program -- which means you have to write it in
assembler, you have to know exactly how many threads are running
and how they are scheduled, etc. You probably want to stay away
from this ;-)

> I can find mention of partout(), but no examples of its speed, and the description does not even state that the clocks produced match the data.

partout(), which is OUTPW, has exactly the same timing as a
normal OUT: the only difference is the number of bits it
shifts into the shift register.

> Q: Can I use partout() to send 24 bits over a 4-wide port, and produce the needed 6 clock edges while doing so?

You can use partout to send 24 bits over a 4-wide port. This
does not produce clock edges: rather, the clock edges are an
_input_ to the port, the port outputs 4 new bits on every falling
edge of the clock (and keeps on outputting the last 4 bits if
you do not give it new data in time).

> If yes, what is the speed limit of this? And if I call it multiple times, does the clock merge with no gaps, or is there a ??-cycle delay overhead?

There is no difference from normal 32-bit OUTs, as long as your
program can keep up (which it has to with 32-bit OUTs as well,
though it has more time there). The clock *never* has gaps.

> I do not really want to send 32 bits with 8 discarded, as that wastes bandwidth and adds complexity.

All that said, your situation is much less complicated. Just
don't do SPI! You're probably best served using a plain dull
handshaken port (or you can leave out the handshake signal in
one or both directions, if you can guarantee both sides can keep
up with that).
