read a port with dynamic bit width and timeout

dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

Hello folks,

So far I could not find a code example or a forum question on how to receive data from a port with a timeout and a bit width of 16, 24 or 32 bits when the bit width is configurable and can change at runtime. This may also be interesting to those who want to receive different bit widths over I2S in slave mode. I therefore asked at "https://www.xcore.com/viewtopic.php?f=26&t=6855" how to do this with the function partin(), but ended up answering my own question with why I think it is not possible.

Now I have found an alternative (test) implementation and want to share it; I hope it is interesting and useful to you. The code is pasted below. I am not sure whether it is fast and efficient enough, so ideas for improvement are welcome, as are alternative implementations. I wrote the code for the multi audio dev board. You should also be able to test it on the startkit board. The ports XS1_PORT_1O and XS1_PORT_1P, and likewise the ports XS1_PORT_1M and XS1_PORT_1N, need to be linked with a jumper.

How does the code work:
  • The producer_task writes out the magic sequence "0xF0A05030" on port XS1_PORT_1N.
  • The clock_task produces a very slow clock signal (one period = 200 ms) and outputs it on port XS1_PORT_1O. That port is the source of the clock block named "clk_out", which clocks the output port XS1_PORT_1N. This is configured in the function init().
  • The consumer_task reads the data from port XS1_PORT_1M. The port is clocked by the clock block "clk_in", whose source is the input port XS1_PORT_1P. This is also configured in the init() function. The consumer_task can be configured with the number of bytes to receive, which gives bit widths from 16 to 32 bits. The trick is to buffer the data in port XS1_PORT_1M with only 8 bits. The counter tracks how many bytes must be read to make up the bit width. The rest is only timeout handling and resetting the state machine (counter/value).
  • If you interrupt the clock connection (XS1_PORT_1O <-> XS1_PORT_1P), you can test the timeout mechanism.
Now, the code:

Code:

#include <platform.h>
#include <xs1.h>
#include <stdio.h>
#include <stdint.h>
#include <xclib.h>

#define DELAY_TICKS 10000000
#define JITTER_TOLERANCE 10000
#define SEQUENCE 0xF0A05030

on tile[0]: clock clk_out = XS1_CLKBLK_1;
on tile[0]: clock clk_in = XS1_CLKBLK_2;
// Join following ports with a jumper
on tile[0]: out port p_clk_out = XS1_PORT_1O;
on tile[0]: in port p_clk_in = XS1_PORT_1P; // clock input, fed from p_clk_out via the jumper
// Join following ports with a jumper
on tile[0]: in buffered port:8 p_in = XS1_PORT_1M;
on tile[0]: out buffered port:32 p_out = XS1_PORT_1N;

void producer_task(unsigned count) {
    printf("Please wait\n");
    for (;;) {
        printf("out: %x\n", SEQUENCE);
        p_out <: SEQUENCE;
    }
}

// Read a port with a timeout; the select makes it possible to add further case blocks,
// e.g. to reconfigure something
void consumer_task(unsigned count) {
    unsigned part, value, time, counter;
    timer t;

    counter = 0;
    value = 0; // must start at 0, because the received parts are OR-ed into it

    t :> time;
    time += (16 * count) * DELAY_TICKS + JITTER_TOLERANCE; // "16 *" is required, because one clock cycle is "2 * DELAY_TICKS" and we receive 8 bits => 2 * 8 = 16

    // Why is this necessary?
    clearbuf(p_in);

    for (;;) {
        select {
            case p_in :> part :
                // Stick 8bit parts together
                value |=  (part << (8 * counter));

                // (Count * 8) number of bit read?
                if (counter + 1 >= count) {

                    // Reset timer; to avoid time drift effect read out the current time
                    t :> time;
                    time += (16 * count) * DELAY_TICKS + JITTER_TOLERANCE;

                    // Process the data, e.g. write it to a channel; this test only prints it
                    printf("in: %x\n", value);

                    // Reset state engine
                    counter = 0;
                    value = 0;
                } else {
                    counter++;
                }
                break;
            case t when timerafter(time) :> void :
                // This should only happen, if the jumper between XS1_PORT_1O and XS1_PORT_1P is removed
                printf("Timeout\n");

                // Reset state engine
                counter = 0;
                value = 0;

                // Reset timer; to avoid time drift effect read out the current time
                t :> time;
                time += (16 * count) * DELAY_TICKS + JITTER_TOLERANCE;
                break;
        }
    }
}

void clock_task() {
    unsigned value = 1;
    for (;;) {
        p_clk_out <: value;
        value = (value ^1) & 1;
        delay_ticks(DELAY_TICKS);
    }
}

void init(unsigned count) {
    configure_clock_src(clk_out, p_clk_out);
    configure_out_port_no_ready(p_out, clk_out, 0);

    configure_clock_src(clk_in, p_clk_in);
    configure_in_port_no_ready(p_in, clk_in);

    start_clock(clk_in);
    start_clock(clk_out);

    par {
        consumer_task(count);
        producer_task(count);
        clock_task();
    }
}

int main(void) {
    par {
        on tile[0] : init(3); // 1 = 8bit, 2 = 16bit, 3 = 24bit, 4 = 32bit
    }
    return 0;
}
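
The timeout arithmetic used in consumer_task can be checked on a host PC. The following is a minimal sketch in plain C, not XC. The helper names timeout_ticks() and ticks_to_ms() are made up for illustration, and the 100 MHz reference timer (10 ns per tick) is an assumption about the default timer configuration, chosen because it matches the "200 ms = 1 period" stated above.

```c
#include <assert.h>
#include <stdint.h>

#define DELAY_TICKS      10000000u  /* half a clock period, as in the code above */
#define JITTER_TOLERANCE 10000u     /* extra slack, as in the code above */
#define TICKS_PER_MS     100000u    /* assumption: 100 MHz reference timer */

/* Hypothetical helper: timeout for `count` bytes of 8 bits each.
 * One bit takes one full clock period = 2 * DELAY_TICKS, so one byte
 * takes 16 * DELAY_TICKS. */
uint32_t timeout_ticks(unsigned count)
{
    assert(count >= 1 && count <= 4);   /* 8 to 32 bits */
    return 16u * count * DELAY_TICKS + JITTER_TOLERANCE;
}

/* Hypothetical helper: convert timer ticks to whole milliseconds. */
uint32_t ticks_to_ms(uint32_t ticks)
{
    return ticks / TICKS_PER_MS;
}
```

For count = 3 this gives 480010000 ticks, which is about 4.8 s under the 100 MHz assumption. Even count = 4 stays far below the 32-bit range of the timer value.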

What is missing, but possible to add:
  • There is no sync between input and output. Therefore, if you interrupt the clock connection, the bit pattern between in and out may be shifted.
  • Changing bit width during runtime.
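
The byte assembly and the missing synchronisation can be modelled on a host PC. This is a sketch in plain C under assumptions, not a description of measured behaviour: the helper names assemble_word() and stream_byte() are invented here, and the model assumes that the 32-bit sequence is sent least significant byte first and that the receiver starts exactly on a byte boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine `count` received bytes into one word.
 * bytes[0] is the byte received first and ends up in the lowest 8 bits,
 * like "value |= part << (8 * counter)" in consumer_task. */
uint32_t assemble_word(const uint8_t *bytes, unsigned count)
{
    assert(count >= 1 && count <= 4);
    uint32_t value = 0;
    for (unsigned counter = 0; counter < count; counter++) {
        value |= (uint32_t)bytes[counter] << (8 * counter);
    }
    return value;
}

/* Hypothetical helper: byte number `n` of an endlessly repeated 32-bit
 * sequence that is sent least significant byte first (an assumption
 * about how the 1-bit output port serialises the word). */
uint8_t stream_byte(uint32_t sequence, unsigned n)
{
    return (uint8_t)(sequence >> (8 * (n % 4)));
}
```

In this model, with count = 3 and the sequence 0xF0A05030, consecutive words come out as 0xA05030, 0x5030F0, 0x30F0A0 and so on, because a 24-bit frame does not divide the 32-bit pattern evenly. A receiver that starts one byte late sees the same words shifted by one position, which is the effect of the missing sync mentioned in the list.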


ozel
Active Member
Posts: 45
Joined: Wed Sep 08, 2010 10:16 am

Post by ozel »

Hi, this is an interesting problem and somewhat similar to mine!
My hardware is reading a serial LVDS bit stream which is 8b/10b encoded. There is a separate clock signal and for byte alignment reasons a comma symbol is embedded every few bytes into the data stream.

While trying to find the fastest method for reading serial data in chunks of 10 bits, I came across partial transfers, as described on page 37 of the XMOS XS2 architecture manual.
For some reason, it turned out to be slower than inputting and processing 32 bits at a time (data rate is at least 40 Mbps, ideally 80). But it may be interesting in your case.
Now I don't remember if the setpsc function can be simply called just before a regular :> input which is part of a select. It could work.
I used these two asm statements embedded into the XC code:
asm volatile("setpsc res[%0], %1"::"r"(TPX3_DataOut), "r"(10)); //set transfer width to 10 bits
asm volatile("in %0, res[%1]" : "=r"(input) : "r"(TPX3_DataOut)); //read TPX3_DataOut 1-bit port
In any case, setpsc has to be called before every input if the transfer width should be less than 32 bit.

At the moment I'm trying to implement faster parsing of the bit stream to find the start of the comma symbol for re-synchronisation. This slightly older XMOS resource looks very promising (and has several other gems):
http://xcore.github.io/doc_tips_and_tri ... reams.html
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

Hi ozel,

thanks for your post. What you have written is very interesting.

The XMOS XS2 architecture manual (https://www.xmos.com/published/xs2-isa-specification) describes exactly this partial readout on page 37. It could be more convenient than my approach. I will try it after I have read the first 42 pages of the manual to learn more about the chip internals. I also need to become more familiar with assembler.

The other link, http://xcore.github.io/doc_tips_and_tri ... reams.html, is also very useful. I had forgotten about it.

I have also thought about using the functions endin() and partin() with a timer event. But I guess this approach could fail by receiving too many bits when the processing of another event takes too long.
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

Hi,

I tested it with inline assembler:

Code:

#include <platform.h>
#include <xs1.h>
#include <stdio.h>
#include <stdint.h>
#include <xclib.h>

#define DELAY_TICKS 10000000
#define JITTER_TOLERANCE 1000

on tile[0]: clock clk_out = XS1_CLKBLK_1;
on tile[0]: clock clk_in = XS1_CLKBLK_2;
// Join following ports with a jumper
on tile[0]: out port p_clk_out = XS1_PORT_1O;
on tile[0]: in port p_clk_in = XS1_PORT_1P; // clock input, fed from p_clk_out via the jumper
// Join following ports with a jumper
on tile[0]: in buffered port:32 p_in = XS1_PORT_1M;
on tile[0]: out buffered port:32 p_out = XS1_PORT_1N;

void producer_task(unsigned bit_width) {
    unsigned count = bit_width / 8;
    printf("Sending %u bytes at once. Please wait ...\n", count);
    for (;;) {
        unsigned value = 0;
        for (int i = 0; i < 100; i += count) {
            for (int j = 0; j < count; j++) {
                value = value << 8;
                value |= (i+j);
            }
            printf("out: %08x\n", value);
            partout(p_out, bit_width, value);
            value = 0;
        }
    }
}

// Read a port with a timeout; the select makes it possible to add further case blocks,
// e.g. to reconfigure something
void consumer_task(unsigned bit_width) {
    timer t;
    unsigned time, timeout;

    timeout = (2 * bit_width) * DELAY_TICKS + JITTER_TOLERANCE;

    //TODO Test bit width /8
    t :> time;
    time += timeout;

    // Force bit width
    clearbuf(p_in);
    asm("setpsc res[%0], %1" :: "r"(p_in), "r"(bit_width));

    for (;;) {
        select {
            case p_in :> unsigned value :
                value >>= (32 - bit_width);

                // Reset timer; to avoid time drift effect read out the current time
                t :> time;
                time += timeout;

                // Process the data, e.g. write it to a channel; this test only prints it
                printf("in: %08x\n", value);

                // Force bit width
                asm("setpsc res[%0], %1" :: "r"(p_in), "r"(bit_width));
                break;
            case t when timerafter(time) :> void :
                // This should only happen, if the jumper between XS1_PORT_1O and XS1_PORT_1P is removed
                printf("Timeout\n");

                // Force bit width
                asm("setpsc res[%0], %1" :: "r"(p_in), "r"(bit_width));

                // Reset timer; to avoid time drift effect read out the current time
                t :> time;
                time += timeout;

                break;
        } // case
    } // loop
}

void clock_task() {
    unsigned value = 1;
    for (;;) {
        p_clk_out <: value;
        value = (value ^1) & 1;
        delay_ticks(DELAY_TICKS);
    }
}

void init(unsigned bit_width) {
    configure_clock_src(clk_out, p_clk_out);
    configure_out_port_no_ready(p_out, clk_out, 0);

    configure_clock_src(clk_in, p_clk_in);
    configure_in_port_no_ready(p_in, clk_in);

    start_clock(clk_in);
    start_clock(clk_out);

    par {
        consumer_task(bit_width);
        producer_task(bit_width);
        clock_task();
    }
}

int main(void) {
    par {
        on tile[0] : init(24);
    }
    return 0;
}
What is important and new for me is that I can combine the inline asm "setpsc" with a regular ":>" operator. That means I can set a shorter length than the port buffer length for reading, and can wait on an event inside a select statement by coding "case p_in :> unsigned value :". It is like the partin() function, but with support for port events. Great! It is also important to shift the received data as in the line "value >>= (32 - bit_width);", because ":>" reads the full port buffer width, in my case 32 bits. Additionally, you have to ensure that the inline assembler is executed before every read; otherwise the full buffer width is read instead of a partial read.
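
Why the right shift is needed can be illustrated with a small host-side model. This is a sketch in plain C under an assumed model of the port, not taken from the manual: incoming bits enter a 32-bit shift register at the top and move towards bit 0, and the sender transmits the least significant bit first. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit input shift register: every sampled bit
 * enters at the top (bit 31) and older bits move towards bit 0. */
uint32_t shift_in_bits(uint32_t tx_value, unsigned bit_width)
{
    assert(bit_width >= 1 && bit_width <= 32);
    uint32_t shift_reg = 0;
    for (unsigned i = 0; i < bit_width; i++) {
        uint32_t bit = (tx_value >> i) & 1u;      /* sender: LSB first */
        shift_reg = (shift_reg >> 1) | (bit << 31);
    }
    return shift_reg;   /* data sits in the upper bit_width bits */
}

/* Same alignment step as "value >>= (32 - bit_width);" above. */
uint32_t align_partial(uint32_t raw, unsigned bit_width)
{
    assert(bit_width >= 1 && bit_width <= 32);
    return raw >> (32 - bit_width);
}
```

In this model a 24-bit transfer of 0x123456 leaves 0x12345600 in the register, and the shift by 32 - 24 = 8 recovers 0x123456. The asserts exclude a bit width of 0, because a shift by 32 is undefined in C.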

I have also tested the instruction "settw", but as page 32 of the XMOS XS2 architecture document says, "custom" bit widths are not supported. Damn. Anyhow, I have a solution and am happy. Thanks to ozel.
ozel
Active Member
Posts: 45
Joined: Wed Sep 08, 2010 10:16 am

Post by ozel »

Hi dsteinwe, that's really cool. I'm glad it helps. In my mind the chance was maybe 50:50 that it would still work with a select.
Actually, I tried partial transfers in combination with one more instruction (setpt) to do a timed input. Maybe because of that it couldn't be part of a select in my case, I don't remember.
But timed inputs are generally bad for high speed data since with buffered ports the buffer length is reduced (or lost).
So I'm trying to avoid those now by parsing for the comma symbol in my bit stream.
I don't know when the volatile keyword is required, not an expert in XMOS assembly, yet. ;) I probably took it from other posts about 'setpsc'.
Partin() is probably using the 'settw' instruction.
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

ozel wrote:
Actually, I tried partial transfers in combination with one more instruction (setpt) to do a timed input. Maybe because of that it couldn't be part of a select in my case, I don't remember.
But timed inputs are generally bad for high speed data since with buffered ports the buffer length is reduced (or lost).
So I'm trying to avoid those now by parsing for the comma symbol in my bit stream.

What does an LVDS data stream look like? I haven't found an example yet. I guess you oversample the data stream, because the clock is encoded into it. Then your parsing problem may be similar to receiving S/PDIF audio signals, which are bi-phase-mark encoded. If so, you should take a look at the spdif rx lib code on GitHub. As far as I remember, they use Java to generate a state machine in assembler similar to this pattern: http://xcore.github.io/doc_tips_and_tri ... code-space. Generating is easier than writing assembler ;-).

ozel wrote:
I don't know when the volatile keyword is required, not an expert in XMOS assembly, yet. ;) I probably took it from other posts about 'setpsc'.

It does not seem to be XC-specific. Look here: https://stackoverflow.com/questions/144 ... ing-memory

ozel wrote:
Partin() is probably using the 'settw' instruction.

I guess so, too.
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

Hi folks,

I have figured out that there is a simpler way to do this than using inline assembler: simply use timed port reads. First you determine the port time with a statement like this:

Code:

int port_time;
p_clk_in :> void @ port_time;

or

Code:

int port_time;
p_clk_in when pinsneq(1) :> void @ port_time;

The port time is incremented by one at every clock cycle of the port. That means you can do a timed port read. If you want to read 24 bits, you simply write this:

Code:

port_time += 24;
p_clk_in @ port_time :> unsigned value;
value >>= (32 - 24); // => (buffered port bit width - bits read)
A timed read moves only a reduced number of bits from the port buffer. The ">>=" operation aligns the data to the number of bits read. That's all.
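
One detail of "port_time += 24" is worth keeping in mind: the port counter wraps around. The following plain C sketch shows the arithmetic under the assumption that the port counter is 16 bits wide; the helper name next_port_time() is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: port time after `bits` more clock cycles.
 * Assumption: the port counter is 16 bits wide and wraps at 65536. */
uint16_t next_port_time(uint16_t port_time, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint16_t)(port_time + bits);
}
```

Under that assumption, a port time of 65530 plus 24 cycles gives 18, not 65554, so a port time stored in an int should only be compared after masking it to 16 bits.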
infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

Thanks for sharing this journey. Interesting discussion.

BTW - I am not 100% sure what the rules are but I always include volatile in asm...
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

Thanks infiniteimprobability. As far as I understand volatile in the asm statement, the difference is that statements with volatile won't be optimized by the compiler and will be included in the compiled code as written. Without the volatile keyword, optimization is allowed. The optimization could lead to faulty execution if your asm code modifies hardware state and the optimizer reorders statements. I cannot judge whether this is important for xcore processors, but I'm very sure that it can be important for microprocessors with lots of peripheral units, like the ARM microprocessors.
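
The difference can be sketched in plain C. This is only an illustration under assumptions: it uses the GCC/Clang extended asm syntax with an empty instruction template, the function names are made up, and whether the XC compiler follows the same rules is not confirmed in this thread. GCC's documentation states that the optimizer may discard a non-volatile asm whose outputs are unused, and that an asm statement without output operands (like the setpsc statements above) is implicitly volatile.

```c
#include <assert.h>
#include <stdint.h>

/* Plain asm with an output operand: if the caller ignores the result,
 * the optimizer is allowed to drop the whole statement. */
uint32_t pass_through(uint32_t v)
{
    __asm__("" : "+r"(v));
    return v;
}

/* volatile asm: the statement is kept even if the result is unused,
 * which is what code with hardware side effects needs. */
uint32_t pass_through_kept(uint32_t v)
{
    __asm__ __volatile__("" : "+r"(v));
    return v;
}
```

Both functions return their argument unchanged; the difference only shows in the generated code when the result is not used. Writing volatile on asm statements that touch hardware state costs nothing and documents the intent.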