Data loss on port inputs or channel usage


Post by DemoniacMilk »

I have two threads, A and B, with unidirectional data transfer A->B over a streaming channel.
Thread A samples a 32-bit buffered 4-bit port and sends the data to thread B.
Thread B receives the data, unzips it after every second word, checks for start/end flags, and stores the data into an array if needed.
A third thread, C, zips bytes and outputs the data on a 4-bit port that feeds thread A over a loopback.

While testing I realized that some of the values I sent were going missing. At a clock frequency of about 3 MHz everything works well; increasing the frequency leads to data loss (~50% loss at 10 MHz).
With the help of a logic analyzer I can confirm that Thread C sends out data as intended, up to a clock frequency of about 20 MHz.

The code for reading values from the ports should be okay; I don't think there is a problem here (or is there?):

Code:

    while (1) {
        [[ordered]]   // both events may trigger at the same time; make sure we check transmission status first
        select {
            case portInfsync when pinseq(0x1) :> void @ uiTimestampSync:
                if (uiBytesExpected > 0) {
                    uiBytesExpected += 2;   // we expect two more bytes (= 1 more DSP word) for each frame sync
                } else {
                    // if we were not expecting bytes, transmission just started and we need to sync
                    portInData @ uiTimestampSync + 8 :> uiData;  // data starting one bit after sync signal
                    rxServerProcessing_c <: uiData;              // send to processing thread
                    uiBytesExpected += 1;                        // we expect one more byte
                }
                break;
            case uiBytesExpected => portInData :> uiData:        // read a byte if more bytes are expected
                rxServerProcessing_c <: uiData;                  // bytes should be aligned to fsync after first timed input
                uiBytesExpected -= 1;
                break;
        } // select
    } // while
} // while
The only problem I see is that the sync signal goes high one clock cycle before the next word is transmitted. So during a continuous transmission, a byte transmit will finish and a sync signal will arrive at the same time, hence the [[ordered]] attribute. If [[ordered]] works as I understand it, it should make sure that the sync event is handled first, before the port data event can reduce the expected byte count to zero.

The processing thread looks like this:

Code:

    while (1) {
        select {
            case rxServerProcessing_c :> uiRecData:
                uifOddByteCount ^= 1;
                uiaRecDataBuffer[uifOddByteCount] = uiRecData;

                if (!uifOddByteCount) {
                    // UNZIP done here
                    // check for flags, store bytes to an array
                } // if (!uifOddByteCount)
                break;
        } // select
    } // while(1)
Thread B stores the received data in an unsigned char array.
For testing, I sent the values 0..99.
Test results (bytes received/sent):
3.5 MHz: 100/100
4.2 MHz: 100/100
5.0 MHz: 88/100, first miss: byte 34
7.1 MHz: 64/100, first miss: byte 14
10 MHz: 50/100, first miss: byte 10, bytes in wrong order: 0 1 2 3 4 5 6 7 8 9 19 11 21 13 27 23 29 25 68 62

I'm wondering what might be happening.

Streaming channels are asynchronous, but are they buffered too? If so, how much data can they hold? What happens if I send data to a streaming channel faster than the receiving end can handle it?
Do you see any problems with the approach in the data sampling thread?

Edit: I did some testing with gprof. Since I use events to read data from the channel, I was hoping to see some wait time at low clock frequencies (hundreds of kHz), with the wait time approaching zero as the frequency increases and the computation can no longer keep up. But I couldn't see any difference in the results. Does a read from a streaming channel not wait for input data either? If not, what does it do when the code needs the data that should have been read?

Some time measurements revealed the problem:

Code:

    timer_ti :> uiTimeStart;
    // UNZIP done here
    timer_ti :> uiTimeZip;
    // check for flags, store bytes to an array
    timer_ti :> uiTimeEnd;
    printf("z%u sc%u\n", (uiTimeZip - uiTimeStart), (uiTimeEnd - uiTimeZip));

This showed a compute time of 3.37 us for every 2x32 bits received. On a 4-bit port, 2x32 bits equal 16 clock cycles, so if I don't want to sample data faster than it can be processed, my clock frequency is limited to about 4.7 MHz.
