Problem with parallel tasks execution

psebastiani
Member++
Posts: 29
Joined: Wed Oct 02, 2013 4:20 pm

Re: Problem with parallel tasks execution

Postby psebastiani » Thu Jul 02, 2020 2:59 pm

I'm sorry, but this solution is not good for me.
I have a 192 kHz data stream coming in through a streaming channel, and the functions elabora1() and elabora2() each take 3.5 µs to execute. They cannot be executed serially: together they need 7 µs, while the sample period is only about 5.2 µs.
I solved it this way, with 4 threads:


        ________     ___________________     ________
       |        |-->|Thread_2:elabora1()|-->|        |
Chan-->|Thread_1|   |___________________|   |        |-->Chan
       |        |    ___________________    |Thread_4|
       |________|-->|Thread_3:elabora2()|-->|________|
                    |___________________| 
with Thread_1..4 called from the main() function inside a par{} statement. All the threads are executed in parallel and the system works.
The problem is that I have only 2 threads available, so I asked the forum whether it is possible to do what I explained in the post.
It seemed to me that, at least logically, it was possible.
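The timing argument in this post can be checked with a few lines of C. This is only a sketch of the arithmetic: the function names `sample_period_ns` and `stage_keeps_up` are made up for illustration, and the only inputs are the 192 kHz rate and the 3.5 µs per function quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Sample period in nanoseconds for a given sample rate (rounded down). */
uint32_t sample_period_ns(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1000000000u / sample_rate_hz;
}

/* Returns 1 if a stage needing stage_ns per sample fits in one sample period. */
int stage_keeps_up(uint32_t stage_ns, uint32_t sample_rate_hz)
{
    return stage_ns <= sample_period_ns(sample_rate_hz);
}
```

With these numbers, one thread running both functions back to back needs 7000 ns per sample and does not fit in the period, while a thread running a single 3500 ns function does.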
akp
Respected Member
Posts: 446
Joined: Thu Nov 26, 2015 11:47 pm

Postby akp » Thu Jul 02, 2020 5:48 pm

Have you tried the following?

Collapse Thread1 and Thread2 into Elaborate_Thread1, and Thread3 and Thread4 into Elaborate_Thread2, using 3 channels.

So Elaborate_Thread1
1. Receive 32 bit sample on chan1
2. splits the 32 bit samples to I and Q
3. outputs Q (uint16_t or int16_t) on chan2 to Elaborate_Thread2
4. computes the I out
5. puts I out (int16_t) on chan2 to Elaborate_Thread2

and Elaborate_Thread2
1. Receives the Q on chan2 from Elaborate_Thread1
2. computes the Q out
3. Receives the I out (int16_t) on chan2
4. Recombines the I and Q
5. puts the output 32 bit sample on chan3
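The data flow of the two lists above can be sketched in plain C, run sequentially for one sample. This is not XC and uses no real channels. The bit layout (I in the upper 16 bits, Q in the lower 16) is an assumption, and `process_i` / `process_q` are hypothetical identity stand-ins for elabora1() and elabora2(), whose real contents are not shown in the thread.

```c
#include <stdint.h>

/* One complex sample. ASSUMPTION: I is the upper half, Q the lower half. */
typedef struct { int16_t i; int16_t q; } iq_t;

/* Split a packed 32-bit sample into its I and Q halves. */
iq_t iq_split(uint32_t sample)
{
    iq_t out;
    out.i = (int16_t)(uint16_t)(sample >> 16);
    out.q = (int16_t)(uint16_t)(sample & 0xFFFFu);
    return out;
}

/* Pack I and Q back into one 32-bit sample (same assumed layout). */
uint32_t iq_combine(int16_t i, int16_t q)
{
    return ((uint32_t)(uint16_t)i << 16) | (uint32_t)(uint16_t)q;
}

/* Hypothetical placeholders for elabora1()/elabora2(): identity here. */
int16_t process_i(int16_t i) { return i; }
int16_t process_q(int16_t q) { return q; }

/* Sequential model of the two-thread pipeline for a single sample. */
uint32_t pipeline_one_sample(uint32_t sample)
{
    /* Elaborate_Thread1: receive on chan1, split, forward raw Q, compute I. */
    iq_t in = iq_split(sample);
    int16_t q_raw = in.q;             /* would be sent on chan2 */
    int16_t i_out = process_i(in.i);  /* would be sent on chan2 afterwards */

    /* Elaborate_Thread2: compute Q, take the finished I, recombine for chan3. */
    int16_t q_out = process_q(q_raw);
    return iq_combine(i_out, q_out);
}
```

The point of the ordering is that Elaborate_Thread1 hands the raw Q over before it starts on I, so the two computations can overlap when the threads really run in parallel.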

Other thoughts:
1. do you have -O3 set?
2. you can optimize your elabora functions as much as possible, using dual issue assembly if necessary. This can yield significantly improved performance.
3. you could possibly input two samples at a time rather than one, then you could compute two outputs at a time from the same multiplier array. I don't know if that would make it faster.
4. you can set the threads to high priority to get more compute cycles if you need them (i.e. 100 MHz per thread)
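To put point 4 in numbers, here is a rough cycle-budget sketch in C. It takes the 100 MHz per thread figure from the list above and the 192 kHz and 3.5 µs figures from the earlier post. The helper names are invented for illustration, and it assumes nothing about how many instructions actually execute per cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Whole thread cycles available between two consecutive samples. */
uint32_t cycles_per_sample(uint32_t thread_hz, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return thread_hz / sample_rate_hz;
}

/* Thread cycles consumed by a piece of work lasting duration_ns. */
uint32_t cycles_for_duration(uint32_t thread_hz, uint32_t duration_ns)
{
    return (uint32_t)(((uint64_t)thread_hz * duration_ns) / 1000000000u);
}
```

At 100 MHz and 192 kHz this gives a budget of 520 cycles per sample, of which a 3.5 µs function would use 350.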

Postby akp » Thu Jul 09, 2020 12:45 pm

Any luck?

Postby psebastiani » Fri Jul 10, 2020 6:34 am

I had to modify the entire code, but now the system seems to be working as expected.
The solution is, in any case, similar to the one you proposed: I moved some functions to threads before elabora() and some functions to threads after elabora().
Now all parallel threads are in main() inside the par{} statement. This way the threads are actually executed in parallel.
I don't use any type of optimization, but the first 2 points you mentioned are interesting. What is "dual issue assembly"? Do you have any references?

Postby akp » Fri Jul 10, 2020 12:31 pm

Glad you got it working.

For optimization I am talking about the compiler optimizations. You should try ensuring that it's set to -O3 in the Makefile (though you might be able to get by with -O2 or -Os if your code is fast enough). These are just normal compiler optimizations, with the exception that the XCORE-200 has a dual-issue pipeline, so it can issue two instructions at once. That tends to speed things up but also uses more code space than single-issue mode. So in -O3 the dual-issue mode is enabled, while in -Os it's disabled. It's only really necessary if your system performance is CPU-bound rather than I/O-bound.

Dual-issue assembly is just assembly language written in the dual-issue mode. You can hand optimize this to be very fast. Here is an example from XMOS: https://github.com/xmos/lib_ethernet/bl ... i_rx_lld.S

If you want to get the most performance from your system, you should also read about dual issue and all the other architecture features in the architecture manual for the XCORE-200: https://www.xmos.com/file/xs2-isa-specification/

Postby psebastiani » Fri Jul 10, 2020 1:55 pm

Thank you very much
