Thread Disjointness Rules for Arrays

Post by **richard** » Tue Mar 02, 2010 12:28 pm

This is definitely a bug. You shouldn't rely on the current compiler behaviour.

Heater wrote: Problem for me is that I still don't have an XMOS device to test it on, so it may not actually work. However, I have a very strong suspicion that it will. Someone please try it.

I also suspect it will work, at least in this simple case. However, some optimisations in the XC compiler rely on thread disjointness, and if disjointness is violated those optimisations can change the behaviour of the program. For example, the compiler may choose to cache the value of a global variable in a register. If only one thread accesses the global, this optimisation is valid. If several threads use the global, however, one thread might never see the writes made by another, breaking the program.
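To make the register-caching hazard concrete, here is a minimal sketch in plain C11 rather than XC. All names (`payload`, `payload_ready`, `publish`, `try_consume`) are hypothetical and chosen for illustration. It shows one way to stop a compiler from keeping a shared flag in a register, assuming a toolchain that provides `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; plain C11 rather than XC. */

static int payload;               /* data handed from writer to reader */
static atomic_bool payload_ready; /* zero-initialised: nothing published yet */

/* With a plain `int ready` and a loop such as `while (!ready) {}`, the
 * compiler may load `ready` into a register once and never reread memory,
 * because it assumes no other thread writes it. An atomic flag forbids
 * that: every load has to go back to the shared object. */

/* Writer side: store the data first, then publish the flag. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&payload_ready, true, memory_order_release);
}

/* Reader side: returns true and fills *out once the writer has published. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&payload_ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

A reader thread would call `try_consume` in a loop, and a writer thread would call `publish` once. The XC disjointness rules exist precisely so that the compiler never has to reason about this kind of sharing.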

Post by **lilltroll** » Tue Mar 02, 2010 2:36 pm

Heater wrote: Ah, but it does!

I know the manual only mentions turning off bounds checking and not thread sharing. But one could interpret what the manual says to mean not checking that the entire array is out of bounds to a thread because the array can be in use by another thread :)

Problem for me is that I still don't have an XMOS device to test it on, so it may not actually work. However, I have a very strong suspicion that it will. Someone please try it.

P.S. One might want to put endless loops around those accesses to "unsafe", making them into long-lived threads.

I already used the #pragma, and the result is "use of `W' violates parallel usage rules".
Is it an .xc file, not a .c file? :?

Does your program compile if you call the macs macro in the threads?

Code: Select all

#include <xs1.h>
{h,l}=macs(x, W[1],0,0);
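As a rough guide to what the `macs` call above computes, here is a plain C sketch. It assumes, from my reading of `xs1.h` and not from anything confirmed in this thread, that `macs(a, b, c, d)` multiplies the signed words `a` and `b`, adds the product to the 64-bit accumulator formed by `c` (high word) and `d` (low word), and returns the high and low words of the result. The helper name `macs_emulated` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper, not part of xs1.h: computes a*b + (hi_in:lo_in)
 * in 64 bits and splits the result into high and low words.
 * Assumes the usual two's-complement conversion to int32_t. */
void macs_emulated(int32_t a, int32_t b, int32_t hi_in, uint32_t lo_in,
                   int32_t *hi_out, uint32_t *lo_out)
{
    /* Rebuild the 64-bit accumulator from its two halves. */
    uint64_t acc = ((uint64_t)(uint32_t)hi_in << 32) | lo_in;

    /* Signed 32x32 -> 64 multiply; the addition wraps modulo 2^64. */
    acc += (uint64_t)((int64_t)a * (int64_t)b);

    *hi_out = (int32_t)(uint32_t)(acc >> 32);
    *lo_out = (uint32_t)acc;
}
```

In a FIR loop the outputs of one call would be fed back in as the accumulator of the next, one call per tap.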

Post by **richard** » Tue Mar 02, 2010 3:22 pm

Would it be possible to use double buffering of coefficients? For example:

Code: Select all

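unsigned char buf0[BUF_SIZE];
unsigned char buf1[BUF_SIZE];
init_coefficients(buf0);
while (1) {
  // Filter with buf0 while the next coefficients are written to buf1.
  par {
    update_coefficients(buf1);
    filter_data(buf0);
  }
  // Swap roles: filter with buf1 while buf0 is refreshed.
  par {
    update_coefficients(buf0);
    filter_data(buf1);
  }
}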

This keeps the compiler happy and avoids the need to copy coefficients over channels.
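The same ping-pong idea can be sketched sequentially in plain C, which makes the buffer roles easy to trace. This is only an illustration: `run_ping_pong`, the `generation` parameter, the tiny `BUF_SIZE`, and the bodies of `update_coefficients` and `filter_data` are hypothetical stand-ins, and the two calls that XC would run inside one `par` are simply run one after the other here.

```c
#include <stddef.h>

#define BUF_SIZE 4   /* hypothetical size, kept small for illustration */

/* Stand-in for the updater thread: writes the next coefficient set. */
static void update_coefficients(unsigned char *buf, unsigned char generation)
{
    for (size_t i = 0; i < BUF_SIZE; i++)
        buf[i] = generation;
}

/* Stand-in for the filter thread: only reads its buffer. */
static unsigned filter_data(const unsigned char *buf)
{
    unsigned sum = 0;
    for (size_t i = 0; i < BUF_SIZE; i++)
        sum += buf[i];
    return sum;
}

/* Runs `rounds` ping-pong rounds and returns the last filter result. */
unsigned run_ping_pong(unsigned rounds)
{
    unsigned char buf0[BUF_SIZE], buf1[BUF_SIZE];
    unsigned char *active = buf0, *spare = buf1;
    unsigned result = 0;

    update_coefficients(active, 0);   /* plays the role of init_coefficients */
    for (unsigned r = 1; r <= rounds; r++) {
        /* In XC these two calls would sit in one par block; they touch
         * different arrays, so the disjointness rules hold. */
        update_coefficients(spare, (unsigned char)r);
        result = filter_data(active);

        /* Swap roles for the next round. */
        unsigned char *tmp = active;
        active = spare;
        spare = tmp;
    }
    return result;
}
```

Each round filters with the coefficients written in the previous round, so the filter never reads a buffer that is being modified.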

Post by **wibauxl** » Thu Mar 04, 2010 8:27 pm

Yes, this would be a nice way of doing it, but is the RAM shared across cores?

To me, you have to use channels to pass data between threads, mainly because when the threads are dispersed across cores, they might not see the data of the other threads.

Post by **Woody** » Fri Mar 05, 2010 10:47 am

If you write your code relying on threads sharing memory in some way then you restrict them to running on the same core and this is likely to be problematic, and a rather inelegant solution. You really should be passing data between threads over channels.

Post by **lilltroll** » Sat Apr 24, 2010 1:29 am

Well, maybe a little bit ugly, but I found a way to cheat the compiler with a struct in XC.

If you put const before the struct, the compiler will not complain when you pass it by reference to several functions within a par.
On the other hand, if the fields in the struct are not constant, you can still change their values in those functions.

What about the hardware? Does an XMOS core have one ALU that is time-shared between all threads running on that core in 4X (meaning that nothing is calculated in parallel)?

Post by **lilltroll** » Sat Apr 24, 2010 1:47 am

Woody wrote: If you write your code relying on threads sharing memory in some way then you restrict them to running on the same core and this is likely to be problematic, and a rather inelegant solution. You really should be passing data between threads over channels.

Elegant or not:
I will go for the solution that gives me the most MACS/sec without glitches.
I would always cluster the FIR filter and the filter-update code as close together as possible on the same core.

On an X64 I would try to split the FIR filter instead, and distribute it over the cores (but I'm just guessing now).

Post by **Woody** » Mon Apr 26, 2010 9:33 am

lilltroll wrote: Well, maybe a little bit ugly, but I found a way to cheat the compiler with a struct in XC.

...

Elegant or not:
I will go for the solution that gives me the most MACS/sec without glitches.
I would always cluster the FIR filter and the filter-update code as close together as possible on the same core.

I used shared memory in the classD application. What I find to be the simplest solution is to have an inline assembler macro which casts from one type to another. You can then cast from any 32-bit type to any other.

Code: Select all

#define CAST32(dst, src)       asm("mov %0, %1"     : "=r"(dst) : "r"(src));
...
CAST32(pwmSampleFifoLptrPtr, pwmSampleFifoLptr);

----
Health Warning
----
It's worth restating though that this is an advanced method of data transfer, useful if you really know what you're doing and you're trying to get the last ounce of processing power out of the XCore. To ensure you get results with short development times and code that is more easily reusable, using channels is preferred.
----
End of Health Warning!
----
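For comparison, a loosely analogous bit-for-bit reinterpretation can be written in plain C without assembler. This is a sketch under stated assumptions, not Woody's method: the macro name `CAST32_C` is hypothetical, it needs C11 for `_Static_assert`, and both operands must be 32-bit lvalues.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C11 counterpart to CAST32: copies the 32 bits of `src`
 * into `dst` unchanged, whatever the two types are. */
#define CAST32_C(dst, src)                                        \
    do {                                                          \
        _Static_assert(sizeof(dst) == 4 && sizeof(src) == 4,      \
                       "CAST32_C needs 32-bit operands");         \
        memcpy(&(dst), &(src), 4);                                \
    } while (0)
```

A typical use would be `CAST32_C(word, sample);` with `word` declared as `uint32_t` and `sample` as `int32_t`. As with the XC macro, the compiler no longer checks that the reinterpretation makes sense, so the health warning above applies equally.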

Post by **lilltroll** » Mon Apr 26, 2010 9:47 am

Nice ONE!

I'll try to learn that one.

... and I always start with the nice channels.

This case is special because the data transfer to the array needs very high bandwidth.

Post by **daveg** » Tue Oct 26, 2010 5:44 pm

The compiler bug that allows const struct references to be shared like this will be fixed in the next release, after which the code will be rejected, so I'd recommend against using this. Use channels, or C code if you really need shared memory.
