Threading methodology
Frustrated
c1514843 at drdrb.com
Sat Dec 7 08:53:06 PST 2013
I have to process n arrays in some partial order. Instead of all
threads working only on the n arrays and reusing them, if I
"duplicate" them (effectively write once, read many), does that
make things simpler and allow threading to be used more
efficiently?
Basically, instead of having to worry about threads waiting on
other threads while they write the data, I can just have each
thread write to its own duplicate array (although hopefully I
won't have to duplicate the input buffer, as that should not be
necessary?). This should then make things much easier to deal
with, because there is essentially no write contention.
If I'm not mistaken, this is basically the idea behind TLS
(thread-local storage)? (The write-once, read-many idea, so that
one doesn't have to lock memory, which stalls other threads in
some cases and/or uses expensive locks?)
The case I gave above is a little more simplified than the actual
issue, but the idea is simply that writing to a thread's own
buffer reduces inter-thread issues and increases performance.
One issue I have is how to combine the data back.
In fact, the real case is basically n arrays. Each array is
processed to create m arrays, which are processed to create l
arrays, and so on. Not all arrays take the same time to process,
and I would like to multithread the overall processing by
allowing threads to pick which (sub)arrays need processing and
process them, but in their own space, and at the end send them
back to the main thread. (One thread may need access to another's
buffer, but only for reading, so that shouldn't be a problem.)
This uses a lot more memory, but if there are N threads and n
arrays with n > N, no thread will be wasted. (In my case I would
have n*m*l arrays, which will almost always be larger than N, so
threads will always be doing something except towards the end.)
Another way to see the problem is to think of a digraph where
each node processes all the nodes connected to it. All
independent nodes must be processed first, and the work then
propagates up to the most dependent nodes. Every processed node
writes to a new buffer instead of writing to a shared buffer,
which would lock that buffer and stall all other threads that may
be trying to read from it. (That, in my mind, is the key reason
this method is better, except that it requires more memory.)
I will of course mark nodes as processed or not, so threads don't
read from unprocessed buffers, and this allows dependent nodes to
start up once all their dependencies have been processed.
Does any of this make sense?
More information about the Digitalmars-d-learn
mailing list