Threading methodology

Sun Dec 8 02:48:40 PST 2013

Am Sat, 07 Dec 2013 17:53:06 +0100
schrieb "Frustrated" <c1514843 at drdrb.com>:

> I have to process n arrays in some partial order. Instead of all 
> working only on the n arrays and reusing them, [...]

Wait, what partial order and how is it relevant? Who is "all"
in "all working"? Why "only" the n arrays, I thought those are
all?

> if I "duplicate"
> them(effectively write once read many) does that make things 
> simpler and allow threading to be used more efficiently?
>
> Basically, instead of having to worry about threads waiting on 
> other threads when they are writing the data I can just have each 
> thread write to it's own duplicate array(although hopefully won't 
> have to duplicate the buffer as that should not be necessary?). 
> This should then make it much easier to deal because there is 
> essentially no write contention.
> 
> If I'm not mistaken this is basically the idea of TLS? (read many 
> write once time idea so that one doesn't have to lock memory 
> which stalls other threads in some cases and/or uses expensive 
> locks?)

The idea of TLS in D as I understand it, is that your global
variables are not shared between threads by default as is the
case in C. So as you point out you need not lock them since each
thread has its own copy.

> The case I gave above is a little more simplified than the actual 
> issue but the idea is simply that by writing to a threads own 
> buffer reduces inter-threading issues and increases performance. 
> One issue I have is how to combine the data back.
> 
> In fact, the real case is basically n arrays. Each array is 
> processed to create m arrays, which are processed to create l 
> arrays and so on. Not all processing of each array takes the same 
> time and I would like to multithread the overall processing by 
> allowing threads to pick which (sub)arrays need processing and 
> process them, but in their own space, and at the end send them 
> back to the main thread. (But one thread may need access to 
> another's buffer but only for reading so it shouldn't be a 
> problem)
> 
> This uses a lot more memory but if there are N threads and n 
> arrays and N > n, no thread will be wasted. (in my case I would 
> have n*m*l which will almost always be larger than N so threads 
> will always be doing something except towards the end)
> 
> Another way to see the problem is to think of a digraph where 
> each node processes all connected to it. All independent nodes 
> must be processed first which then works itself up to the most 
> dependent nodes. Every processed node uses a new buffer to write 
> to instead of writing to the same buffer which locks that buffer 
> and all other threads that may be trying to read from it(which is 
> the key in my mind that this method is better except it requires 
> more memory).
> 
> I will of course mark nodes as being processed or not so threads 
> don't work on unprocessed buffers and this allows dependent nodes 
> to start up once all there dependencies become processed.
> 
> Does any of this make sense?

It is a little clearer now, but I didn't fully understand it
yet. The sub-arrays need to be processed before the parent
arrays can be completed? When does a thread need access to
another thread's buffer? What if you have only 1 CPU core and
there is only one thread and one buffer? Does your algorithm
break? :p

-- 
Marco