Threading methodology
Frustrated
c1514843 at drdrb.com
Sat Dec 7 08:53:06 PST 2013
I have to process n arrays in some partial order. Instead of all
threads working only on the n arrays and reusing them, if I
"duplicate" them (effectively write once, read many), does that
make things simpler and allow threading to be used more
efficiently?
Basically, instead of having to worry about threads waiting on
other threads while they write the data, I can just have each
thread write to its own duplicate array (although hopefully I
won't have to duplicate the input buffer, as that should not be
necessary?). This should then make things much easier to deal
with, because there is essentially no write contention.
If I'm not mistaken, this is basically the idea behind TLS
(thread-local storage)? (The write-once, read-many idea, so that
one doesn't have to lock memory, which stalls other threads in
some cases and/or uses expensive locks?)
The case I gave above is a little more simplified than the actual
issue, but the idea is simply that writing to a thread's own
buffer reduces inter-thread issues and increases performance.
One issue I have is how to combine the data back.
In fact, the real case is basically n arrays. Each array is
processed to create m arrays, which are processed to create l
arrays, and so on. Not all arrays take the same time to process,
and I would like to multithread the overall processing by
allowing threads to pick which (sub)arrays need processing and
process them, but in their own space, and at the end send them
back to the main thread. (One thread may need access to another's
buffer, but only for reading, so that shouldn't be a problem.)
This uses a lot more memory, but if there are N threads and n
arrays with n > N, no thread will be wasted. (In my case I would
have n*m*l arrays, which will almost always be larger than N, so
threads will always be doing something except towards the end.)
Another way to see the problem is to think of a digraph where
each node processes all the nodes connected to it. All
independent nodes must be processed first, and the work then
propagates up to the most dependent nodes. Every processed node
writes to a new buffer instead of writing to a shared buffer,
which would lock that buffer and stall all other threads that may
be trying to read from it. (That, in my mind, is the key reason
this method is better, except that it requires more memory.)
I will of course mark nodes as processed or not, so threads don't
read from unprocessed buffers, and this allows dependent nodes to
start up once all their dependencies have been processed.
Does any of this make sense?
More information about the Digitalmars-d-learn
mailing list