Concurrency architecture for D2

retard re at tard.com.invalid
Fri Jan 8 16:08:13 PST 2010


Fri, 08 Jan 2010 23:12:38 +0000, dsimcha wrote:

> == Quote from Walter Bright (newshound1 at digitalmars.com)'s article
>> Sure. Except that implicit parallelism is inherently unsound unless the
>> type system can support it.
>> I'll go further than that. Any language that supports implicit sharing
>> of mutable data is inherently unsound for multithreaded programming.
> 
> One thing that I think needs to be considered in the concurrency
> architecture is the case of performance-critical massive data
> parallelism.  In these cases concurrency is actually not very hard and a
> simple parallel foreach loop covers a lot of cases.  As far as safety,
> the amount of code being executed inside a parallel foreach loop is
> generally small enough that it's easy to reason about, and thus it's ok
> not to have any hard static guarantees and leave safety up to the
> programmer, as long as the programmer understands at least the basics of
> concurrency.
> 
> I would like to see a situation where OpenMP/ParallelFuture-style
> concurrency is still implementable in D without unreasonable performance
> or syntactic overhead after the new concurrency system is fully in
> place.
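
As a concrete illustration of the parallel foreach dsimcha is describing, here is a
minimal sketch. It uses the taskPool.parallel API from the std.parallelism module
that grew out of his parallelFuture library, so treat the names as illustrative
rather than as part of any proposal:

import std.parallelism;
import std.math : sqrt;

void main()
{
    auto data = new double[](10_000_000);
    foreach (i, ref x; data)
        x = i;

    // Each iteration is independent, so the small loop body is easy to
    // reason about; nothing statically enforces that independence.
    foreach (ref x; taskPool.parallel(data))
        x = sqrt(x);
}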

These systems also solve different kinds of problems. MMX/SSE/Altivec and 
now GPU hardware solve problems where you run a single operation on a 
large data set simultaneously. Communication is really cheap because, e.g. 
with SSE, the data sits in "ordinary" registers, which are the fastest 
storage available.
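
As a small sketch of that single-operation-over-many-elements pattern, D's
built-in array operations express it directly, and a compiler is free to lower
them to SSE/Altivec instructions (illustrative only, not a claim about any
particular compiler):

// Scale-and-add over whole arrays: one logical operation applied to a
// large data set, a natural fit for SIMD registers.
void axpy(float a, const(float)[] x, float[] y)
{
    y[] += x[] * a;
}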

Message passing scales well if you have many computers that form a 
computing cluster. It also performs well when the computation can be 
split into distinct tasks that run in separate threads and communicate 
rarely. For data-parallel work, though, message passing will be much 
slower unless the messages are translated into native SIMD primitives. 
Conversely, parallel loops don't scale that well across multiple 
computing nodes: the computation has to be split into larger tasks to 
minimize communication delays, since the nodes simply can't share the 
same thread state.
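
For contrast, here is a minimal sketch of the coarse-grained style that message
passing handles well: distinct tasks that compute for a while and exchange only
a few messages. It uses D2's std.concurrency spawn/send/receive primitives; the
worker and its workload are made up for illustration.

import std.concurrency;

void worker()
{
    // Receive one chunk of work, compute for a relatively long time,
    // then send a single result back: communication is rare and coarse.
    auto n = receiveOnly!int();
    long sum = 0;
    foreach (i; 0 .. n)
        sum += i;
    ownerTid.send(sum);
}

void main()
{
    auto tid = spawn(&worker);
    tid.send(10_000_000);
    auto result = receiveOnly!long();
}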


