How ParallelForEach works internally

Sat Dec 1 08:42:50 PST 2012

On Saturday, 1 December 2012 at 12:51:27 UTC, thedeemon wrote:
> On Saturday, 1 December 2012 at 11:36:16 UTC, Zardoz wrote:
>
>> The previous code should work better if I set "total" to be 
>> shared, and hope that D shared vars now have the internal 
>> barriers working, or do I need to manually use semaphores?
>
> Probably core.atomic is the way to go. A semaphore is overkill.

The easiest and fastest way is probably using taskPool.reduce, 
like this:

auto total = taskPool.reduce!"a+b"(
     iota(10_000_000).map!(a => log(a + 1.0)));

writeln(total);
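
For anyone who wants to compile this, here is a minimal self-contained sketch with the imports filled in. The helper names serialLogSum and parallelLogSum are made up for this example, and the serial version is only there to sanity-check the result. The two sums can differ in the last digits because the parallel version adds the elements in a different order.

```d
import std.algorithm : map, sum;
import std.math : log;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helper: single-threaded reference sum of log(1) .. log(n).
double serialLogSum(int n)
{
    return iota(n).map!(a => log(a + 1.0)).sum;
}

// Hypothetical helper: same sum via taskPool.reduce. The range is split
// into work units, each unit is reduced on its own, and the partial
// results are combined at the end.
double parallelLogSum(int n)
{
    return taskPool.reduce!"a + b"(iota(n).map!(a => log(a + 1.0)));
}

void main()
{
    enum n = 10_000_000;
    writefln("serial:   %.6f", serialLogSum(n));
    writefln("parallel: %.6f", parallelLogSum(n));
}
```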

Functions in core.atomic use instructions with the lock prefix, 
which according to http://www.agner.org/optimize/instruction_tables.pdf 
"typically costs more than a hundred clock cycles", so 
calling them for every element will probably slow things down 
significantly. It's best to avoid accessing the same memory from 
multiple threads wherever possible.
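
To make the difference concrete, here is a rough sketch of the two approaches side by side, assuming a parallel foreach over the same log sum as above. The function names atomicLogSum and workerLocalLogSum are invented for illustration. The first does one atomic update on a shared variable per element; the second gives every worker thread its own accumulator through taskPool.workerLocalStorage and merges them once at the end. I have not included timings, so measure on your own machine before drawing conclusions.

```d
import core.atomic : atomicLoad, atomicOp;
import std.math : log;
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helper: every iteration does an atomic read-modify-write
// on the same shared variable, so all threads contend for it.
double atomicLogSum(int n)
{
    shared double total = 0.0;
    foreach (i; parallel(iota(n)))
        atomicOp!"+="(total, log(i + 1.0));
    return atomicLoad(total);
}

// Hypothetical helper: each worker thread accumulates into its own slot,
// so the loop body touches no memory shared between threads.
double workerLocalLogSum(int n)
{
    auto partials = taskPool.workerLocalStorage(0.0);
    foreach (i; parallel(iota(n)))
        partials.get += log(i + 1.0);

    // Merge the per-thread partial sums after the parallel loop is done.
    double total = 0.0;
    foreach (p; partials.toRange)
        total += p;
    return total;
}

void main()
{
    enum n = 1_000_000;
    writefln("atomic:       %.6f", atomicLogSum(n));
    writefln("worker-local: %.6f", workerLocalLogSum(n));
}
```

The worker-local version is essentially what taskPool.reduce does for you, which is why reduce is the simpler choice when the operation is a plain sum.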