Simple parallel foreach and summation/reduction

Chris Katko ckatko at gmail.com
Thu Sep 20 05:34:42 UTC 2018


All I want to do is loop from 0 to [constant] with a for or 
foreach, and have it split up across however many cores I have.

     import std.random : uniform;
     import std.stdio : writefln;

     ulong sum = 0;
     foreach (i; 0 .. 1_000_000_000_000UL)
       {
       // flip some dice
       float die_value = uniform(0.0F, 12.0F);
       if (die_value > [constant]) sum++;
       }
     writefln("The sum is %d", sum);

However, there are two caveats:

  - One: I can't throw the range of values into an array and 
foreach over that, as many examples do, because 1 trillion 
elements (counting from zero) might be a little big for an array. 
(I'm using 1 trillion to illustrate a specific bottleneck / 
problem form.)

  - Two: I want to merge the results at the end.
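For what it's worth, both caveats can be handled at once by std.parallelism's taskPool.reduce over a lazy iota range: no array is ever materialized, and the per-worker partial sums are merged for you. A minimal sketch, with a hypothetical threshold of 6.0F standing in for the elided [constant] and a smaller count so it finishes quickly:

```d
import std.algorithm : map;
import std.parallelism : taskPool;
import std.random : uniform;
import std.range : iota;
import std.stdio : writefln;

void main()
{
    enum float threshold = 6.0F; // hypothetical stand-in for [constant]

    // iota is lazy: no 1-trillion-element array is ever built.
    // Each index maps to 1 if its die roll clears the threshold, else 0.
    // taskPool.reduce splits the range across the worker threads and
    // combines the partial sums at the end -- no mutexes needed.
    auto sum = taskPool.reduce!"a + b"(
        iota(1_000_000UL).map!(i => uniform(0.0F, 12.0F) > threshold ? 1UL : 0UL)
    );
    writefln("The sum is %d", sum);
}
```

Note that std.random's default generator is thread-local, so calling uniform from several workers at once is safe here.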

Which means I either need to use mutexes (BAD. NO. BOO. HISS.) 
or each "thread" would need to know its own index, store its sum 
in, say, a thread[#].sum variable, and then, once all of them 
have completed, add those sums together.
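That manual scheme -- one private sum per worker, folded together at the end -- is essentially what std.parallelism's workerLocalStorage gives you, with no mutexes. A sketch, again using a hypothetical 6.0F threshold and a smaller count:

```d
import std.parallelism : parallel, taskPool;
import std.random : uniform;
import std.range : iota;
import std.stdio : writefln;

void main()
{
    // One private ulong per worker thread, each initialized to 0.
    auto sums = taskPool.workerLocalStorage(0UL);

    foreach (i; parallel(iota(1_000_000UL)))
    {
        if (uniform(0.0F, 12.0F) > 6.0F) // 6.0F stands in for [constant]
            sums.get++;                  // touches only this worker's copy
    }

    // After the parallel loop, add the per-worker sums together.
    ulong total = 0;
    foreach (s; sums.toRange)
        total += s;
    writefln("The sum is %d", total);
}
```

The sums.get property resolves to the calling thread's own copy, so there is no sharing inside the loop; toRange exposes all the copies once the parallel work is done.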

I know this is an incredibly simple conceptual problem to solve. 
So I feel like I'm missing some huge, obvious, answer for doing 
it elegantly in D.

And this just occurred to me: if I foreach over a trillion 
elements, will that make 1 trillion threads? What I want is, 
IIRC, what OpenMP does: it divides your range (blocks of 
sequential numbers) by the number of threads. So a domain of 
[1 to 1000] with ten threads would become workloads on the index 
ranges [1-100], [101-200], [201-300], and so on, one per CPU. 
They each get a 100-element chunk.
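As far as I know, std.parallelism's parallel foreach already works that way: it never spawns one thread per iteration. It hands contiguous work units of the range to a fixed pool of worker threads (one per core by default), and the work-unit size can be set explicitly. A sketch of the [1 to 1000] / ten-chunk example:

```d
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // 1000 indices, work units of 100 consecutive indices each:
    // workers pull chunks [0-99], [100-199], ... from the task pool
    // until the range is exhausted.
    foreach (i; parallel(iota(1000), 100))
    {
        // each worker processes one 100-element chunk at a time
    }
    writeln("done");
}
```

Unlike OpenMP's static schedule, the pool hands out chunks dynamically, but the effect is the same: a handful of threads, each working on blocks of sequential indices.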

So I guess a plain foreach won't work here, will it? Hmmm...

  ----> But again, conceptually this is simple: I have, say, 1 
trillion sequential numbers. I want to assign a "block" (or 
"range") to each CPU core. And since their math does not actually 
interfere with each other, I can simply sum each core's results 
the end.

Thanks,
--Chris


More information about the Digitalmars-d-learn mailing list