Simple parallel foreach and summation/reduction

Chris Katko ckatko at gmail.com
Sat Sep 22 02:13:58 UTC 2018


On Friday, 21 September 2018 at 12:15:59 UTC, Ali Çehreli wrote:
> On 09/21/2018 12:25 AM, Chris Katko wrote:
>> On Thursday, 20 September 2018 at 05:51:17 UTC, Neia Neutuladh 
>> wrote:
>>> On Thursday, 20 September 2018 at 05:34:42 UTC, Chris Katko 
>>> wrote:
>>>> All I want to do is loop from 0 to [constant] with a for or 
>>>> foreach, and have it split up across however many cores I 
>>>> have.
>>>
>>> You're looking at std.parallelism.TaskPool, especially the 
>>> amap and reduce functions. Should do pretty much exactly what 
>>> you're asking.
>>>
>>> auto taskpool = new TaskPool();
>>> taskpool.reduce!((a, b) => a + b)(iota(1_000_000_000_000L));
>> 
>> I get "Error: template instance `reduce!((a, b) => a + b)` 
>> cannot use local __lambda1 as parameter to non-global template 
>> reduce(functions...)" when trying to compile that using the 
>> online D editor with DMD and LDC.
>> 
>> Any ideas?
>
> You can use a free-standing function as a workaround, which is 
> included in the following chapter that explains most of 
> std.parallelism:
>
>   http://ddili.org/ders/d.en/parallelism.html
>
> That chapter is missing e.g. the newly-added fold():
>
>   https://dlang.org/phobos/std_parallelism.html#.TaskPool.fold
>
> Ali
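
A minimal sketch of that workaround, for reference: a module-level
add function (the name is only for illustration) stands in for the
local lambda and is passed to the default taskPool's reduce:

import std.parallelism : taskPool;
import std.range : iota;

// Module-level function; unlike a local lambda, this can be passed
// to TaskPool.reduce.
long add(long a, long b)
{
    return a + b;
}

void main()
{
    // Sums 0 .. 1_000_000 across the default pool's worker threads.
    auto sum = taskPool.reduce!add(iota(1_000_001L));
    assert(sum == 500_000_500_000L);
}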

Okay... so I've got it running. The problem is that it uses a huge
amount of RAM, proportional to the size of the working set.

import std.math : sqrt;
import std.parallelism : TaskPool;
import std.random : uniform;
import std.range : iota;

// Free-standing reduction function; TaskPool.reduce rejects a local
// lambda, so this is the workaround mentioned above.
T test(T)(T x, T y)
{
    return x + y;
}

// One Monte Carlo sample: 1 if the random point lands inside the
// unit circle, 0 otherwise. The index argument x is unused.
double monte(T)(T x)
{
    double v = uniform(-1F, 1F);
    double u = uniform(-1F, 1F);
    if (sqrt(v*v + u*u) < 1.0)
        return 1;
    else
        return 0;
}

    // num is the sample count, declared in the enclosing scope.
    auto taskpool = new TaskPool();
    sum = taskpool.reduce!(test)(
        taskpool.amap!monte(iota(num)));
    taskpool.finish(true);

With num = 1_000_000 it uses roughly 8 MB; with 10_000_000, about
80 MB; and with 100_000_000 it won't run at all: "Exception: Memory
Allocation failed".
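
For what it's worth, that growth is consistent with amap
materializing its full result array: num doubles at 8 bytes each is
8 MB, then 80 MB, then an 800 MB allocation. A minimal sketch,
assuming that is the cause, which feeds taskPool.reduce a lazy
std.algorithm.map instead of amap so no intermediate array is ever
allocated (monte as above):

import std.algorithm : map;
import std.math : sqrt;
import std.parallelism : taskPool;
import std.random : uniform;
import std.range : iota;
import std.stdio : writeln;

// Same sampling function as above; the index argument is unused.
double monte(T)(T x)
{
    double v = uniform(-1F, 1F);
    double u = uniform(-1F, 1F);
    return (sqrt(v*v + u*u) < 1.0) ? 1.0 : 0.0;
}

void main()
{
    enum num = 100_000_000;

    // map is lazy, so the num results are never stored; reduce pulls
    // them in parallel work units and combines the partial sums.
    double sum = taskPool.reduce!"a + b"(map!monte(iota(num)));

    writeln("pi ~= ", 4.0 * sum / num);
}

The string lambda "a + b" is module-level, so it also avoids the
local-lambda error from earlier in the thread.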

