problem with parallel foreach
Gerald Jansen via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed May 13 08:27:31 PDT 2015
On Wednesday, 13 May 2015 at 12:16:19 UTC, weaselcat wrote:
> On Wednesday, 13 May 2015 at 09:01:05 UTC, Gerald Jansen wrote:
>> On Wednesday, 13 May 2015 at 03:19:17 UTC, thedeemon wrote:
>>> In case of Python's parallel.Pool() separate processes do the
>>> work without any synchronization issues. In case of D's
>>> std.parallelism it's just threads inside one process and they
>>> do fight for some locks, thus this result.
>>
>> Okay, so to do something equivalent I would need to use
>> std.process. My next question is how to pass the common data
>> to the sub-processes. In the Python approach I guess this is
>> automatically looked after by pickling serialization. Is there
>> something similar in D? Alternatively, would the use of
>> std.mmfile to temporarily store the common data be a
>> reasonable approach?
>
> Assuming you're on a POSIX compliant platform, you would just
> take advantage of fork()'s shared memory model and pipes - i.e,
> read the data, then fork in a loop to process it, then use
> pipes to communicate. It ran about 3x faster for me by doing
> this, and obviously scales with the workloads you have (the
> provided data only seems to have 2). If you could provide a
> larger dataset and the python implementation, that would be
> great.
>
> I'm actually surprised and disappointed that there isn't a
> fork()-backend to std.process OR std.parallel. You have to use
> stdc
Okay, more studying...
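If I understand the suggestion, an untested sketch of that
fork()+pipe approach, going through the core.sys.posix bindings
directly (doWork and jobs are just placeholders here), might look
roughly like this:

import core.sys.posix.sys.wait : waitpid;
import core.sys.posix.unistd : _exit, close, fork, pipe, read, write;
import std.stdio : writeln;

double doWork(int job)
{
    // placeholder for the real per-dataset computation
    return job * 2.0;
}

void main()
{
    int[] jobs = [1, 2, 3, 4];
    int[2][] fds;
    fds.length = jobs.length;            // one pipe per job

    foreach (i, job; jobs)
    {
        pipe(fds[i]);
        if (fork() == 0)                 // child: compute, write result, exit
        {
            close(fds[i][0]);
            double r = doWork(job);
            write(fds[i][1], &r, r.sizeof);
            _exit(0);
        }
        close(fds[i][1]);                // parent keeps only the read end
    }

    foreach (i, job; jobs)               // parent: collect one result per pipe
    {
        double r;
        read(fds[i][0], &r, r.sizeof);
        close(fds[i][0]);
        waitpid(-1, null, 0);            // reap a finished child
        writeln(job, " -> ", r);
    }
}

That way each child writes its result back through its own pipe and
the parent just collects them after the loop, much like Pool.map.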
The Python implementation is part of a larger package, so it would
be a fair bit of work to provide a working version. Anyway, the
salient bits are like this:
from parallel import Pool

def run_job(args):
    (job, arr1, arr2) = args
    # ... do the work for each dataset

def main():
    # ... read common data and store in numpy arrays ...
    pool = Pool()
    pool.map(run_job, [(job, arr1, arr2) for job in jobs])
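For reference, the D side is essentially a parallel foreach over the
jobs. A stripped-down sketch of that pattern with std.parallelism
(placeholder names, not the actual code):

import std.parallelism : parallel;

void runJob(int job, const double[] arr1, const double[] arr2)
{
    // ... do the work for each dataset
}

void main()
{
    double[] arr1, arr2;        // ... read common data into these ...
    int[] jobs = [1, 2, 3];
    foreach (job; parallel(jobs))
        runJob(job, arr1, arr2);
}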