problem with parallel foreach
Gerald Jansen via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed May 13 08:27:31 PDT 2015
On Wednesday, 13 May 2015 at 12:16:19 UTC, weaselcat wrote:
> On Wednesday, 13 May 2015 at 09:01:05 UTC, Gerald Jansen wrote:
>> On Wednesday, 13 May 2015 at 03:19:17 UTC, thedeemon wrote:
>>> In case of Python's parallel.Pool() separate processes do the
>>> work without any synchronization issues. In case of D's
>>> std.parallelism it's just threads inside one process and they
>>> do fight for some locks, thus this result.
>>
>> Okay, so to do something equivalent I would need to use
>> std.process. My next question is how to pass the common data
>> to the sub-processes. In the Python approach I guess this is
>> automatically looked after by pickling serialization. Is there
>> something similar in D? Alternatively, would the use of
>> std.mmfile to temporarily store the common data be a
>> reasonable approach?
>
> Assuming you're on a POSIX compliant platform, you would just
> take advantage of fork()'s shared memory model and pipes - i.e,
> read the data, then fork in a loop to process it, then use
> pipes to communicate. It ran about 3x faster for me by doing
> this, and obviously scales with the workloads you have (the
> provided data only seems to have 2). If you could provide a
> larger dataset and the python implementation, that would be
> great.
>
> I'm actually surprised and disappointed that there isn't a
> fork()-backend to std.process OR std.parallel. You have to use
> stdc
Okay, more studying...
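If I understand the suggestion, an untested sketch of that
fork()+pipe approach, going through the core.sys.posix bindings
directly (doWork and jobs are just placeholders here), might look
roughly like this:

import core.sys.posix.sys.wait : waitpid;
import core.sys.posix.unistd : _exit, close, fork, pipe, read, write;
import std.stdio : writeln;

double doWork(int job)
{
    // placeholder for the real per-dataset computation
    return job * 2.0;
}

void main()
{
    int[] jobs = [1, 2, 3, 4];
    int[2][] fds;
    fds.length = jobs.length;            // one pipe per job

    foreach (i, job; jobs)
    {
        pipe(fds[i]);
        if (fork() == 0)                 // child: compute, write result, exit
        {
            close(fds[i][0]);
            double r = doWork(job);
            write(fds[i][1], &r, r.sizeof);
            _exit(0);
        }
        close(fds[i][1]);                // parent keeps only the read end
    }

    foreach (i, job; jobs)               // parent: collect one result per pipe
    {
        double r;
        read(fds[i][0], &r, r.sizeof);
        close(fds[i][0]);
        waitpid(-1, null, 0);            // reap a finished child
        writeln(job, " -> ", r);
    }
}

That way each child writes its result back through its own pipe and
the parent just collects them after the loop, much like Pool.map.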
The Python implementation is part of a larger package, so it would
be a fair bit of work to provide a working version. Anyway, the
salient bits are like this:
from parallel import Pool

def run_job(args):
    (job, arr1, arr2) = args
    # ... do the work for each dataset

def main():
    # ... read common data and store in numpy arrays ...
    pool = Pool()
    pool.map(run_job, [(job, arr1, arr2) for job in jobs])
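For reference, the D side is essentially a parallel foreach over the
jobs. A stripped-down sketch of that pattern with std.parallelism
(placeholder names, not the actual code):

import std.parallelism : parallel;

void runJob(int job, const double[] arr1, const double[] arr2)
{
    // ... do the work for each dataset
}

void main()
{
    double[] arr1, arr2;        // ... read common data into these ...
    int[] jobs = [1, 2, 3];
    foreach (job; parallel(jobs))
        runJob(job, arr1, arr2);
}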