problem with parallel foreach
John Colvin via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Tue May 12 08:10:59 PDT 2015
On Tuesday, 12 May 2015 at 14:59:38 UTC, Gerald Jansen wrote:
> I am a data analyst trying to learn enough D to decide whether
> to use D for a new project rather than Python + Fortran. I
> have recoded a non-trivial Python program to do some simple
> parallel data processing (using the map function in Python's
> multiprocessing module and parallel foreach in D). I was very
happy that my D version ran considerably faster than the
Python version when running a single job, but was soon
dismayed to find
> that the performance of my D version deteriorates rapidly
> beyond a handful of jobs whereas the time for the Python
> version increases linearly with the number of jobs per cpu core.
>
> The server has 4 quad-core Xeons and abundant memory compared
> to my needs for this task even though there are several million
> records in each dataset. The basic structure of the D program
> is:
>
> import std.parallelism; // and other modules
>
> void main()
> {
>     // ...
>     // read common data and store in arrays (jobs, arr1, arr2)
>     // ...
>     foreach (job; parallel(jobs, 1)) {
>         runJob(job, arr1, arr2.dup);
>     }
> }
>
> void runJob(string job, in int[] arr1, int[] arr2)
> {
>     // read the job-specific data file and modify the arr2 copy
>     // write the job-specific output data file
> }
>
> The output of /usr/bin/time is as follows:
>
> Lang  Jobs     User   System  Elapsed  %CPU
> Py       1    45.17     1.44  0:46.65    99
> D        1     8.44     1.17  0:09.24   104
>
> Py       2    79.24     2.16  0:48.90   166
> D        2    19.41    10.14  0:17.96   164
>
> Py      30  1255.17    58.38  2:39.54   823   * Pool(12)
> D       30   421.61  4565.97  6:33.73  1241
>
> (Note that the Python program was somewhat optimized with numpy
> vectorization and a bit of Numba JIT compilation.)
>
> The system time varies widely between repetitions for D with
> multiple jobs (e.g. from 3.8 to 21.5 seconds for 2 jobs).
>
> Clearly my simple approach with parallel foreach has some
> problem(s). Any suggestions?
>
> Gerald Jansen
Have you tried adjusting the workUnitSize argument to parallel?
It should probably be 1 for such large individual tasks.
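
For reference, the work unit size is the optional second argument to
std.parallelism.parallel. A minimal sketch of how it is passed (the
job names and loop body below are placeholders, not from the original
program):

import std.parallelism;
import std.stdio;

void main()
{
    // placeholder job list; the real program reads these from its inputs
    auto jobs = ["job01", "job02", "job03"];

    // Optionally cap the number of worker threads before the pool is
    // first used (assuming over-subscription is a concern):
    // defaultPoolThreads = 12;

    // workUnitSize = 1: each worker thread takes one job at a time,
    // which suits a small number of large, long-running tasks.
    foreach (job; parallel(jobs, 1))
    {
        writeln("processing ", job); // stand-in for runJob(...)
    }
}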