problem with parallel foreach
John Colvin via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Tue May 12 08:10:59 PDT 2015
On Tuesday, 12 May 2015 at 14:59:38 UTC, Gerald Jansen wrote:
> I am a data analyst trying to learn enough D to decide whether
> to use D for a new project rather than Python + Fortran. I
> have recoded a non-trivial Python program to do some simple
> parallel data processing (using the map function in Python's
> multiprocessing module and parallel foreach in D). I was very
happy that my D version ran considerably faster than the
Python version when running a single job, but was soon
dismayed to find
> that the performance of my D version deteriorates rapidly
> beyond a handful of jobs whereas the time for the Python
> version increases linearly with the number of jobs per cpu core.
>
> The server has 4 quad-core Xeons and abundant memory compared
> to my needs for this task even though there are several million
> records in each dataset. The basic structure of the D program
> is:
>
> import std.parallelism; // and other modules
>
> void main()
> {
>     // ...
>     // read common data and store in arrays (jobs, arr1, arr2)
>     // ...
>     foreach (job; parallel(jobs, 1)) {
>         runJob(job, arr1, arr2.dup);
>     }
> }
>
> void runJob(string job, in int[] arr1, int[] arr2)
> {
>     // read the job-specific data file and modify the arr2 copy
>     // write the job-specific output data file
> }
>
> The output of /usr/bin/time is as follows:
>
> Lang  Jobs     User   System  Elapsed  %CPU
> Py       1    45.17     1.44  0:46.65    99
> D        1     8.44     1.17  0:09.24   104
>
> Py       2    79.24     2.16  0:48.90   166
> D        2    19.41    10.14  0:17.96   164
>
> Py      30  1255.17    58.38  2:39.54   823   * Pool(12)
> D       30   421.61  4565.97  6:33.73  1241
>
> (Note that the Python program was somewhat optimized with numpy
> vectorization and a bit of Numba JIT compilation.)
>
> The system time varies widely between repetitions for D with
> multiple jobs (e.g. from 3.8 to 21.5 seconds for 2 jobs).
>
> Clearly my simple approach with parallel foreach has some
> problem(s). Any suggestions?
>
> Gerald Jansen
Have you tried adjusting the workUnitSize argument to parallel?
It should probably be 1 for such large individual tasks.
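
For reference, the work unit size is the optional second argument to
std.parallelism.parallel. A minimal sketch of how it is passed (the
job names and loop body below are placeholders, not from the original
program):

import std.parallelism;
import std.stdio;

void main()
{
    // placeholder job list; the real program reads these from its inputs
    auto jobs = ["job01", "job02", "job03"];

    // Optionally cap the number of worker threads before the pool is
    // first used (assuming over-subscription is a concern):
    // defaultPoolThreads = 12;

    // workUnitSize = 1: each worker thread takes one job at a time,
    // which suits a small number of large, long-running tasks.
    foreach (job; parallel(jobs, 1))
    {
        writeln("processing ", job); // stand-in for runJob(...)
    }
}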