task parallelize dirEntries

Arun Chandrasekaran via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Aug 11 17:31:29 PDT 2017


On Friday, 11 August 2017 at 21:58:20 UTC, Johnson wrote:
> Just a thought, maybe the GC isn't cleaning up quickly enough? 
> You are allocating an md5 digest each iteration.
>
> Possibly, an optimization is to use a collection of md5 
> hashes and reuse them, e.g., pre-allocate 100 (you probably only 
> need as many as the number of parallel loops going) and then 
> attempt to reuse them. If all are in use, wait for a free one. 
> Might require some synchronization.

John, thanks. That was it. md.d has a nifty function, md5Of, that 
is more straightforward than the OOP version.

```
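import std.algorithm : filter;
import std.array : split;
import std.digest.md;   // md5Of, toHexString
import std.file : dirEntries, read, SpanMode;
import std.parallelism : parallel;
import std.stdio : writeln;

void main(string[] args)
{
    // Hash every regular file under args[1], one file per work unit.
    auto files = dirEntries(args[1], SpanMode.depth).filter!(f => f.isFile);
    foreach (d; parallel(files, 1))
    {
        auto data = cast(const(ubyte)[]) read(d.name);
        auto hash = md5Of(data);   // fixed-size ubyte[16], no digest object
        string[] t = split(d.name, '/');
        writeln(toHexString(hash), "  ", t[$-1]);   // hex digest, file name only
    }
}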
```
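
For comparison, here is a minimal sketch of the two APIs side by side, as I understand them. The helper names hexViaClass and hexViaTemplate are made up for illustration and are not part of md.d. The assumption is that the OOP version allocates a digest object (and the result array) on the GC heap on every call, while md5Of returns a fixed-size array by value.

```d
import std.digest.md;   // MD5Digest, md5Of, toHexString
import std.stdio : writeln;

// OOP API: creates a class instance per call (hypothetical helper).
string hexViaClass(const(ubyte)[] data)
{
    auto md5 = new MD5Digest();
    md5.put(data);
    ubyte[] hash = md5.finish();   // dynamically allocated result
    return toHexString(hash);      // slice overload returns a string
}

// Template API: the digest is a ubyte[16] value (hypothetical helper).
string hexViaTemplate(const(ubyte)[] data)
{
    ubyte[16] hash = md5Of(data);
    auto hex = toHexString(hash);  // char[32] static array
    return hex[].idup;             // copy into a GC string before returning
}

void main()
{
    auto data = cast(const(ubyte)[]) "abc";
    // Both paths should produce the same hex digest.
    assert(hexViaClass(data) == hexViaTemplate(data));
    writeln(hexViaTemplate(data));
}
```

Both helpers return the same digest; the difference is only in how much gets allocated per file inside the parallel loop.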

Also, I expected the performance to be faster than `md5sum`. 
However, that was not the case. Please see below. Is there any way 
to optimize this further?

```
11-08-2017 17:22:54 vaalaham ~/code/d/d-mpmc-sample
$ time find /home/arun/downloads/boost_1_64_0/ -type f | xargs md5sum >/dev/null 2>&1

real    0m1.124s
user    0m0.952s
sys     0m0.208s
11-08-2017 17:23:16 vaalaham ~/code/d/d-mpmc-sample
$ ldc2 pmd.d -O3
11-08-2017 17:23:31 vaalaham ~/code/d/d-mpmc-sample
$ time ./pmd ~/downloads/boost_1_64_0 > /dev/null

real    0m0.499s
user    0m1.596s
sys     0m0.580s
11-08-2017 17:23:37 vaalaham ~/code/d/d-mpmc-sample
$
```

strace showed lots of futex exchanges. Why would that be?

