task parallelize dirEntries
Johnson via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Aug 11 14:58:20 PDT 2017
On Friday, 11 August 2017 at 21:33:51 UTC, Arun Chandrasekaran wrote:
> I've modified the sample from tour.dlang.org to calculate the
> md5 digest of the files in a directory using std.parallelism.
>
> When I run this on a dir with a huge number of files, I get:
>
> core.exception.OutOfMemoryError at src/core/exception.d(696): Memory allocation failed
>
> Since dirEntries returns a range, I thought
> std.parallelism.parallel could make use of it without loading
> the entire file list into memory.
>
> What am I doing wrong here? Is there a way to achieve what I'm
> expecting?
>
> ```
> import std.digest.md;
> import std.stdio : writeln;
> import std.file;
> import std.algorithm;
> import std.parallelism;
>
> void printUsage()
> {
>     writeln("Loops through a given directory and calculates the md5 digest of each file encountered.");
>     writeln("Usage: md <dirname>");
> }
>
> void safePrint(T...)(T args)
> {
>     synchronized
>     {
>         import std.stdio : writeln;
>         writeln(args);
>     }
> }
>
> void main(string[] args)
> {
>     if (args.length != 2)
>         return printUsage;
>
>     foreach (d; parallel(dirEntries(args[1], SpanMode.depth).filter!(f => f.isFile), 1))
>     {
>         auto md5 = new MD5Digest();
>         md5.reset();
>         auto data = cast(const(ubyte)[]) read(d.name);
>         md5.put(data);
>         auto hash = md5.finish();
>         import std.array;
>         string[] t = split(d.name, '/');
>         safePrint(toHexString!(LetterCase.lower)(hash), " ", t[$ - 1]);
>     }
> }
> ```
Just a thought, but maybe the GC isn't cleaning up quickly enough?
You are allocating an MD5 digest on every iteration.
A possible optimization is to use a collection of MD5 digests and
reuse them, e.g., pre-allocate 100 (you probably only need as many
as the number of parallel loops going) and then attempt to reuse
them. If all are in use, wait for a free one. That might require
some synchronization; a sketch of one way to do it follows.
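
For what it's worth, std.parallelism already has a primitive for this
kind of per-worker reuse: taskPool.workerLocalStorage gives each
worker thread its own private slot, so there's no pool bookkeeping,
waiting, or extra locking. Here's a rough, untested sketch along
those lines, keeping the directory walk from the original code
(usage message and error handling omitted):

```
import std.algorithm : filter;
import std.digest.md;
import std.file;
import std.parallelism;
import std.stdio : writeln;

void main(string[] args)
{
    if (args.length != 2)
        return;

    // One MD5Digest per worker thread: the lazy initializer runs once
    // per worker slot, and each worker then reuses its own digest for
    // every file it processes.
    auto digests = taskPool.workerLocalStorage!MD5Digest(new MD5Digest());

    foreach (d; parallel(dirEntries(args[1], SpanMode.depth)
                         .filter!(f => f.isFile), 1))
    {
        auto md5 = digests.get; // this worker's private digest
        md5.reset();            // clear state left from the previous file
        md5.put(cast(const(ubyte)[]) read(d.name));
        auto hash = md5.finish();
        // writeln locks stdout for the whole call, so lines won't interleave
        writeln(toHexString!(LetterCase.lower)(hash), "  ", d.name);
    }
}
```

That keeps the allocation count at one digest per worker rather than
one per file, which matches the "only as many as the number of
parallel loops going" idea above.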