David Simcha's std.parallelism

bearophile bearophileHUGS at lycos.com
Sun Jan 2 09:29:12 PST 2011


dsimcha:

> Andrei:
> > * I think it does make sense to evaluate a parallel map lazily by using
> > a finite buffer. Generally map looks the most promising so it may be
> > worth investing some more work in it to make it "smart lazy".
> 
> Can you elaborate on this?  I'm not sure what you're suggesting.

I think Andrei is talking about vectorized laziness; I have explained the idea here twice in the past. This isn't a replacement for the fully eager parallel map. Instead of computing the whole resulting array in parallel, you compute only a chunk of the result, in parallel, and store it. When the code that consumes the data lazily has exhausted that chunk, the lazy parallel map computes the next chunk, stores it, and so on.

Each chunk is large enough that computing it in parallel is advantageous, but small enough not to require a lot of memory.
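To make the idea concrete, here is a hedged sketch of such a "vectorized lazy" parallel map, written in Python rather than D for brevity; the name lazy_parallel_map and the chunk_size parameter are illustrative, not part of std.parallelism:

```python
# Illustrative sketch (NOT std.parallelism): a lazy parallel map that
# computes one chunk of results in parallel at a time, so memory use is
# bounded by the chunk size rather than by the whole input.
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def lazy_parallel_map(fn, items, chunk_size=4):
    """Yield fn(x) for each x in items, materializing only one
    parallel-computed chunk of results at a time."""
    it = iter(items)
    with ThreadPoolExecutor() as pool:
        while True:
            chunk = list(islice(it, chunk_size))
            if not chunk:
                return
            # Only this chunk's results live in memory; the consumer
            # drains them before the next chunk is computed.
            yield from pool.map(fn, chunk)

squares = list(lazy_parallel_map(lambda x: x * x, range(10), chunk_size=4))
```

The consumer pulls results one at a time, but behind the scenes each chunk of chunk_size items was mapped in parallel.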

Optionally, this can even be self-tuning: the library finds the chunk size by itself, according to how much time each item computation (each call of the mapping function) requires.

If you have a read-only memory-mapped file that is readable from several threads in parallel, the map may perform some operation on the lines/records of the file. If the file is very large, and you want to collect/summarize (reduce) the results of the mapping function in some way, then a lazy parallel map is useful :-) This looks like a special case, but a lot of heavy file processing (1 - 5000 gigabytes of data) can be done with this scheme (map-reduce).
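The map-reduce use above can be sketched the same way: fold each parallel-computed chunk into an accumulator as it is produced, so memory stays bounded no matter how large the input stream is. Again this is an illustrative Python sketch with hypothetical names (chunked_map_reduce), not D library code:

```python
# Illustrative sketch: reduce over a chunked lazy parallel map. Only one
# chunk of mapped results exists in memory at any time.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import islice

def chunked_map_reduce(map_fn, reduce_fn, items, initial, chunk_size=256):
    """Map items in parallel one chunk at a time, folding each chunk's
    results into the accumulator before computing the next chunk."""
    it = iter(items)
    acc = initial
    with ThreadPoolExecutor() as pool:
        while True:
            chunk = list(islice(it, chunk_size))
            if not chunk:
                return acc
            acc = reduce(reduce_fn, pool.map(map_fn, chunk), acc)

# Example: summarizing the records of a large input, here simulated by a
# generator of lines; total is the combined length of all records.
lines = ("record %d" % i for i in range(10_000))
total = chunked_map_reduce(len, lambda a, b: a + b, lines, 0)
```

Because the input is a generator and each chunk is discarded after it is folded in, this processes an arbitrarily long stream with memory proportional to chunk_size.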

Bye,
bearophile


More information about the Digitalmars-d mailing list