Idiomatic way of writing nested loops?

Russel Winder via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Jul 18 08:15:54 PDT 2017


On Tue, 2017-07-18 at 03:36 +0000, Nicholas Wilson via Digitalmars-d-learn
wrote:
> On Monday, 17 July 2017 at 11:07:35 UTC, Anton Fediushin wrote:
> > […]
> > 
> > Also, I have a question about running this in parallel: if I 
> > want to use nested loops with `parallel` from 
> > `std.parallelism`, should I add `parallel` to every loop like 
> > this?
> > ------
> > foreach(a; ["foo", "bar"].parallel) {
> >   foreach(b; ["baz", "foz", "bof"].parallel) {
> >     foreach(c; ["FOO", "BAR"].parallel) {
> >       // Some operations on a, b and c
> >     }
> >   }
> > }
> > ------
> > I am worried about running thousands of threads, because in 
> > this case first `parallel` runs 2 tasks, every task runs 3 
> > tasks and every task run inside a task runs 2 more tasks.

It is important to separate threads and tasks carefully here: as far as I am
aware, .parallel creates tasks, not threads. The only threads are the ones
in the thread pool animating the tasks. Thus having thousands of tasks is
not a problem per se, since these are not threads.
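
To make the task/thread distinction concrete, here is a minimal sketch that
runs the nested loops from the question and counts how many distinct threads
actually execute the innermost body. The function name distinctThreadsUsed
is made up for illustration; the expectation is that the count never exceeds
the pool's worker threads plus the calling thread, however many tasks the
nesting creates.

```d
import core.thread : Thread;
import std.parallelism : parallel, taskPool;
import std.stdio : writeln;

// Hypothetical helper: runs the nested loops from the question and
// returns the number of distinct threads that executed the inner body.
size_t distinctThreadsUsed()
{
    bool[Thread] seen; // keyed on the Thread object of the executing thread

    foreach (a; ["foo", "bar"].parallel)
    {
        foreach (b; ["baz", "foz", "bof"].parallel)
        {
            foreach (c; ["FOO", "BAR"].parallel)
            {
                // The associative array is shared between threads,
                // so guard every update.
                synchronized { seen[Thread.getThis()] = true; }
            }
        }
    }
    return seen.length;
}

void main()
{
    // By default the pool has totalCPUs - 1 workers; the calling
    // thread also takes part in executing the loop bodies.
    writeln("worker threads in pool: ", taskPool.size);
    writeln("distinct threads used:  ", distinctThreadsUsed());
}
```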

The question of what the best decomposition for parallelism is has to be
determined by benchmarking – guesswork usually gets it wrong.

My prejudice here, though, is that for a loop structure such as this, unless
the computation at the centre is a biggy, you probably don't want the
.parallel on the inner loop. But I repeat: only benchmarking will tell what
the best parallelism decomposition is.
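
A rough sketch of such a benchmark might look like the following. Everything
here is an assumption for illustration: work is a made-up stand-in for "some
operations on a, b and c", the loop bounds are arbitrary, and outerOnly and
allLevels are hypothetical names for the two decompositions being compared.
Real measurements need the real computation and several repetitions.

```d
import core.atomic : atomicLoad, atomicOp;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical stand-in for the computation at the centre of the loops.
ulong work(int a, int b, int c)
{
    ulong acc = 0;
    foreach (i; 0 .. 10_000)
        acc += (a * 31 + b * 17 + c + i) % 7;
    return acc;
}

// Decomposition 1: .parallel on the outermost loop only.
ulong outerOnly(int n)
{
    shared ulong total;
    foreach (a; iota(n).parallel)
        foreach (b; 0 .. n)
            foreach (c; 0 .. n)
                atomicOp!"+="(total, work(a, b, c));
    return atomicLoad(total);
}

// Decomposition 2: .parallel on every loop level.
ulong allLevels(int n)
{
    shared ulong total;
    foreach (a; iota(n).parallel)
        foreach (b; iota(n).parallel)
            foreach (c; iota(n).parallel)
                atomicOp!"+="(total, work(a, b, c));
    return atomicLoad(total);
}

void main()
{
    enum n = 20; // arbitrary problem size for the sketch

    auto sw = StopWatch(AutoStart.yes);
    immutable r1 = outerOnly(n);
    immutable t1 = sw.peek;

    sw.reset(); // keeps running, restarts from zero
    immutable r2 = allLevels(n);
    immutable t2 = sw.peek;

    assert(r1 == r2); // both decompositions must compute the same sum
    writefln("outer only: %s, all levels: %s", t1, t2);
}
```

Varying n and the cost of work should show how the balance shifts between
the two versions on a given machine.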

> > So, how to write this in idiomatic D manner and run it _if 
> > possible_ in parallel?
> 
> With regards to parallel, only use it on the outermost loop. 
> Assuming you have more items in the outermost loop than you do 
> threads parallelising more than one loop won't net you any speed.

I am not convinced by this "idiom" of parallelising only the outer loop. It
may be true for some cases, but certainly not all. This is task and thread
pool based parallelism here, not vector parallelism. Without knowing the
actual computational structure of the statements at the centre, there can be
no known best parallelism structure. Experimentation on medium sized data
sets before moving to the real ones is required to get the likely best
performance.
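
As one further candidate to include in such experiments (not something
proposed in this message), the three loops can be flattened into a single
range and that range parallelised with an explicit work unit size. This is
only a sketch: combineAll is a hypothetical name, the string concatenation
stands in for the real computation, and the work unit size of 2 is an
arbitrary value to tune by measurement.

```d
import std.algorithm.setops : cartesianProduct;
import std.array : array;
import std.parallelism : parallel;
import std.stdio : writeln;

// Hypothetical helper: flattens the three nested loops into one
// array of (a, b, c) tuples and processes it with a single .parallel.
string[] combineAll()
{
    auto combos = cartesianProduct(["foo", "bar"],
                                   ["baz", "foz", "bof"],
                                   ["FOO", "BAR"]).array;

    auto results = new string[combos.length];

    // Second argument is the work unit size: how many elements each
    // task handles. The value 2 is arbitrary.
    foreach (i, t; combos.parallel(2))
    {
        // Each iteration writes only its own slot, so no locking is needed.
        results[i] = t[0] ~ "-" ~ t[1] ~ "-" ~ t[2];
    }
    return results;
}

void main()
{
    foreach (r; combineAll())
        writeln(r);
}
```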


-- 
Russel.
=============================================================================
Dr Russel Winder     t:+44 20 7585 2200   voip:sip:
russel.winder at ekiga.net
41 Buckmaster Road   m:+44 7770 465 077   xmpp:russel at winder.org.uk
London SW11 1EN, UK  w: www.russel.org.uk skype:russel_winder