Parallelizing code -- design problem in networkish code

Ali Çehreli acehreli at yahoo.com
Thu Dec 13 19:30:21 PST 2012


Parallelism, concurrency, and multi-threading in general are fascinating 
topics. I can't claim extensive experience with them, but I have written 
the following two chapters after studying the std.parallelism and 
std.concurrency modules:

   http://ddili.org/ders/d.en/parallelism.html

   http://ddili.org/ders/d.en/concurrency.html

Still, when I tried some of those ideas, I did not always see the 
performance gains that I had expected, at least with a particular test 
program. After reading articles like the following one, I now understand 
how, perhaps counter-intuitively, single-threaded code can be much 
faster than multi-threaded code, even CAS-style lock-free 
multi-threading. The following was posted to the D forums recently:

   http://martinfowler.com/articles/lmax.html

Sorry to ramble on about this topic. :) I am thinking out loud: next 
time I will try to make the actual processing of data single-threaded, 
as the Disruptor architecture does, and see whether that makes it faster.

On 12/13/2012 01:32 PM, Charles Hixson wrote:
 > I'm trying to parallelize some code which is essentially a network of
 > cells with an index. I can make the cells immutable,

Because you say "parallelize", I don't think you need to make your data 
immutable at all; parallelism requires that the data not be shared 
between threads in the first place.

Even if you do need to share the data, if you are using message-passing 
concurrency, you can safely cast your immutable data to mutable as long 
as you know that only one thread accesses it at any given time. Message 
passing makes that ownership easy to reason about.
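To make that concrete, here is a small sketch of the pattern I mean, 
using std.concurrency's spawn/send/receive. The worker function and 
variable names are just illustrative; the key point is that the array is 
built as mutable, cast to immutable only to cross the thread boundary 
(send requires immutable or shared data), and cast back on the other 
side because the sender hands over ownership and never touches it again:

import std.concurrency;
import std.stdio;

void worker()
{
    receive((immutable(int)[] data) {
        // The array was created as mutable and only cast to immutable
        // to satisfy send(). The sender no longer touches it, so this
        // thread is the sole owner and casting back is safe here.
        int[] mine = cast(int[]) data;
        mine[] *= 2;
        writeln(mine);
    });
}

void main()
{
    auto tid = spawn(&worker);

    int[] numbers = [1, 2, 3];
    tid.send(cast(immutable(int)[]) numbers);
    // From this point on, main must not use 'numbers' again.
}

The cast is what makes the single-owner guarantee explicit: at any 
moment exactly one thread treats the data as mutable.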

But I think 'shared' is likely better than 'immutable' in your case.
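With 'shared', the compiler at least knows the data crosses threads, 
and core.atomic can then be used for the individual operations. A 
minimal sketch (the counter and the summed range are made up for the 
example):

import core.atomic;
import std.parallelism;
import std.range;

void main()
{
    shared int total;

    foreach (i; parallel(iota(1000)))
    {
        // A plain 'total += i' would be a race condition; atomicOp
        // performs the read-modify-write as one atomic operation.
        atomicOp!"+="(total, i);
    }

    assert(total == 499_500);   // 0 + 1 + ... + 999
}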

Not knowing your exact situation, I would recommend the simplest option 
first, which is std.parallelism:

import std.stdio;
import std.parallelism;

void main()
{
     auto array = [ 1, 2 ];

     foreach (element; parallel(array)) {
         // This block is executed on the elements in parallel
         writeln(element);
     }
}

std.parallelism works on ranges, which makes it possible to parallelize 
operations lazily, creating the objects only as needed. Here are a 
thousand ints that are processed in parallel without ever living in an 
actual array:

import std.stdio;
import std.parallelism;
import std.range;

void main()
{
     auto numbers = iota(1000);

     foreach (element; parallel(numbers)) {
         writeln(element);
     }
}

In case you are not already familiar with them, ranges are fundamentally 
a very simple concept. There is the following chapter on ranges:

   http://ddili.org/ders/d.en/ranges.html
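To show just how simple: an InputRange only needs empty, front, and 
popFront. Here is a minimal sketch of a lazy range of squares (the 
struct and its members are my own invention for illustration); no array 
of values is ever allocated, each element is computed as foreach asks 
for it:

import std.stdio;

// A minimal InputRange producing i*i for i in [0, limit).
struct Squares
{
    int i;
    int limit;

    @property bool empty() const { return i >= limit; }
    @property int front() const { return i * i; }
    void popFront() { ++i; }
}

void main()
{
    // foreach consumes any type with empty/front/popFront.
    foreach (sq; Squares(0, 5))
    {
        writeln(sq);
    }
}

A range like this could then be handed to parallel() just like iota() 
was above.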

Ali



More information about the Digitalmars-d-learn mailing list