Revamped concurrency API
Fawzi Mohamed
fmohamed at mac.com
Mon Oct 12 15:21:37 PDT 2009
>> If anyone has ideas, suggestions, and code to help defining a new
>> concurrency API for D, please post.
I think that the correct way to handle concurrency is through a
library; D is flexible enough that a library can be used, and then
one can even write special handlers for special cases.
In blip (http://dsource.org/project/blip) I actually did exactly that,
in D 1.0, for two reasons:
1) I need 64 bits.
2) Concurrency is complex, and I already had a hard time with bugs in
the stable branch; furthermore, the possibility of language changes
breaking very sensitive code that I don't want to touch too often
was not appealing to me.
That said, there is one feature that would help D 1.0 a lot, and that
I would really love to have.
I know that D 1.0 is frozen and so on... but I will try to ask all the same:
real closures from delegates when you put a "new" in front of them.
new delegate(ref int x){ x += y; }
would create a closure (heap-allocating y).
I know that D 2.0 has basically that by default (and scope to get the
previous behavior), but as I said D 2.0 is not yet an option for me (I
want to work on my projects; I think I am already doing enough for the
community). So I thought that asking for a feature that does not break
existing D 1.0 code, and that is in some way already implemented in D 2.0,
could be worth trying :)
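To make the request concrete, here is a sketch of what the proposed syntax could look like. This is hypothetical code: "new delegate" is exactly the feature being asked for and is not valid D 1.0, and makeIncrementer is an invented example name.

```d
// Hypothetical sketch of the requested D 1.0 feature.
// In D 1.0 a delegate literal references the enclosing stack frame,
// so letting it escape the function is unsafe.
void delegate(ref int) makeIncrementer(int y){
    // Proposed: "new" in front of the literal would heap-allocate the
    // captured variables (here y), producing a real closure -- the
    // behavior that D 2.0 already has by default.
    return new delegate(ref int x){ x += y; };
}
```

In D 2.0 the same literal without "new" already allocates its frame on the heap, and scope restores the old stack-frame behavior.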
The other D 2.0 features are nice. I do like struct
constructors/destructors, postblits, template constraints, const...;
well, shared I am not so sure about, but anyway none of them are
needed for concurrency.
Now about the concurrency infrastructure. Here I will discuss SMP
parallelization; there is also a more coarse-grained parallelization
that needs another approach (MPI and an agent-based model; for now I
have just wrapped MPI and serialization in a way that could also be
implemented directly on TCP, and that allows one to easily communicate
arbitrary objects).
Concurrency has two sides: the user/programmer side and the
efficient realization. I will discuss the user-level API as I realized
it in blip, which is optimized for recursive operations that have to be
finished (i.e. computations to be performed, not simulations of
concurrent systems).
The idea is to split each task into chunks that are as small as
possible, while still being large enough that the switching time is
small with respect to the computing time.
This subdivision typically is not directly dependent on the number of
processors.
----
Task is a class that represents a task; it has a string name (for
debugging purposes) and can be initialized with a delegate, a function,
a fiber or a generator (in two flavors).
There are some optimizations to make allocation cheaper.
A task can spawn subtasks, and it "knows" how many of its subtasks are
executing.
A task is considered finished when the task itself is complete and all
its subtasks are completed.
You can append operations to be executed after a task has finished executing.
You can wait for a task to finish (but try to avoid it; adding the
operation to the onFinish of the task is much more efficient).
A task has a given level, its subtasks have level+1, and tasks that cannot
have subtasks have a very high level (int.max/2).
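A minimal sketch of how these pieces fit together, following the description above. Treat it as pseudocode: the method name appendOnFinish, and computeX/afterX, are assumptions, not necessarily blip's exact API.

```d
import blip.parallel.WorkManager;

void example(){
    // name the task for debugging purposes
    auto t = Task("computeX", &computeX);
    // afterX runs once t *and all of its subtasks* have completed
    // (method name assumed)
    t.appendOnFinish(&afterX);
    t.submit();
    // t.wait(); // possible, but appending to onFinish is much cheaper,
    // as it does not keep a suspended context around
}
```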
a new task can be submitted with
t.submit()
or t.submitYield()
submitYield submits the new task and possibly suspends the current
one. Together with the fact that tasks with higher level are
executed before tasks with lower level, this means that execution is
preferentially a depth-first reduction, which minimizes the number of
suspended tasks (I also have task stealing that preferentially steals
the tasks with the lowest level, but that part is not yet really used).
Other important operations are delay and resubmitDelayed, which allow
one to delay a task: for example, when you have to do I/O and you use a
method like epoll, you can delay the current task, add the file handle
to those being watched, and call resubmitDelayed when that handle
has new data.
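For example, a non-blocking read might be structured like this. This is only a sketch of the pattern just described: taskAtt.val (the currently executing task) and epollRegister are assumed names, not necessarily blip's actual API.

```d
import blip.parallel.WorkManager;

void readWhenReady(int fd){
    auto t = taskAtt.val;      // the currently running task (name assumed)
    t.delay();                 // suspend this task without blocking a thread
    // hypothetical epoll wrapper: invoke the callback when fd is readable
    epollRegister(fd, delegate void(){
        t.resubmitDelayed();   // reschedule the task once data is available
    });
    // when the task resumes here, fd can be read without blocking
}
```

The point of the pattern is that the worker thread is freed while the I/O is pending, instead of being parked in a blocking read.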
executeNow is useful to execute a task synchronously.
That is basically the core that one has to know; the library is
actually much richer, but for typical usage this is enough.
With it one can write code like:
import blip.parallel.WorkManager;

class SubProblem {
    void execute(){
        if (problemIsLarge){
            SubProblem firstHalf = ...;
            Task("subproblem1", &firstHalf.execute).autorelease.submitYield();
            SubProblem secondHalf = ...;
            Task("subproblem2", &secondHalf.execute).autorelease.submitYield();
        } else {
            // direct solver
        }
    }
}

void solveProblem(){
    if (problemIsLarge){
        SubProblem wholeProblem = ...;
        Task("solveProblem", &wholeProblem.execute).executeNow(default);
    } else {
        // direct solver
    }
}
just to show a very basic divide-and-conquer approach.