Thoughts on parallel programming?

Fawzi Mohamed fawzi at gmx.ch
Thu Nov 11 06:16:20 PST 2010


On 11-nov-10, at 09:58, Russel Winder wrote:

> On Thu, 2010-11-11 at 02:24 +0000, jfd wrote:
>> Any thoughts on parallel programming.  I was looking at something  
>> about Chapel
>> and X10 languages etc. for parallelism, and it looks interesting.   
>> I know that
>> it is still an area of active research, and it is not yet (far  
>> from?) done,
>> but anyone have thoughts on this as future direction?  Thank you.
>
> Any programming language that cannot be used to program applications
> running on a heterogeneous collection of processors, including CPUs  
> and
> GPUs as computational devices, on a single chip, with there being many
> such chips on a board, possibly clustered, doesn't have much of a
> future.  Timescale 5--10 years.

On this I am not so sure: heterogeneous clusters are more difficult to
program, and GPUs and the like are slowly becoming more and more general
purpose.
Being able to take advantage of them is useful, but I am not convinced
they are necessarily the future.

> Intel's 80-core, 48-core and 50-core devices show the way server,
> workstation and laptop architectures are going.  There may be a large
> central memory unit as now, but it will be secondary storage not  
> primary
> storage.  All the chip architectures are shifting to distributed  
> memory
> -- basically cache coherence is too hard a problem to solve, so  
> instead
> of solving it, they are getting rid of it.  Also the memory bus stops
> being the bottleneck for computations, which is actually the biggest
> problem with current architectures.

Yes, many-core is the future, I agree on this, and also that a
distributed approach is the only way to scale to a really large number
of processors.
But distributed systems *are* more complex, so I think that for the
foreseeable future one will have a hybrid approach.

> Windows, Linux and Mac OS X have a serious problem and will either die
> or be revolutionized.  Apple at least recognize the issue, hence they
> pushed OpenCL.

Again, I am not sure the situation is as dire as you paint it; Linux
does quite well in the HPC field... but I agree that to be the ideal OS
for these architectures it will need more changes.

> Actor model, CSP, dataflow, and similar distributed memory/process- 
> based
> architectures will become increasingly important for software.  There
> will be an increasing move to declarative expression, but I doubt
> functional languages will ever make the main stream.  The issue here  
> is
> that parallelism generally requires programmers not to try and tell  
> the
> computer every detail how to do something, but instead specify the  
> start
> and end conditions and allow the runtime system to handle the
> realization of the transformation.  Hence the move in Fortran from  
> lots
> of "do" loops to "whole array" operations.

Whole array operations are useful, and when they are applicable one
gains much by using them; unfortunately not all problems can be reduced
to a few large array operations, which is why data-parallel languages
have not become the main type of language.
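
To illustrate the distinction, here is a small C sketch (the function
names are made up for illustration): the first loop is a pure
elementwise "whole array" operation that a data-parallel runtime can
split across processors freely, while the second has a loop-carried
dependence and does not reduce to a single independent array operation.

#include <stddef.h>

/* Elementwise "whole array" style operation: every iteration is
 * independent, so a data-parallel runtime may split it across
 * processors however it likes (c = a*b + c in array syntax). */
void elementwise_madd(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] * b[i] + c[i];
}

/* Prefix sum with a loop-carried dependence: each iteration needs the
 * previous one, so it cannot be written as one independent elementwise
 * array operation (it needs a different parallel pattern, e.g. a scan). */
void prefix_sum(size_t n, double *x) {
    for (size_t i = 1; i < n; ++i)
        x[i] += x[i - 1];
}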

> MPI and all the SPMD approaches have a severely limited future, but I
> bet the HPC codes are still using Fortran and MPI in 50 years time.

Well, whole array operations are a generalization of the SPMD approach,
so in this sense you are saying that that kind of approach does have a
future (but with more difficult optimization, as the hardware is more
complex).

About MPI, I think that many don't see what MPI really does: MPI offers
a simplified parallel model.
The main weakness of this model is that it assumes some kind of
reliability, but in return it offers a clear computational model, with
processors ordered in a linear or higher-dimensional structure, and
efficient collective communication primitives.
Yes, MPI is not the right choice for all problems, but when it is
usable it is very powerful, often superior to the alternatives, and
programming with it is *simpler* than reasoning about a generic
distributed system.
So I think that for problems that are not trivially parallel or easily
parallelizable, MPI will remain the best choice.
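
To make that concrete, here is a minimal sketch in plain C using
standard MPI calls (nothing D- or blip-specific): the processes are
ordered in a 2D grid, and a single collective combines a value from
all of them.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Order the processes in a 2D (higher dimensional) structure. */
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Dims_create(size, 2, dims);
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    /* Efficient collective communication primitive: every process
     * contributes a local value, every process gets the global sum. */
    double local = (double)rank, global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, cart);

    if (rank == 0)
        printf("grid %dx%d, global sum %g\n", dims[0], dims[1], global);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with mpirun, this runs unchanged on
one process or on thousands; that is the kind of simplification I mean.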

> You mentioned Chapel and X10, but don't forget the other one of the
> original three HPCS projects, Fortress.  Whilst all three are PGAS
> (partitioned global address space) languages, Fortress takes a very
> different viewpoint compared to Chapel and X10.

It might be a personal thing, but I am kind of "suspicious" toward
PGAS; I find a generalized MPI model better than PGAS when you want to
have separate address spaces.
Using MPI one can define a PGAS-like object by wrapping local storage
with an object that sends remote requests to access remote memory
pieces.
This means having a local server where these wrapped objects can be
"published" and that can respond at any moment to external requests. I
call this rpc (remote procedure call), and it can be realized easily on
top of MPI.
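
A hedged sketch of that idea in plain C (the tags and the element-get
protocol here are invented for illustration; this is not blip's actual
API): each process owns a local array and answers "send me element i"
requests from any other process.

#include <mpi.h>

#define TAG_GET_REQ 1   /* hypothetical tag: "send me element i" */
#define TAG_GET_REP 2   /* hypothetical tag: reply carrying the value */

/* Serve one pending remote read of the locally owned array, if any.
 * A real server would run this in a loop or a dedicated thread so it
 * can respond at any moment. */
void serve_one_request(double *local_data) {
    int pending = 0;
    MPI_Status st;
    MPI_Iprobe(MPI_ANY_SOURCE, TAG_GET_REQ, MPI_COMM_WORLD, &pending, &st);
    if (!pending) return;

    int index;
    MPI_Recv(&index, 1, MPI_INT, st.MPI_SOURCE, TAG_GET_REQ,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&local_data[index], 1, MPI_DOUBLE, st.MPI_SOURCE,
             TAG_GET_REP, MPI_COMM_WORLD);
}

/* Remote side of the wrapper: fetch element `index` owned by `owner`. */
double remote_get(int owner, int index) {
    double value;
    MPI_Send(&index, 1, MPI_INT, owner, TAG_GET_REQ, MPI_COMM_WORLD);
    MPI_Recv(&value, 1, MPI_DOUBLE, owner, TAG_GET_REP,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    return value;
}

A real implementation needs a progress thread or periodic polling so
the owner keeps serving requests while it computes; that is exactly the
"local server" role mentioned above.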
Since not all objects are distributed, and in a complex program it does
not always make sense to distribute these objects on all processors or
on none, I find the robust partitioning and collective communication
primitives of MPI superior to PGAS.
With enough effort you can probably get everything from PGAS too, but
then you lose all its simplicity.

> The summary of the summary is:  programmers will either be developing
> parallelism systems or they will be unemployed.

The situation is not so dire: some problems are trivially parallel or
can be solved with simple parallel patterns, and others don't need to
be solved in parallel, as the sequential solution is fast enough; but I
do agree that being able to develop parallel systems is increasingly
important.
In fact it is something that I like to do, and that I have thought
about a lot.
I did program parallel systems, and out of my experience I tried to
build something to do parallel programs "the way it should be", or at
least the way I would like it to be ;)

The result is what I did with blip, http://dsource.org/projects/blip .
I don't think that (excluding some simple examples) fully automatic
(transparent) parallelization is really feasible.
At some point being parallel is more complex, and it puts an extra
burden on the programmer.
Still, it is possible to have several levels of parallelization, and if
you write a fully parallel program it should still be possible to use
it relatively efficiently locally, but a local program will not
automatically become fully parallel.

What I did is a basic SMP parallelization for programs with shared
memory.
This level tries to schedule independent recursive tasks on all
processors as efficiently as possible (using the topology detected by
libhwloc).
It leverages an event-based framework (libev) to avoid blocking while
waiting for external tasks.
The ability to describe complex asynchronous processes can be very
useful also when working with GPUs.
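
blip's task scheduler is its own code, but as a rough analogy in C this
is the kind of independent recursive task decomposition it is meant to
run efficiently (written here with OpenMP tasks, which is not what blip
uses, just a common notation for the pattern):

#include <omp.h>

/* Recursive divide-and-conquer sum: each half becomes an independent
 * task that the runtime may schedule on any available processor. */
double task_sum(const double *x, long n) {
    if (n < 10000) {                 /* small chunks run sequentially */
        double s = 0.0;
        for (long i = 0; i < n; ++i) s += x[i];
        return s;
    }
    double left, right;
    #pragma omp task shared(left)
    left = task_sum(x, n / 2);
    #pragma omp task shared(right)
    right = task_sum(x + n / 2, n - n / 2);
    #pragma omp taskwait             /* wait only for the two child tasks */
    return left + right;
}

/* Typical entry point: one parallel region, one initial task. */
double parallel_sum(const double *x, long n) {
    double s = 0.0;
    #pragma omp parallel
    #pragma omp single
    s = task_sum(x, n);
    return s;
}

(Built with an OpenMP-capable compiler, e.g. -fopenmp; without OpenMP
the pragmas are ignored and the code simply runs sequentially.)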

MPI parallelization is part of this hierarchy of parallelization, for
the reasons I described before; it is wrapped so that on a single
processor one can use a "pseudo" MPI.

rpc (remote procedure call), which might be better described as
distributed objects, offers a server that can respond to external
requests at any moment, and the possibility to publish objects that are
then identified by URLs.
These URLs can be used to create local proxies that call the remote
object and get results from it.
This can be done using MPI, or sockets directly.
If one uses sockets one has the whole flexibility (but also the whole
complexity) of a fully distributed system.
The basic building blocks of this can also be used in distributed
protocols like distributed hash tables.

blip is available now, and works on OS X and Linux. It should be
possible to port it to Windows (both libhwloc and libev work on
Windows), but I didn't do it.
It needs D1 and Tango; Tango trunk can be compiled using the scripts
in blip/buildTango, and then programs using blip can be compiled more
easily with the dbuild script (which uses xfbuild behind the scenes).

I planned to make an official release this weekend, but you can already
have a look now; the code is all there...

Fawzi

-----------------------------------------------------
Dr. Fawzi Mohamed,                      Office: 3'322
Humboldt-Universitaet zu Berlin, Institut fuer Chemie
Post:               Unter den Linden 6, 10099, Berlin
Besucher/Pakete:    Brook-Taylor-Str. 2, 12489 Berlin
Tel: +49 30 20 93 7140          Fax: +49 30 2093 7136
-----------------------------------------------------





