Thoughts on parallel programming?

Fawzi Mohamed fawzi at gmx.ch
Fri Nov 12 03:48:58 PST 2010


On 11-nov-10, at 20:41, Russel Winder wrote:

> On Thu, 2010-11-11 at 15:16 +0100, Fawzi Mohamed wrote:
> [ . . . ]
>> on this I am not so sure, heterogeneous clusters are more difficult  
>> to
>> program, and GPU & co are slowly becoming more and more general  
>> purpose.
>> Being able to take advantage of those is useful, but I am not
>> convinced they are necessarily the future.
>
> The Intel roadmap is for processor chips that have a number of cores
> with different architectures.  Heterogeneity is not going to
> be a
> choice, it is going to be an imposition.  And this is at bus level,  
> not
> at cluster level.

Vector co-processors, yes, I see that, and short term the effect of
things like AMD Fusion (CPU/GPU merging).
Is this necessarily the future? I don't know, and neither does Intel I
think, as they are still evaluating Larrabee.
But CPU/GPU will stay around for some time more, for sure.

> [ . . . ]
>> yes many core is the future I agree on this, and also that  
>> distributed
>> approach is the only way to scale to a really large number of
>> processors.
>> Bud distributed systems *are* more complex, so I think that for the
>> foreseeable future one will have a hybrid approach.
>
> Hybrid is what I am saying is the future whether we like it or not.   
> SMP
> as the whole system is the past.

>
> I disagree that distributed systems are more complex per se.  I  
> suspect
> comments are getting so general here that anything anyone writes can  
> be
> seen as both true and false simultaneously.  My perception is that
> shared memory multithreading is less and less a tool that applications
> programmers should be thinking in terms of.  Multiple processes with  
> a
> hierarchy of communications costs is the overarching architecture with
> each process potentially being SMP or CSP or . . .

I agree that on not too large shared memory machines a hierarchy of
tasks is the correct approach.
This is what I did in blip.parallel.smp. Using that one can have
fairly efficient automatic scheduling, and so forget most of the
complexities and the actual hardware configuration.
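To give an idea of what I mean (this is not blip's API, blip is D; just
a rough analogue in C with OpenMP tasks, and the names like
process_range are made up): you describe the work as a tree of tasks
and let the runtime map it onto whatever cores the machine has.

/* Rough sketch of hierarchical tasks with automatic scheduling.
 * Not blip; an analogous idea using OpenMP tasks in C. */
#include <stdio.h>
#include <omp.h>

/* hypothetical leaf work */
static long work(long i) { return i * i; }

static long process_range(long lo, long hi) {
    if (hi - lo < 1024) {               /* small enough: do it inline */
        long s = 0;
        for (long i = lo; i < hi; ++i) s += work(i);
        return s;
    }
    long mid = lo + (hi - lo) / 2, left = 0, right = 0;
    /* spawn the two halves as tasks; the scheduler decides where they run */
    #pragma omp task shared(left)
    left = process_range(lo, mid);
    #pragma omp task shared(right)
    right = process_range(mid, hi);
    #pragma omp taskwait
    return left + right;
}

int main(void) {
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = process_range(0, 1 << 20);
    printf("total = %ld (max threads: %d)\n", total, omp_get_max_threads());
    return 0;
}

(compile with something like gcc -fopenmp; the point is only that the
task tree, not the thread layout, is what you write down)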

>> again not sure the situation is as dire as you paint it, Linux does
>> quite well in the HPC field... but I agree that to be the ideal OS  
>> for
>> these architectures it will need more changes.
>
> The Linux driver architecture is already creaking at the seams, it
> implies a central monolithic approach to operating system.  This falls
> down in a multiprocessor shared memory context.  The fact that the Top
> 500 generally use Linux is because it is the least worst option.  M$
> despite throwing large amounts of money at the problem, and indeed
> bought some very high profile names to try and do something about the
> lack of traction, have failed to make any headway in the HPC operating
> system stakes.  Do you want to have to run a virus checker on your HPC
> system?
>
> My gut reaction is that we are going to see a rise of hypervisors as  
> per
> Tilera chips, at least in the short to medium term, simply as a bridge
> from the now OSes to the future.  My guess is that L4 microkernels
> and/or nanokernels, exokernels, etc. will find a central place in  
> future
> systems.  The problem to be solved is ensuring that the appropriate  
> ABI
> is available on the appropriate core at the appropriate time.   
> Mobility
> of ABI is the critical factor here.

Yes, microkernels & co will be more and more important (but I wonder how
much this will be the case for the desktop).
ABI mobility? Not so sure; for HPC I can imagine having to compile to
different ABIs (but maybe that is what you mean by ABI mobility).

> [ . . . ]
>> Whole array operation are useful, and when possible one gains much
>> using them, unfortunately not all problems can be reduced to few  
>> large
>> array operations, data parallel languages are not the main type of
>> language for these reasons.
>
> Agreed.  My point was that in 1960s code people explicitly handled  
> array
> operations using do loops because they had to.  Nowadays such code is
> anathema to efficient execution.  My complaint here is that people  
> have
> put effort into compiler technology instead of rewriting the codes  
> in a
> better language and/or idiom.  Clearly whole array operations only  
> apply
> to algorithms that involve arrays!
>
> [ . . . ]
>> well whole array operations are a generalization of the SPMD  
>> approach,
>> so in this sense you said that that kind of approach will have a
>> future
>> (but with a more difficult optimization as the hardware is more  
>> complex).
>
> I guess this is where the PGAS people are challenging things.
> Applications can be couched in terms of array algorithms which can be
> scattered across distributed memory systems.  Inappropriate operations
> lead to huge inefficiencies, but handled correctly, code runs very
> fast.
>
>> About MPI I think that many don't see what MPI really does, mpi  
>> offers
>> a simplified parallel model.
>> The main weakness of this model is that it assumes some kind of
>> reliability, but then it offers
>> a clear computational model with processors ordered in a linear or
>> higher dimensional structure and efficient collective communication
>> primitives.
>> Yes MPI is not the right choice for all problems, but when usable it
>> is very powerful, often superior to the alternatives, and programming
>> with it is *simpler* than thinking about a generic distributed  
>> system.
>> So I think that for problems that are not trivially parallel, or
>> easily parallelizable, MPI will remain the best choice.
>
> I guess my main irritant with MPI is that I have to run the same
> executable on every node and, perhaps more importantly, the message
> passing structure is founded on Fortran primitive data types.  OK so  
> you
> can hack up some element of abstraction so as to send complex  
> messages,
> but it would be far better if the MPI standard provided better
> abstractions.

PGAS and MPI both have the same executable everywhere, but MPI is more
flexible with respect to making different parts execute different
things, and MPI does provide more generic packing/unpacking, but I
guess I see your problems with it.
Having the same executable is a big constraint, but it is also a
simplification.
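To make the packing point concrete, a minimal sketch in C (the payload,
nAtoms/cutoff/name, is just made up for illustration): pack a mixed
message into a byte buffer and ship it with one collective call, instead
of being stuck with arrays of a single primitive type.

/* Illustrative only: generic packing plus a collective broadcast. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char buf[256];
    int pos = 0;
    if (rank == 0) {
        int    nAtoms = 42;            /* made-up payload */
        double cutoff = 2.5;
        char   name[32] = "waterBox";
        MPI_Pack(&nAtoms, 1, MPI_INT,    buf, sizeof buf, &pos, MPI_COMM_WORLD);
        MPI_Pack(&cutoff, 1, MPI_DOUBLE, buf, sizeof buf, &pos, MPI_COMM_WORLD);
        MPI_Pack(name,   32, MPI_CHAR,   buf, sizeof buf, &pos, MPI_COMM_WORLD);
    }
    /* everyone gets the packed bytes with one collective call */
    MPI_Bcast(buf, sizeof buf, MPI_PACKED, 0, MPI_COMM_WORLD);

    if (rank != 0) {
        int nAtoms; double cutoff; char name[32];
        pos = 0;
        MPI_Unpack(buf, sizeof buf, &pos, &nAtoms, 1, MPI_INT,    MPI_COMM_WORLD);
        MPI_Unpack(buf, sizeof buf, &pos, &cutoff, 1, MPI_DOUBLE, MPI_COMM_WORLD);
        MPI_Unpack(buf, sizeof buf, &pos, name,   32, MPI_CHAR,   MPI_COMM_WORLD);
        printf("rank %d got %d atoms, cutoff %g, name %s\n",
               rank, nAtoms, cutoff, name);
    }
    MPI_Finalize();
    return 0;
}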

> [ . . . ]
>> It might be a personal thing, but I am kind of "suspicious" toward
>> PGAS, I find a generalized MPI model better than PGAS when you want  
>> to
>> have separated address spaces.
>> Using MPI one can define a PGAS like object wrapping local storage
>> with an object that sends remote requests to access remote memory
>> pieces.
>> This means having a local server where this wrapped objects can be
>> "published" and that can respond in any moment to external  
>> requests. I
>> call this rpc (remote procedure call) and it can be realized easily  
>> on
>> the top of MPI.
>> As not all objects are distributed and in a complex program it does
>> not always make sense to distribute these objects on all processors
>> or none, I find that the robust partitioning and collective
>> communication primitives of MPI superior to PGAS.
>> With enough effort you probably can get everything also from PGAS,  
>> but
>> then you lose all its simplicity.
>
> I think we are going to have to take this one off the list.  My  
> summary
> is that MPI and PGAS solve different problems differently.  There are
> some problems that one can code up neatly in MPI and that are ugly in
> PGAS, but the converse is also true.
Yes, I guess that is true.
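To make the rpc-on-top-of-MPI idea I described above a bit more
concrete, here is a very stripped down sketch in C (purely
illustrative, it is not blip's rpc layer; the tags and names are made
up, and in a real program the server loop would run alongside the local
work, e.g. in its own thread):

/* "Local server" idea: each rank serves pieces of its local storage
 * to whoever asks for them. */
#include <mpi.h>
#include <stdio.h>

#define TAG_GET   1   /* request: "send me element i of your local array" */
#define TAG_REPLY 2

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local[100];        /* this rank's piece of a "published" object */
    for (int i = 0; i < 100; ++i) local[i] = rank * 100.0 + i;

    if (rank == 0 && size > 1) {
        /* client side: fetch element 7 of rank 1's local storage */
        int idx = 7;
        double value;
        MPI_Send(&idx, 1, MPI_INT, 1, TAG_GET, MPI_COMM_WORLD);
        MPI_Recv(&value, 1, MPI_DOUBLE, 1, TAG_REPLY, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 fetched remote element: %g\n", value);
    } else if (rank == 1) {
        /* server side: poll for a request, then answer it */
        int flag = 0;
        MPI_Status st;
        while (!flag)
            MPI_Iprobe(MPI_ANY_SOURCE, TAG_GET, MPI_COMM_WORLD, &flag, &st);
        int idx;
        MPI_Recv(&idx, 1, MPI_INT, st.MPI_SOURCE, TAG_GET, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&local[idx], 1, MPI_DOUBLE, st.MPI_SOURCE, TAG_REPLY,
                 MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}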

> [ . . . ]
>> The situation is not so dire, some problems are trivially parallel,  
>> or
>> can be solved with simple parallel patterns, others don't need to be
>> solved in parallel, as the sequential solution is fast enough, but I
>> do agree that being able to develop parallel systems is increasingly
>> important.
>> In fact it is something that I like to do, and I thought about a lot.
>> I did program parallel systems, and out of my experience I tried to
>> build something to do parallel programs "the way it  should be", or  
>> at
>> least the way I would like it to be ;)
>
> The real question is whether future computers will run Word,
> OpenOffice.org, Excel, Powerpoint fast enough so that people don't
> complain.  Everything else is an HPC ghetto :-)
>
>> The result is what I did with blip, http://dsource.org/projects/blip .
>> I don't think that (excluding some simple examples) fully automatic
>> (transparent) parallelization is really feasible.
>> At some point being parallel is more complex, and it puts an extra
>> burden on the programmer.
>> Still it is possible to have several levels of parallelization, and  
>> if
>> you program a fully parallel program it should still be possible to
>> use it relatively efficiently locally, but a local program will not
>> automatically become fully parallel.
>
> At the heart of all this is that programmers are taught that an algorithm
> is a sequence of actions to achieve a goal.  Programmers are trained  
> to
> think sequentially and this affects their coding.  This means that
> parallelism has to be expressed at a sufficiently high level that
> programmers can still reason about algorithms as sequential things.


When you have a network of things communicating (I think that once you
have a distributed system you come to that level) then it is not
sufficient anymore to think about each piece in isolation, you have to
think about the interactions too.
There are some patterns that might help reduce the complexity:
client/server, map/reduce, ... but in general it is more complex.
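As an illustration of how small the map/reduce pattern can be when
MPI's collectives fit the problem, a minimal sketch in C (the
per-element work is just a placeholder):

/* map/reduce, stripped down: every rank "maps" over its own slice,
 * one collective call does the "reduce". */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* map: each rank computes a partial sum over its slice of 0..N-1 */
    const long N = 1000000;
    double partial = 0.0;
    for (long i = rank; i < N; i += size)
        partial += 1.0 / (1.0 + (double)i);   /* placeholder work */

    /* reduce: combine all partial results on rank 0 */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}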
  

