Good demo for showing benefits of parallelism

Kevin Bealer kevinbealer at gmail.com
Sun Jan 28 02:53:46 PST 2007


Mikola Lysenko wrote:
> There seems to be a great deal of confusion between concurrency and 
> parallelism.  Parallelism is a natural part of many problems, and it is 
> relatively easy to exploit in order to enhance performance.  Parallel 
> algorithms naturally scale to arbitrary numbers of processors, and are 
> not particularly difficult to develop.
> 
> Concurrency on the other hand is very difficult.  When multiple 
> processes must communicate, the programming complexity quickly spirals 
> out of control resulting in unmanageable chaotic programs.  Locks, 
> channels and STM are all useful concurrency primitives, but no single 
> one can be considered a complete solution.  The difficulty of concurrent 
> programming was recognized early on by programmers like Dijkstra who 
> worked on the first operating systems.  To this day, it is still an 
> unsolved problem and must be approached very carefully on a per-case basis.
> 
> In this light, GPU programming should not be considered concurrent 
> programming, since it is impossible for threads on the GPU to 
> communicate: all shader memory is read-only.  GPU programs are 
> parallel however, and they are typically not very difficult to write 
> (beyond some annoyances in the API/shader language).  Similarly futures 
> do not help with concurrent programs, since they only improve the 
> parallelism inherent within a program.
> 
> Shaders, futures and array operations are all helpful, since they 
> provide convenient mechanisms for utilizing parallelism.  However they 
> utterly fail to address the most difficult aspects of multi-threading, 
> which means they are not a complete solution.

I like your description -- I've probably been mixing these terms up 
myself.  I also agree that futures are not a complete solution; however, 
I do think they are a useful tool.  Much depends on the reason for the 
concurrency in the first place.

Why use multiple threads instead of writing the algorithm sequentially?

1. There is a lot of work to do and multiple CPUs to do it.

2. The process alternates between periods of heavy CPU / low I/O usage 
and low CPU / heavy I/O usage.  Often one thread can use the CPU while 
another waits for I/O.

3. Each thread represents interaction with another 'active entity', such 
as a socket connection to another computer, a GUI connection to an X 
server, an input device, or a piece of hardware.

4. Other cases, such as (if I understand this correctly) O/S threads 
that back user threads and run while the user thread is in a syscall.

Items 1 and 2 can be done easily via futures -- they are strictly 
concerned with parallelism.
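
For example, here is a minimal sketch of item 2 (overlapping CPU and 
I/O) using std.parallelism's task from present-day Phobos; slowSum and 
"input.dat" are made up for illustration:

    import std.file : read;
    import std.parallelism : task, taskPool;
    import std.stdio : writeln;

    // Stand-in for the CPU-heavy phase.
    long slowSum(long n)
    {
        long total = 0;
        foreach (i; 0 .. n)
            total += i;
        return total;
    }

    void main()
    {
        auto fut = task!slowSum(100_000_000L);
        taskPool.put(fut);               // CPU work runs in the pool...

        auto bytes = read("input.dat");  // ...while this thread blocks on I/O

        // Join only at the point where the value is needed.
        writeln(fut.yieldForce, " ", bytes.length);
    }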

Item 4 probably needs a thread for architectural reasons (except, 
perhaps, in a message-passing O/S design?  I'm not sure about this one...)

Item 3:  I think there is more flexibility here than is normally 
exploited; a design that currently uses the "one-thread-per-entity" 
philosophy can often be redesigned as a message-passing algorithm or 
something like it (sketched below).  The select() call helps when the 
entities are file-like (sockets, pipes, devices).
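
As a sketch of that redesign, using std.concurrency's spawn/send/receive 
from present-day Phobos (the little doubling protocol is invented purely 
for illustration):

    import std.concurrency;
    import std.stdio : writeln;

    // The worker owns its 'active entity'; all interaction is messages.
    void worker()
    {
        bool done = false;
        while (!done)
        {
            receive(
                (int n)    { ownerTid.send(n * 2); },        // request/reply
                (string s) { if (s == "quit") done = true; }
            );
        }
    }

    void main()
    {
        auto tid = spawn(&worker);
        tid.send(21);
        writeln(receiveOnly!int());  // prints 42
        tid.send("quit");
    }

The entity's state lives entirely in one thread, so no locks are needed; 
the message queue is the only synchronization point.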

Then the question becomes whether (and why) message passing and futures 
are better than Thread and Mutex.  Herb Sutter argues that it is hard to 
design correct code using locks and primitives like sleep/pause/mutex, 
and that it gets much harder as systems grow larger.

(As I understand it...) Herb's argument is that if I have several 
modules that each use locks correctly, they often won't be correct when 
combined.  Deadlock (and livelock) avoidance requires knowledge of the 
locking rules for the entire system.  Without such knowledge, it is 
difficult to apply techniques like lock ordering that prevent deadlocks. 
Other techniques are available (deadlock detection and rollback), but 
these can have their own thorny failure states.
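
A tiny illustration of the composition problem (the module names and 
mutexes here are hypothetical):

    import core.sync.mutex : Mutex;

    __gshared Mutex a, b;

    shared static this() { a = new Mutex; b = new Mutex; }

    // Each function is correct in isolation.  Run concurrently, they
    // can deadlock: one thread holds a and wants b, the other holds
    // b and wants a.
    void moduleOneUpdate()
    {
        synchronized (a) synchronized (b) { /* touch both resources */ }
    }

    void moduleTwoUpdate()
    {
        synchronized (b) synchronized (a) { /* touch both resources */ }
    }

The usual fix needs whole-system knowledge: pick one global lock order 
(say, always a before b) and make every module follow it.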

In a design based on futures, I can reason about correctness much more 
easily because the design can be made sequential trivially -- just don't 
compute the result until the point where the value is accessed.

If the sequential program is correct, the concurrent version is probably 
correct too.  There are fewer opportunities to break it, because the 
code does synchronization in a few places instead of everywhere.  (Of 
course there is synchronization and concurrency management inside the 
Future implementation, but it's pretty well constrained.)
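
To make that concrete, a sketch (relying on std.parallelism's documented 
behavior that forcing a task that was never started runs it inline):

    import std.parallelism : task, taskPool;

    int expensive(int x) { return x * x; }  // stand-in computation

    int compute(bool concurrent)
    {
        auto fut = task!expensive(7);
        if (concurrent)
            taskPool.put(fut);  // run it on another thread
        // ... unrelated work here ...
        // If the task was never started, yieldForce executes it right
        // here, in this thread: the trivially sequential version.
        return fut.yieldForce;
    }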

I agree completely with your premise that concurrency is fundamentally 
hard.  So the goal (as I see it today) is to take as much of the 
concurrency as possible *out* of the algorithm, while still leveraging 
multiple CPUs and solving the I/O-vs-CPU problem I labeled as #2 above.

Kevin


