Good demo for showing benefits of parallelism
Kevin Bealer
kevinbealer at gmail.com
Sun Jan 28 02:53:46 PST 2007
Mikola Lysenko wrote:
> There seems to be a great deal of confusion between concurrency and
> parallelism. Parallelism is a natural part of many problems, and it is
> relatively easy to exploit in order to enhance performance. Parallel
> algorithms naturally scale to arbitrary numbers of processors, and are
> not particularly difficult to develop.
>
> Concurrency on the other hand is very difficult. When multiple
> processes must communicate, the programming complexity quickly spirals
> out of control resulting in unmanageable chaotic programs. Locks,
> channels and STM are all useful concurrency primitives, but no single
> one can be considered a complete solution. The difficulty of concurrent
> programming was recognized early on by programmers like Dijkstra who
> worked on the first operating systems. To this day, it is still an
> unsolved problem and must be approached very carefully on a per-case basis.
>
> In this light, GPU programming should not be considered concurrent
> programming, since it is impossible for threads on the GPU to
> communicate since all shader memory is read-only. GPU programs are
> parallel however, and they are typically not very difficult to write
> (beyond some annoyances in the API/shader language). Similarly futures
> do not help with concurrent programs, since they only improve the
> parallelism inherent within a program.
>
> Shaders, futures and array operations are all helpful, since they
> provide convenient mechanisms for utilizing parallelism. However they
> utterly fail to address the most difficult aspects of multi-threading,
> which means they are not a complete solution.
I like your description -- I've probably been conflating these terms
myself. I also agree that futures are not a complete solution; however,
I do think they are a useful tool. How useful depends on the reason for
the concurrency in the first place.
Why use multiple threads instead of writing the algorithm sequentially?
1. There is a lot of work to do and multiple CPUs to do it.
2. The process alternates between periods of heavy CPU / low I/O usage
and low CPU / heavy I/O usage. Often one thread can use the CPU while
the other waits for I/O.
3. Each thread represents interaction with another 'active entity', such
as a socket connection to another computer, a GUI connection to an X
server, an input device or piece of hardware.
4. Other cases, such as (if I understand this correctly) an O/S thread
that corresponds to a user-level thread and runs while that user thread
is blocked in a syscall.
Items 1 and 2 can be done easily via futures -- they are strictly
concerned with parallelism.
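Roughly, item 2 with a future might look like this. (A sketch in
present-day D using std.parallelism -- the library choice is my
assumption, and crunch() and the file name are made up for
illustration.)

    import std.file : read;
    import std.parallelism : task, taskPool;

    // Hypothetical CPU-heavy helper, just for illustration.
    long crunch(int n)
    {
        long sum = 0;
        foreach (i; 0 .. n)
            sum += cast(long) i * i;
        return sum;
    }

    void main()
    {
        // Start the CPU-bound work on a worker thread (the "future").
        auto fut = task!crunch(10_000_000);
        taskPool.put(fut);

        // Meanwhile, this thread blocks on I/O.
        auto bytes = read("input.dat");   // hypothetical file name

        // Rendezvous: block until the future's value is ready.
        long total = fut.yieldForce;
    }

The CPU work and the I/O overlap, but there is exactly one
synchronization point -- the yieldForce call.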
Item 4 probably needs a thread for architectural reasons (except in a
message-passing O/S design? I'm not sure about this one...).
Item 3: I think there is more flexibility here than is normally
exploited; a design that currently uses the "one-thread-per-entity"
philosophy can often be recast as a message-passing design or something
like it. The select() call helps when the entities look like file
descriptors.
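To sketch what that recasting might look like (again in D, using
std.concurrency -- my assumption; the handler below is a stand-in for
a real socket or GUI entity):

    import std.concurrency : receive, send, spawn;
    import std.stdio : writeln;

    // One entity handler written as a message loop rather than
    // a dedicated blocking thread per connection.
    void handler()
    {
        bool running = true;
        while (running)
        {
            receive(
                (string line) { writeln("entity says: ", line); },
                (int signal)  { running = false; }  // any int: shut down
            );
        }
    }

    void main()
    {
        auto tid = spawn(&handler);
        tid.send("hello");   // messages stand in for socket traffic
        tid.send(0);         // tell the handler to exit
    }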
Then the question becomes whether (and why) message passing and futures
are better than Thread and Mutex. Herb Sutter argues that it is hard to
design correct code using locks and primitives like sleep/pause/mutex,
and that it gets much harder as systems grow larger.
(As I understand it...) Herb's argument is that several modules that
each use locks correctly in isolation often won't be correct when
combined. Deadlock (and livelock) avoidance requires knowledge of the
locking rules for the entire system. Without such knowledge, it is
difficult to apply techniques like consistent lock ordering that
prevent deadlocks. Other techniques are available (deadlock detection
and rollback), but these can have their own thorny failure states.
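A toy illustration of the composition problem (sketched in D; the two
"modules" are just two threads): each thread's locking is locally
sensible, but because they take the mutexes in opposite orders, the
combination can deadlock.

    import core.sync.mutex : Mutex;
    import core.thread : Thread;
    import core.time : msecs;

    __gshared Mutex a, b;

    void main()
    {
        a = new Mutex;
        b = new Mutex;

        // "Module 1": locks a, then b.  Correct on its own.
        auto t1 = new Thread({
            a.lock();
            Thread.sleep(10.msecs);   // widen the race window
            b.lock();
            b.unlock();
            a.unlock();
        });

        // "Module 2": locks b, then a -- the opposite order.
        auto t2 = new Thread({
            b.lock();
            Thread.sleep(10.msecs);
            a.lock();
            a.unlock();
            b.unlock();
        });

        t1.start();
        t2.start();
        t1.join();   // with unlucky timing, these joins never return
        t2.join();
    }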
In a design based on futures, I can reason about correctness much more
easily because the design can be made sequential trivially -- just don't
compute the result until the point where the value is accessed.
If the sequential program is correct, the concurrent version is probably
correct. There are fewer opportunities for breaking it, because the
code does synchronization in a few places instead of everywhere. (Of
course there is synchronization and concurrency management in the Future
implementation, but it's pretty well constrained.)
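With std.parallelism (same assumption as above), that sequential test
is literally one line: a task that is never handed to the pool just
runs inline when its value is forced.

    import std.parallelism : task, taskPool;

    int square(int x) { return x * x; }

    void main()
    {
        auto t = task!square(7);

        // Concurrent version: hand the task to a worker thread.
        taskPool.put(t);

        // Sequential version: comment out the put() above, and
        // yieldForce simply runs square(7) inline, right here.
        int r = t.yieldForce;   // same call site either way
        assert(r == 49);
    }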
I agree completely with your premise that concurrency is fundamentally
hard. So the goal (as I see it today) is to take as much of the
concurrency as possible *out* of the algorithm, and still leverage
multiple CPUs and solve the I/O vs. CPU problem I labeled as #2 above.
Kevin