Good demo for showing benefits of parallelism

Fri Jan 26 17:27:01 PST 2007

i had a simple, unoptimized raytracer from an old CG assignment lying
around that i ported to D and modified for parallel computation.
download here: http://mainia.de/prt.zip

on a single core/cpu machine, the time stays about the same until 4
threads, then the overhead kicks in.
here are some values from an opteron dual core system running linux
kernel 2.6 for different thread counts:
threads  seconds
1        32.123
2        32.182
3        29.329
4        28.556
8        21.661
16       20.186
24       20.423
32       21.410
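
to put numbers on the table above, here is a small sketch that turns the measured times into speedup and efficiency. the names (Run, Scaling, scaling, measured) are made up for illustration, and "efficiency" is assumed here to mean speedup divided by the 2 physical cores, since extra threads beyond that cannot add hardware.

```cpp
#include <cassert>
#include <vector>

// one measured run: thread count and wall-clock seconds
struct Run { int threads; double seconds; };

// derived figures for one run (hypothetical helper type)
struct Scaling { int threads; double speedup; double efficiency; };

// speedup is taken relative to the first entry in `runs` (the 1-thread run);
// efficiency is speedup / cores, an assumption about what "ideal" means here
std::vector<Scaling> scaling(const std::vector<Run>& runs, int cores) {
    assert(!runs.empty() && cores > 0);
    const double baseline = runs.front().seconds;
    std::vector<Scaling> out;
    for (const Run& r : runs) {
        assert(r.seconds > 0.0);
        const double s = baseline / r.seconds;
        out.push_back({r.threads, s, s / cores});
    }
    return out;
}

// the measurements from the table above (dual core opteron)
const std::vector<Run> measured = {
    {1, 32.123}, {2, 32.182}, {3, 29.329}, {4, 28.556},
    {8, 21.661}, {16, 20.186}, {24, 20.423}, {32, 21.410},
};
```

calling scaling(measured, 2) gives about 1.59x for the 16-thread row, which is roughly 80% of the ideal 2x on two cores, while the 2-thread row comes out slightly below 1x.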

these aren't quite what i expected. CPU usage shows that both cores get
about 55-80% load with 2 threads, stabilizing at 65-75% with 16
threads. with a single thread it's clearly 100% on one core.

am i missing something about the pthread lib or memory usage/sync that
prevents reasonable speedup? a 1.6x speedup (160%) is nice, but why does
it take 16 threads to get there? and where exactly does the remaining
40% go?

Bill Baxter wrote:
> I don't remember where it was, but somebody in some thread said
> something like gee it would be great if we had a demo that showed the
> benefits of having parallel threads.  (It was probably in either the
> Futures lib thread or in the discussion about varargs_reduce).
> 
> Anyway, a raytracer is a perfect example of "embarrassingly
> parallelizable" code.  In the simplest case you can state it simply as
> "for each pixel do trace_ray(ray_dir)".
> 
> Bradley Smith posted code for a little raytracer over on D.learn under
> the heading "Why is this code slower than C++".  If anyone is interested
> in turning this into a parallel raytracer that would be a nice little demo.
> 
> Along the same lines, almost any image processing algorithm is in the
> same category where the top loop looks like "foreach pixel do ___". This
> is a big part of why GPUs just keep getting faster and faster, because
> the basic problem is just so inherently parallelizable.
> 
> --bb
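
for reference, the "for each pixel do trace_ray(ray_dir)" loop quoted above can be sketched roughly like this. this is not the code from prt.zip, just a minimal illustration under a few assumptions: trace_ray here is a hypothetical placeholder taking pixel coordinates, render is a made-up name, and rows are handed out through a shared atomic counter, which is one common way to split the work and not necessarily the best one.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// hypothetical stand-in for the real per-pixel work; a real tracer would
// build a ray for (x, y) and shade it
float trace_ray(int x, int y) {
    return static_cast<float>(x + y);  // placeholder value
}

// renders a width x height image with `num_threads` worker threads.
// each worker repeatedly claims the next unrendered row from a shared
// counter, so a thread that gets cheap rows simply picks up more of them.
std::vector<float> render(int width, int height, int num_threads) {
    assert(width > 0 && height > 0 && num_threads > 0);
    std::vector<float> image(static_cast<std::size_t>(width) * height);
    std::atomic<int> next_row{0};

    auto worker = [&]() {
        for (;;) {
            const int y = next_row.fetch_add(1);  // claim one row
            if (y >= height) break;               // no rows left
            for (int x = 0; x < width; ++x)
                image[static_cast<std::size_t>(y) * width + x] = trace_ray(x, y);
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < num_threads; ++i) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();  // wait for all rows to finish
    return image;
}
```

every pixel is written by exactly one thread and no pixel depends on another, so render(8, 6, 1) and render(8, 6, 4) produce the same image. with g++ this typically needs -std=c++11 -pthread.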