Good demo for showing benefits of parallelism
Kevin Bealer
kevinbealer at gmail.com
Sat Jan 27 01:48:58 PST 2007
Jascha Wetzel wrote:
> Kevin Bealer wrote:
>> Do you know if there was much lock contention?
>
> there is no explicit synchronization at all. the nice thing about
> raytracing and similar algorithms is, that you can use disjoint memory
> and let the threads work isolated from each other. except for the main
> thread that waits for the workers to terminate, there are no mutexes
> involved.
Hmmm... if the chunks you use are too small or too large, it can be an
issue too. If you divide the workload into small chunks like individual
pixels, it costs overhead due to switching and loss of locality of
reference.
On the other hand, if you have ten threads, I think it is a mistake to
divide the pixel load into exactly ten parts -- some pixels are heavier
than others. The pixels that raytrace complex refractions will take
longer and so some threads finish first.
What I would tend to do is divide the pixel count into blocks, then let
each thread pull a block of (consecutive or adjacent if possible) pixels
to work on. Then some threads can do hard blocks and take a while at
it, and others can do simple blocks and process more of them.
The goal is for all the threads to finish at about the same time. If
they don't, you end up with some threads waiting idly at the end.
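A minimal sketch of the "pull a block" idea, assuming Python threads and a
lock-protected counter. The names render_blocks and shade are made up for
illustration, and in CPython the GIL means this shows the structure rather
than a real speedup for CPU-bound work:

```python
import threading

def render_blocks(num_pixels, block_size, num_threads, shade):
    """Render num_pixels pixels, handing out consecutive blocks on demand."""
    result = [None] * num_pixels     # each pixel is written by exactly one thread
    blocks_done = [0] * num_threads  # per-thread block count, for inspection only
    next_start = 0                   # first pixel not yet handed out
    lock = threading.Lock()          # the only synchronization in the scheme

    def worker(tid):
        nonlocal next_start
        while True:
            with lock:               # short critical section: claim one block
                start = next_start
                if start >= num_pixels:
                    return           # nothing left to hand out
                end = min(start + block_size, num_pixels)
                next_start = end
            for i in range(start, end):  # the real work runs outside the lock
                result[i] = shade(i)
            blocks_done[tid] += 1

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result, blocks_done
```

A thread that draws cheap blocks simply comes back for more, so the block
counts per thread end up uneven while the finish times stay close.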
This would require reintroducing a little synchronization. I might
divide an image into blocks that are each about 2% of
the total image. Let's say you have 4 hardware threads and each block
is 2% of the work load. Then the average inefficiency is maybe
something like 1/2 * .02 * 4 = 4%. That is, each thread sits idle for
about half a block on average at the end, since they finish at randomly
uneven rates; a block is 2% of the total work; and there are 4 hardware
threads (because it doesn't hurt much to have idle *software* threads,
only idle hardware ones.)
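The back-of-the-envelope estimate above, written out as a small Python
function. The name idle_fraction is just for illustration, and the formula
is the same rough guess, not a measured result:

```python
def idle_fraction(block_fraction, hardware_threads):
    """Rough estimate of work lost to threads idling at the end of a run.

    Assumes each hardware thread idles for about half a block on average.
    """
    return 0.5 * block_fraction * hardware_threads

print(f"{idle_fraction(0.02, 4):.0%}")  # prints 4%
```

Under the same assumption, halving the block size to 1% halves the estimate
to 2%, at the cost of more trips to the shared counter.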
Other micro-considerations:
1. Handing out 'harder' sections first is a good idea if you have this
info, because these will "stick out" more if they are the only one
running at the end (they represent a larger percentage than their block
size.)
2. You can start by handing out large chunks and then switch to smaller
chunks when the work is mostly done. For example, for 4 hardware
threads, you could always hand out 1/16th of the remaining work until
you get down to handing out 100 pixels at a time.
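Both points can be sketched in a few lines of Python. The function names,
the divisor of 16 and the 100-pixel floor come from the example above or are
made up for illustration, and the cost estimate is assumed to be supplied by
the caller:

```python
def shrinking_chunks(total_pixels, divisor=16, min_chunk=100):
    """Yield (start, end) ranges; each is 1/divisor of the remaining work,
    but never smaller than min_chunk until the final leftover."""
    start = 0
    while start < total_pixels:
        remaining = total_pixels - start
        size = max(remaining // divisor, min_chunk)
        size = min(size, remaining)  # the last chunk may be short
        yield start, start + size
        start += size

def hardest_first(blocks, estimated_cost):
    """Order blocks so the most expensive ones are handed out first."""
    return sorted(blocks, key=estimated_cost, reverse=True)

chunks = list(shrinking_chunks(16000))
print(chunks[0])                   # first chunk: (0, 1000)
print(len(chunks), chunks[-1][1])  # number of chunks, and the final end (16000)
```

A dispatcher could hand these ranges to worker threads in order, so the
large early chunks keep overhead low and the small late ones even out the
finish times.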
(I don't know how useful this is for raytracing specifically, but some
of these issues come up where I work.)
Kevin
More information about the Digitalmars-d
mailing list