Cilk/Cilk++
Robert Jacques
sandford at jhu.edu
Thu Sep 4 22:05:32 PDT 2008
> As the leader of the Cilk project at MIT and founder of Cilk Arts, let
> me offer
> some comments about Cilk technology versus general futures.
Thank you. It's always great to hear from an expert.
> Cilk is probably more efficient than a general future mechanism. The
> overhead for
> spawning in MIT-Cilk is only about 3 times the cost of an ordinary C
> function
> call. The overhead in Cilk Arts's Cilk++ is about 4 times a function
> call.
And the overhead for a future is about 3-4 function calls as well (on
average, less for delegates or if you can scope it) and ~4-6 CASes.
1 call to the future creation function (with complier support, this can be
merged with copying the arguments onto the heap)
1 call worth of work to copy the function arguments onto the heap (not
needed for delegates or if you can scope the future.)
1 call worth of work to copy the function arguments onto the worker thread
and then begin execution.
The CASes are for a memory free-list and a work stack/queue.
> A general future mechanism runs into the problem that it can blow out
> memory
> pretty easily if you want good speedup. What happens is that you create
> a future
> (and allocate storage for it), and then it stalls. At that point, if
> you want
> good speedup, you have to switch to something else, which creates a
> future that
> stalls, and so forth. So, you end up with a lot of stalled futures
> sitting around
> and consuming memory. In contrast, a Cilk-style work-stealing scheduler
> guarantees that on P processors, it uses at most P times the stack space
> of a
> serial execution, and it provably achieves linear speedup as long as the
> application has sufficient parallelism.
This isn't a problem of futures. This is a problem of the implementation
of the future. It's no more difficult for futures to cleanly support
kernel locks than Cilk. Work-stealing actually is due to Click using per
thread work queues, instead of a single global queue. While the the
multiple queues do reduce contention, they require all threads to be
future/Cilk enabled. A hybrid solution, with a global queue for
non-future/Cilk threads and per thread queues for the future threads might
work well and support multi-paradigm threading.
> For more information on Cilk and Cilk++, please check out our website
> www.cilk.com. We also have an e-book at www.cilk.com/multicore-e-book
> that
> outlines some of the key issues in multicore-enabling software. If you
> have a
> deep interest, I'd be happy to discuss what would be involved in
> inserting Cilk
> technology into D.
More information about the Digitalmars-d
mailing list