Cilk/Cilk++

Thu Sep 4 22:05:32 PDT 2008

> As the leader of the Cilk project at MIT and founder of Cilk Arts, let  
> me offer
> some comments about Cilk technology versus general futures.

Thank you. It's always great to hear from an expert.

> Cilk is probably more efficient than a general future mechanism.  The  
> overhead for
> spawning in MIT-Cilk is only about 3 times the cost of an ordinary C  
> function
> call.  The overhead in Cilk Arts's Cilk++ is about 4 times a function  
> call.

And the overhead for a future is about 3-4 function calls as well (on  
average, less for delegates or if you can scope it) and ~4-6 CASes.
1 call to the future creation function (with complier support, this can be  
merged with copying the arguments onto the heap)
1 call worth of work to copy the function arguments onto the heap (not  
needed for delegates or if you can scope the future.)
1 call worth of work to copy the function arguments onto the worker thread  
and then begin execution.
The CASes are for a memory free-list and a work stack/queue.

> A general future mechanism runs into the problem that it can blow out  
> memory
> pretty easily if you want good speedup.  What happens is that you create  
> a future
> (and allocate storage for it), and then it stalls.  At that point, if  
> you want
> good speedup, you have to switch to something else, which creates a  
> future that
> stalls, and so forth.  So, you end up with a lot of stalled futures  
> sitting around
> and consuming memory.  In contrast, a Cilk-style work-stealing scheduler
> guarantees that on P processors, it uses at most P times the stack space  
> of a
> serial execution, and it provably achieves linear speedup as long as the
> application has sufficient parallelism.

This isn't a problem of futures. This is a problem of the implementation  
of the future. It's no more difficult for futures to cleanly support  
kernel locks than Cilk. Work-stealing actually is due to Click using per  
thread work queues, instead of a single global queue. While the the  
multiple queues do reduce contention, they require all threads to be  
future/Cilk enabled. A hybrid solution, with a global queue for  
non-future/Cilk threads and per thread queues for the future threads might  
work well and support multi-paradigm threading.

> For more information on Cilk and Cilk++, please check out our website
> www.cilk.com.  We also have an e-book at www.cilk.com/multicore-e-book  
> that
> outlines some of the key issues in multicore-enabling software.  If you  
> have a
> deep interest, I'd be happy to discuss what would be involved in  
> inserting Cilk
> technology into D.