The Computer Languages Shootout Game

Tue Nov 2 11:07:44 PDT 2010

> It doesn't matter whether they try to confirm the measurements for themselves or not - what matters is that they are provided with all the information required to do so.
>
>
> I only have 5 years experience publishing the measurements for the benchmarks game - and I've come across a handful of people who did try to confirm the measurements for themselves.
>
> (The most interesting example compared a couple of language implementations on one particular task but measured at 2 dozen different input values. That nicely demonstrated that the same language implementation wasn't always faster across all the input values. The 3 different input values shown on the benchmarks game aren't usually enough to demonstrate that kind of thing.)
>

That's an interesting observation.  I didn't even think of that before, 
but it does make sense.
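
To make the point about input values concrete, here is a rough sketch in 
Python of timing two versions of the same task across a range of input 
sizes and recording which one wins at each size.  The two functions and 
the name fastest_per_input are made up for illustration; none of this 
comes from the benchmarks game itself, and the winner at each size will 
depend on the machine and interpreter.

```python
import timeit


def sum_loop(n):
    """Sum of squares below n using an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_builtin(n):
    """Sum of squares below n using the built-in sum and a generator."""
    return sum(i * i for i in range(n))


def fastest_per_input(candidates, inputs, repeats=5):
    """Return {input value: name of the fastest candidate at that input}.

    candidates maps a name to a one-argument function.  Each function is
    timed `repeats` times per input and the minimum is kept, which
    reduces (but does not eliminate) measurement noise.
    """
    winners = {}
    for n in inputs:
        best = {}
        for name, fn in candidates.items():
            timings = timeit.repeat(lambda: fn(n), number=1, repeat=repeats)
            best[name] = min(timings)
        winners[n] = min(best, key=best.get)
    return winners


if __name__ == "__main__":
    candidates = {"loop": sum_loop, "builtin": sum_builtin}
    # Two dozen input sizes instead of three.
    sizes = [2 ** k for k in range(1, 25)]
    for n, name in fastest_per_input(candidates, sizes).items():
        print(n, name)
```

If the winner changes somewhere along the range, three hand-picked 
sizes could easily miss it.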

I debated posting this, but I figured it couldn't hurt: the 
biggest problem I have with the benchmarks they use is that, at least 
from my perspective, not all of them are very common algorithms.  Some 
things I'd love to see are B-Trees (which are common in databases), 
encryption, compression, etc.  Those are widely used and would 
therefore provide more useful comparisons.  Even MapReduce would be 
good since it's becoming very popular.

Taking it a step further, there need to be well-defined standard 
implementations and alternative implementations.  The standard 
implementations would be straightforward designs that don't use any 
trickery, so that we can actually compare language implementations.  
The alternative ones would then show how you can make the 
implementations faster.  I mention this because a buddy of mine 
submitted a C version of one benchmark, but implemented his own thread 
pooling code.  It was rejected even though the C++ version used Boost, 
which also, from what I'm told, uses thread pooling.  A standard 
implementation could be used to define whether things like thread 
pooling can/should be used.  I'd argue not in this case, as not every 
language supports and/or requires it (e.g. Erlang).

Of course, these are all just ideas that I'm not going to try to 
implement, as it would be too much work and I don't have the resources 
to do it right.  Even then, how do we make it truly fair and accurate?  
Based on what I've seen in this thread, it's a pretty hard problem if 
even the input data can affect a language's performance.

Casey