Review of Andrei's std.benchmark

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Sep 20 21:45:44 PDT 2012


On 9/20/12 11:03 PM, Nick Sabalausky wrote:
> On Tue, 18 Sep 2012 18:02:10 -0400
> Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 9/18/12 5:07 PM, "Øivind" wrote:
>>> * For all tests, the best run is selected, but would it not be
>>> reasonable in some cases to get the average value? Maybe excluding
>>> the runs that are more than a couple std. deviations away from the
>>> mean value..
>>
>> After extensive tests with a variety of aggregate functions, I can
>> say firmly that taking the minimum time is by far the best when it
>> comes to assessing the speed of a function.
>>
>
> *Ahem*: http://zedshaw.com/essays/programmer_stats.html

I'm not sure I see how this applies to the discussion at hand.

> Your claim that the minimum time is sufficient is...ummm...extremely
> unorthodox, to say the least.

What would the orthodoxy be? If orthodoxy is whatever Google turns up, 
it's good we're not orthodox.

> As such, you're going to need a far more
> convincing argument than "It worked well for me."

Sure. I have just detailed the choices made by std.benchmark in a couple 
of posts.

At Facebook we measure using the minimum, and it's working for us. 
We've tried other approaches (such as taking the mode of the 
distribution), and the minimum turned out to be better every time. Take 
a look at the early return in estimateTime():

https://github.com/facebook/folly/blob/master/folly/Benchmark.cpp#L136
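The shape of it, in a rough D sketch (this is not folly's code, the 
constants are invented, and it uses today's Phobos StopWatch for 
brevity): keep re-running the function, remember the smallest time 
seen, and stop early once further runs have stopped improving that 
minimum.

import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Sketch only: time fun() repeatedly, keep the minimum, and bail out
// early once the minimum has stopped improving for a while.
Duration estimateMinimum(void delegate() fun,
                         size_t maxRuns = 1_000,
                         size_t patience = 50)
{
    auto best = Duration.max;
    size_t runsSinceImprovement = 0;
    foreach (_; 0 .. maxRuns)
    {
        auto sw = StopWatch(AutoStart.yes);
        fun();
        sw.stop();
        if (sw.peek() < best)
        {
            best = sw.peek();
            runsSinceImprovement = 0;
        }
        else if (++runsSinceImprovement >= patience)
            break; // the "early return": the minimum has stabilized
    }
    return best;
}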

> I assume I don't need to preach that "Extraordinary claims require
> extraordinary evidence". But, condensing benchmarks and statistics down
> to "take the minimum" and saying that's sufficient is one heck of an
> extraordinary claim. If you feel that you can provide sufficiently
> extraordinary justification, then please do.

My claim is unremarkable. All I'm saying is the minimum running time of 
an algorithm on a given input is a stable and indicative proxy for the 
behavior of the algorithm in general. So it's a good target for 
optimization.
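To make that concrete: model each sample as the true cost plus a 
nonnegative disturbance (scheduling, cache misses, and interrupts can 
only add time, never subtract it). Then the minimum stays pinned to the 
true cost, while the mean wanders with whatever else the machine is 
doing. A toy simulation with made-up numbers, not real measurements:

import std.algorithm.searching : minElement;
import std.random : uniform;
import std.stdio : writefln;

void main()
{
    enum trueCost = 100;             // hypothetical undisturbed time, in µs
    long[] samples;
    foreach (_; 0 .. 1_000)
        samples ~= trueCost + uniform(0, 50);  // noise only ever adds time

    long total = 0;
    foreach (s; samples)
        total += s;

    writefln("min = %sµs   mean = %sµs",
             samples.minElement, total / samples.length);
    // The minimum lands on 100µs run after run; the mean moves with the load.
}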

There might be some confusion that std.benchmark does profiling; it 
doesn't, and profiling is not part of its charter.

> Otherwise, I think we'll need richer results. At the very least there
> should be an easy way to get at the raw results programmatically
> so we can run whatever stats/plots/visualizations/output-formats we
> want. I didn't see anything like that browsing through the docs, but
> it's possible I may have missed it.

Currently std.benchmark does not expose raw results, for the sake of 
simplicity. It would be easy to expose them, but I'd need a bit more 
convincing about their utility.

> That brings up another question too: I like the idea of a
> one-stop-benchmarking-shop, much like we have for unittests, but maybe
> reporting shouldn't be so tightly integrated and left more open for
> integration with a proper statistics lib and more generalized
> output abilities? But of course, that doesn't preclude having a nice
> built-in, but optional, default report. (Again though, maybe I'm
> overlooking something already in the module?)

That's pretty much what's happening: there's an API for collecting 
timings, and a separate API for printing them with a default format.
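Roughly, and with illustrative names rather than the actual 
std.benchmark signatures, the split looks like this: the measuring side 
hands back plain data (which is also where raw per-run samples could 
live if exposing them turns out to be worth it), and the default report 
is just one consumer of that data.

import core.time : Duration;
import std.stdio : writefln;

// Illustrative only -- not the actual std.benchmark API.
struct TimingResult
{
    string name;
    Duration perCall;    // the reported figure (the minimum)
    Duration[] samples;  // raw per-run times, if one chose to expose them
}

// The default report is one consumer of the structured results; a
// statistics or plotting library could consume the same array instead.
void printDefaultReport(const TimingResult[] results)
{
    foreach (r; results)
        writefln("%-30s %s", r.name, r.perCall);
}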

> One other nitpick: My initial impression is that the
> "benchmark_relative_file read" stuff seems a bit kludgey (and
> confusing to visually parse). Is there maybe a better way to handle
> that? For example, inspired by getopt:
>
> printBenchmarks!(
>      "file write", { std.file.write("/tmp/deleteme", "hello, world!"); },
>      BenchmarkOption.relative,
>      "file read", { std.file.read("/tmp/deleteme"); },
>      "array creation", { new char[32]; })
>      ();

The issue here is automating the benchmarking of an entire module, 
which would require some naming convention anyway.
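For what it's worth, here is the kind of thing such a convention buys, 
in a hedged sketch (the prefixes follow the benchmark_ / 
benchmark_relative_ pattern discussed above; the scanning function is 
illustrative, not std.benchmark's): a module can be walked at compile 
time and every conforming function picked up without manual 
registration.

import std.algorithm.searching : startsWith;
import std.stdio : writeln;

// Example benchmarks following the naming convention.
void benchmark_file_write()         { /* std.file.write(...) */ }
void benchmark_relative_file_read() { /* std.file.read(...)  */ }

// Sketch: find every "benchmark_" function in a module at compile time;
// the longer "benchmark_relative_" prefix marks baseline-relative runs.
void runModuleBenchmarks(alias mod)()
{
    foreach (name; __traits(allMembers, mod))
    {
        static if (name.startsWith("benchmark_"))
        {
            enum relative = name.startsWith("benchmark_relative_");
            writeln(relative ? "relative: " : "absolute: ", name);
            __traits(getMember, mod, name)();  // time this call in a real version
        }
    }
}

Calling runModuleBenchmarks!(mymodule)() (with mymodule imported) then 
covers the whole module, which is the part that a getopt-style 
string/delegate list like the one above does not automate.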


Andrei

