Review of Andrei's std.benchmark

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Sep 21 13:15:35 PDT 2012


On 21-Sep-12 22:49, David Piepgrass wrote:
>> After extensive tests with a variety of aggregate functions, I can say
>> firmly that taking the minimum time is by far the best when it comes
>> to assessing the speed of a function.

>
> As far as I know, D doesn't offer a sampling profiler, so one might
> indeed use a benchmarking library as a (poor) substitute. So I'd want to
> be able to set up some benchmarks that operate on realistic data, with
> perhaps different data in different runs in order to learn about how the
> speed varies with different inputs (if it varies a lot then I might
> create more benchmarks to investigate which inputs are processed
> quickly, and which slowly.)

Really good profilers are the ones provided by the CPU vendors; see AMD's 
CodeAnalyst or Intel's VTune. They can even count the number of branch 
predictions, cache misses, etc.
That is certainly outside the charter of this module, or for that matter of 
any standard library code.

>
> Some random comments about std.benchmark based on its documentation:
>
> - It is very strange that the documentation of printBenchmarks uses
> neither of the words "average" or "minimum", and doesn't say how many
> trials are done.... I suppose the obvious interpretation is that it only
> does one trial, but then we wouldn't be having this discussion about
> averages and minimums, right?

See the algorithm in action here:
https://github.com/D-Programming-Language/phobos/pull/794/files#L2R381

In other words, a function is run 10^n times, where n is picked so that the 
total time is large enough to be a trustworthy measurement. The per-call 
run time is then total time / 10^n.
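
Roughly, the idea in code looks something like the sketch below. This is 
just an illustration of the scale-up trick, not the actual code from the 
pull request; the names measureOnce and minTotalMsecs are made up:

import std.datetime;

// Sketch: repeat fun() 10^n times, growing n until the total elapsed time
// is long enough to be trustworthy, then divide by the repetition count.
double measureOnce(void delegate() fun, double minTotalMsecs = 10.0)
{
    ulong reps = 1;
    for (;;)
    {
        auto sw = StopWatch(AutoStart.yes);
        foreach (i; 0 .. reps)
            fun();
        sw.stop();
        immutable total = sw.peek().msecs;    // elapsed milliseconds
        if (total >= minTotalMsecs)
            return cast(double) total / reps; // per-call time, in ms
        reps *= 10;                           // too fast to measure: scale up
    }
}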

Øivind says tests are run 1000 times...

The above is repeated 1000 times, picking the minimum as the best result. 
Obviously it would be good for this to be configurable.
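
Continuing the sketch above, taking the minimum over the trials could look 
like this (again hypothetical; benchmarkMin and the default of 1000 trials 
are just for illustration):

import std.algorithm;

// Sketch: repeat the whole measurement `trials` times and keep the smallest
// per-call time, on the theory that the minimum is the least noisy estimate.
double benchmarkMin(void delegate() fun, size_t trials = 1000)
{
    double best = double.max;
    foreach (t; 0 .. trials)
        best = min(best, measureOnce(fun)); // measureOnce from the sketch above
    return best; // smallest observed per-call time, in milliseconds
}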

> but
> it needs to be configurable per-test (my idea: support a _x1000 suffix
> in function names, or _for1000ms to run the test for at least 1000
> milliseconds; and allow a multiplier when running a group of
> benchmarks, e.g. a multiplier argument of 0.5 means to only run half as
> many trials as usual.) Also, it is not clear from the documentation what
> the single parameter to each benchmark is (define "iterations count".)
>

> - The "benchmark_relative_" feature looks quite useful. I'm also happy
> to see benchmarkSuspend() and benchmarkResume(), though
> benchmarkSuspend() seems redundant in most cases: I'd like to just call
> one function, say, benchmarkStart() to indicate "setup complete, please
> start measuring time now."
>
> - I'm glad that StopWatch can auto-start; but the documentation should
> be clearer: does reset() stop the timer or just reset the time to zero?
> does stop() followed by start() start from zero or does it keep the time
> on the clock? I also think there should be a method that returns the
> value of peek() and restarts the timer at the same time (perhaps stop()
> and reset() should just return peek()?)

It's the same as a usual stopwatch (as in the real hardware thingy). Thus:
- reset just resets the numbers to zero
- stop just stops counting
- start just starts counting
- peek imitates taking a look at the numbers on the device ;)
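
For what it's worth, a quick usage sketch with std.datetime's StopWatch, 
following the semantics described above (peek() returns a TickDuration):

import std.datetime;
import std.stdio;

void main()
{
    auto sw = StopWatch(AutoStart.yes);  // auto-started: counting begins here

    // ... the code being timed ...

    sw.stop();                           // stop counting; time stays on the clock
    writeln(sw.peek().msecs, " ms");     // peek at the accumulated time

    sw.start();                          // resume counting from where we stopped
    sw.reset();                          // zero the numbers
}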

>
> - After reading the documentation of comparingBenchmark and measureTime,
> I have almost no idea what they do.

I think that comparingBenchmark was present in std.datetime and is 
carried over as is.

-- 
Dmitry Olshansky

