Review of Andrei's std.benchmark
David Piepgrass
qwertie256 at gmail.com
Fri Sep 21 11:49:33 PDT 2012
> After extensive tests with a variety of aggregate functions, I
> can say firmly that taking the minimum time is by far the best
> when it comes to assessing the speed of a function.
Like others, I must disagree in principle. The minimum sounds
like a useful metric for functions that (1) do the same amount of
work in every test and (2) are microbenchmarks, i.e. they measure
a small and simple task. If the benchmark being measured either
(1) varies the amount of work each time (e.g. according to some
approximation of real-world input, which obviously may vary)* or
(2) measures a large system, then the average and standard
deviation and even a histogram may be useful (or perhaps some
indicator whether the runtimes are consistent with a normal
distribution or not). If the running time is long, then the max
might be useful (because things like task-switching overhead
probably do not contribute that much to the total).
* I anticipate that you might respond "so, only test a single
input per benchmark", but if I've got 1000 inputs that I want to
try, I really don't want to write 1000 functions nor do I want
1000 lines of output from the benchmark. An average, standard
deviation, min and max may be all I need, and if I need more
detail, then I might break it up into 10 groups of 100 inputs. In
any case, the minimum runtime is not the desired output when the
input varies.
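To make this concrete, here is roughly what I have in mind, as a
sketch built on the existing StopWatch in std.datetime rather than
on std.benchmark's API (benchmarkInputs, summarize, parseConfig
and sampleInputs are made-up names):

    import std.datetime : AutoStart, StopWatch;
    import std.math : sqrt;
    import std.stdio : writefln;

    // Time each input once, then print one summary line per group of
    // inputs plus an overall line, instead of either 1000 lines of
    // output or a single minimum. For brevity, assumes inputs.length
    // is a non-zero multiple of `groups`.
    void benchmarkInputs(T)(void delegate(T) process, T[] inputs,
                            size_t groups = 10)
    {
        auto times = new double[inputs.length];
        foreach (i, input; inputs)
        {
            auto sw = StopWatch(AutoStart.yes);
            process(input);
            times[i] = sw.peek().usecs;   // microseconds for this input
        }
        immutable perGroup = inputs.length / groups;
        foreach (g; 0 .. groups)
            summarize(times[g * perGroup .. (g + 1) * perGroup]);
        summarize(times);                 // overall summary
    }

    // Min, average, standard deviation and max of a set of timings.
    void summarize(in double[] times)
    {
        double min = times[0], max = times[0], sum = 0;
        foreach (t; times)
        {
            if (t < min) min = t;
            if (t > max) max = t;
            sum += t;
        }
        immutable mean = sum / times.length;
        double var = 0;
        foreach (t; times)
            var += (t - mean) * (t - mean);
        writefln("min %8.1f  avg %8.1f  sdev %8.1f  max %8.1f  (usecs, n=%s)",
                 min, mean, sqrt(var / times.length), max, times.length);
    }

    // Usage, with parseConfig and sampleInputs standing in for real
    // code and data:
    // benchmarkInputs(delegate(string s) { parseConfig(s); }, sampleInputs);

A handful of aggregate numbers per group like this would tell me
far more than the minimum over everything.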
It's a little surprising to hear "The purpose of std.benchmark is
not to estimate real-world time. (That is the purpose of
profiling)"... Firstly, of COURSE I would want to estimate
real-world time with some of my benchmarks. For some benchmarks I
just want to know which of two or three approaches is faster, or
to get a coarse ball-park sense of performance, but for others I
really want to know the wall-clock time used for realistic inputs.
Secondly, what D profiler actually helps you answer the question
"where does the time go in the real-world?"? The D -profile
switch creates an instrumented executable, which in my experience
(admittedly not experience with DMD) severely distorts running
times. To avoid the distortion of instrumentation, I usually
prefer sampling-based profiling, where the executable is left
unchanged and a sampling program interrupts it at random and
grabs the call stack. Of course, instrumentation
is useful to find out what functions are called the most and
whether call frequencies are in line with expectations, but I
wouldn't trust the time measurements that much.
As far as I know, D doesn't offer a sampling profiler, so one
might indeed use a benchmarking library as a (poor) substitute.
So I'd want to be able to set up some benchmarks that operate on
realistic data, perhaps with different data in different runs, in
order to learn how the speed varies with different inputs (if it
varies a lot, I might create more benchmarks to investigate which
inputs are processed quickly and which slowly).
Some random comments about std.benchmark based on its
documentation:
- It is very strange that the documentation of printBenchmarks
uses neither of the words "average" nor "minimum", and doesn't say
how many trials are done... I suppose the obvious interpretation
is that it only does one trial, but then we wouldn't be having
this discussion about averages and minimums, right? Øivind says
tests are run 1000 times... but it needs to be configurable
per-test (my idea: support a _x1000 suffix in function names, or
_for1000ms to run the test for at least 1000 milliseconds; and
allow a multiplier when running a group of benchmarks, e.g. a
multiplier argument of 0.5 means to only run half as many trials
as usual; see the sketch after this list). Also, it is not clear
from the documentation what the single parameter to each
benchmark is (define "iterations count").
- The "benchmark_relative_" feature looks quite useful. I'm also
happy to see benchmarkSuspend() and benchmarkResume(), though
benchmarkSuspend() seems redundant in most cases: I'd like to
just call one function, say, benchmarkStart() to indicate "setup
complete, please start measuring time now."
- I'm glad that StopWatch can auto-start, but the documentation
should be clearer: does reset() stop the timer or just reset the
time to zero? Does stop() followed by start() start again from
zero, or does it keep the time on the clock? I also think there
should be a method that returns the value of peek() and restarts
the timer at the same time (perhaps stop() and reset() should just
return peek()? See the sketch after this list).
- After reading the documentation of comparingBenchmark and
measureTime, I have almost no idea what they do.
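To illustrate the _x1000 suffix idea from the first point above,
here is a rough sketch of how a trial count could be pulled out of
a benchmark's name. It only handles the _x form (not _for1000ms),
and trialsFromName is a made-up helper, not part of std.benchmark:

    import std.ascii : isDigit;
    import std.conv : to;
    import std.string : lastIndexOf;

    // Made-up helper: derive a per-benchmark trial count from a
    // naming-convention suffix such as "benchmark_parseJson_x1000",
    // falling back to a default otherwise. The multiplier scales
    // whatever the name asked for (e.g. 0.5 = half as many trials).
    size_t trialsFromName(string name, size_t defaultTrials = 1000,
                          double multiplier = 1.0)
    {
        size_t trials = defaultTrials;
        auto i = lastIndexOf(name, "_x");
        if (i != -1 && i + 2 < name.length)
        {
            auto digits = name[i + 2 .. $];
            bool allDigits = true;
            foreach (c; digits)
                if (!isDigit(c)) { allDigits = false; break; }
            if (allDigits)
                trials = to!size_t(digits);
        }
        return cast(size_t)(trials * multiplier);
    }

    unittest
    {
        assert(trialsFromName("benchmark_sort") == 1000);
        assert(trialsFromName("benchmark_sort_x50") == 50);
        assert(trialsFromName("benchmark_sort_x50", 1000, 0.5) == 25);
    }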
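And for the StopWatch point, this is the kind of one-call "read
and restart" helper I mean, written against the StopWatch in
std.datetime. It assumes that reset() leaves a running watch
running, which is exactly the sort of thing the documentation
should spell out; main() just times two phases of dummy work:

    import core.time : TickDuration;
    import std.datetime : AutoStart, StopWatch;
    import std.stdio : writeln;

    // Return the elapsed time and restart the measurement in one
    // call, so consecutive phases can be timed back to back.
    // Assumes reset() on a running StopWatch zeroes the count but
    // keeps it running; if not, use a stop()/start() pair instead.
    TickDuration lap(ref StopWatch sw)
    {
        auto elapsed = sw.peek();
        sw.reset();
        return elapsed;
    }

    void main()
    {
        auto sw = StopWatch(AutoStart.yes);

        long sum = 0;
        foreach (i; 0 .. 1_000_000) sum += i;            // phase one
        writeln("phase one: ", lap(sw).usecs, " usecs");

        foreach (i; 0 .. 2_000_000) sum += i;            // phase two
        writeln("phase two: ", lap(sw).usecs, " usecs");

        writeln(sum);  // keep the dummy work from being optimized away
    }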