std.benchmark ready for review. Manager sought after

Mon Apr 9 08:36:54 PDT 2012

On 4/9/12 9:25 AM, Manfred Nowak wrote:
> Andrei Alexandrescu wrote:
>> all noise is additive (there's no noise that may make a benchmark
>> appear to run faster)
>
> This is in doubt, because you yourself wrote "the machine itself has
> complex interactions". These complex interactions might lower the time
> needed for an operation of the benchmarked program.
>
> Examples that come to mind:
> a) needed data is already in a (faster) cache because it belongs to a
> memory block, from which some data is needed by some program not
> belonging to the benchmarked set---and that block isn't replaced yet.

Which is great, unless the program wants to measure the cache memory
itself, in which case it would use special assembler instructions or
large memset()s. (We do this at Facebook.)

> b) needed data is stored in a hdd whose I/O scheduler uses the elevator
> algorithm and serves the request by pure chance instantly, because the
> position of the needed data is between two positions accessed by some
> programs not belonging to the benchmarked set.
>
> Especially a hdd, if used, will be responsible for a lot of noise you
> define as "quantization noise (uniform distribution)" even if the head
> stays at the same cylinder. Not recognizing this noise would only mean
> that the data is cached and interpreting the only true read from the
> hdd as a jerky outlier seems quite wrong.

If the goal is to measure the seek time of the HDD, the benchmark itself
should make sure the HDD cache is cleared. (As I recall, on Linux this
is done by unmounting and remounting the drive.) Otherwise, it adds a
useless component to the timing.

>>> 1) The "noise during normal use" has to be measured in order to
>>> detect the sensibility of the benchmarked program to that noise.
>> How do you measure it, and what
>> conclusions do you draw other than there's more or less other
>> stuff going on on the machine, and the machine itself has complex
>> interactions?
>>
>> Far as I can tell a time measurement result is:
>>
>> T = A + Q + N
>
> For example by running more than one instance of the benchmarked
> program in parallel and use the thereby gathered statistical routines
> to spread T into the additive components A, Q and N.

I disagree with running two benchmarks in parallel because that exposes
them to even more noise (scheduling, CPU count, current machine load,
etc.). I don't understand the part of the sentence starting with "...use
the thereby..."; I'd be grateful if you elaborated.
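
As a rough illustration of why the additive assumption matters (a sketch
only, not std.benchmark's implementation): if T = A + Q + N and neither Q
nor N can be negative, the fastest of many runs approaches A, whereas the
average is inflated by the noise. The Python simulation below assumes an
exponential distribution for N and made-up values for A and the
quantization step; every name in it is hypothetical.

```python
import random


def simulate_measurement(rng, a=100.0, q=1.0, noise_mean=20.0):
    """Return one simulated timing under the assumed model T = A + Q + N."""
    quantization = rng.uniform(0.0, q)         # Q: uniform, never negative
    noise = rng.expovariate(1.0 / noise_mean)  # N: never negative (assumed exponential)
    return a + quantization + noise


def summarize(samples):
    """Return (fastest, average) of a list of timings."""
    return min(samples), sum(samples) / len(samples)


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [simulate_measurement(rng) for _ in range(10_000)]
fastest, average = summarize(samples)

print(f"true A  = 100.0")
print(f"fastest = {fastest:.3f}")
print(f"average = {average:.3f}")
```

Under these assumptions the fastest run lands just above A, while the
average sits above it by roughly the mean of Q plus the mean of N.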

Andrei