avgtime - Small D util for your everyday benchmarking needs

Juan Manuel Cabo juanmanuel.cabo at gmail.com
Fri Mar 23 14:52:46 PDT 2012


On Friday, 23 March 2012 at 15:33:18 UTC, Andrei Alexandrescu 
wrote:
> On 3/23/12 3:02 AM, Juan Manuel Cabo wrote:
>> On Friday, 23 March 2012 at 05:16:20 UTC, Andrei Alexandrescu 
>> wrote:
>> [.....]
>>>> (man, the gaussian curve is everywhere, it never ceases to
>>>> perplex me).
>>>
>>> I'm actually surprised. I'm working on benchmarking lately 
>>> and the
>>> distributions I get are very concentrated around the minimum.
>>>
>>> Andrei
>>
>>
>> Well, the shape of the curve depends a lot on
>> how the random noise gets inside the measurement.
> [snip]
>
> Hmm, well the way I see it, the observed measurements have the 
> following composition:
>
> X = T + Q + N
>
> where T > 0 (a constant) is the "real" time taken by the 
> processing, Q > 0 is the quantization noise caused by the 
> limited resolution of the clock (can be considered 0 if the 
> resolution is much smaller than the actual time), and N is 
> noise caused by a variety of factors (other processes, 
> throttling, interrupts, networking, memory hierarchy effects, 
> and many more). The challenge is estimating T given a bunch of 
> X samples.
>
> N can be probably approximated to a Gaussian, although for 
> short timings I noticed it's more like bursts that just cause 
> outliers. But note that N is always positive (therefore not 
> 100% Gaussian), i.e. there's no way to insert some noise that 
> makes the code seem artificially faster. It's all additive.
>
> Taking the mode of the distribution will estimate T + mode(N), 
> which is informative because after all there's no way to 
> eliminate noise. However, if the focus is improving T, we want 
> an estimate as close to T as possible. In the limit, taking the 
> minimum over infinitely many measurements of X would yield T.
>
>
> Andrei

In general, I agree with your reasoning. And I appreciate you
taking the time to put it so eloquently!!
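Your X = T + Q + N model can be sketched with a quick
simulation (a Python sketch rather than D, with invented
noise parameters): because the noise is additive and
positive, the minimum of many samples approaches T from
above, while the mean estimates T + E[N].

```python
import random
import statistics

random.seed(1)

T = 50.0   # "real" processing time in ms (hypothetical)

def measure():
    # N: always-positive additive noise; an exponential burst
    # is one plausible shape for scheduler/interrupt delays
    # (an assumption, not measured data).
    N = random.expovariate(1 / 5.0)   # mean noise of 5 ms
    return T + N                      # Q taken as 0 (fine clock)

samples = [measure() for _ in range(10_000)]

# The minimum approaches T; the mean estimates T + E[N] = 55.
print(f"min  = {min(samples):.2f}")
print(f"mean = {statistics.mean(samples):.2f}")
```

With 10,000 samples the minimum lands within a tiny fraction
of a millisecond of T, while the mean stays near T plus the
mean of the noise.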

But I think that treating T as a constant, and preferring
the minimum, misses something. That might work very well
for benchmarking mostly CPU bound processes, but all those
other things that you consider noise (disk I/O, network,
memory hierarchy, etc.) are part of what makes one
algorithm or program faster than another, and for some
applications I would consider them part of T.

Consider the case described in this wonderful (ranty)
article that was posted elsewhere in this thread:
     http://zedshaw.com/essays/programmer_stats.html
In one part of the article, the author talks about a
system that worked fast most of the time, but would
sometimes halt for a good one or two minutes.

The minimum time for such a system might be a few ms, but
the standard deviation would be large. This rightly shifts
the average time away from the minimum.
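That effect can be sketched numerically (hypothetical
numbers, in Python): a service that answers in a few
milliseconds but stalls for two minutes once in a thousand
requests has a tiny minimum, yet the mean and standard
deviation expose the stalls.

```python
import random
import statistics

random.seed(2)

def request_time_ms():
    # 1 in 1000 requests hits a ~2 minute halt
    # (made-up numbers for illustration).
    if random.random() < 0.001:
        return 120_000.0                         # 2 minutes, in ms
    return 5.0 + random.expovariate(1 / 2.0)     # normally a few ms

times = [request_time_ms() for _ in range(100_000)]

print(f"min    = {min(times):.2f} ms")               # a few ms
print(f"mean   = {statistics.mean(times):.2f} ms")   # pulled up by stalls
print(f"stddev = {statistics.stdev(times):.2f} ms")  # huge vs. the min
```

Anyone judging this system by its minimum alone would call
it fast; the spread tells the real story.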

If programA does the same task as programB with less I/O,
or with a better memory layout, etc., its average will be
lower, and maybe its timings won't be so spread out. But
the minimum will be the same.
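A sketch of that comparison (Python, invented parameters):
two programs with the same best case but different amounts
of I/O noise share a minimum, while their means and spreads
differ.

```python
import random
import statistics

random.seed(3)

BASE = 10.0   # identical best-case time in ms (hypothetical)

def run(io_noise_mean):
    # Additive, always-positive I/O noise; the exponential
    # shape is an assumption for illustration.
    return BASE + random.expovariate(1 / io_noise_mean)

a = [run(1.0) for _ in range(10_000)]   # programA: little I/O noise
b = [run(8.0) for _ in range(10_000)]   # programB: heavy I/O noise

# Minima are nearly identical; the averages tell them apart.
print(f"minA  = {min(a):.3f}   minB  = {min(b):.3f}")
print(f"meanA = {statistics.mean(a):.2f}  meanB = {statistics.mean(b):.2f}")
```

Comparing only the minima would rank the two programs as
equal; the means make programA's advantage visible.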

So, in the end, I'm just happy that I could share this
little avgtime with you all, and as usual there is no
one-size-fits-all answer. For some applications, the
minimum will be enough. For others, it's essential to look
at how spread out the sample is.


On the symmetry/asymmetry of the distribution topic:
I realize, as you said, that T never gets faster than
a certain point.
But depending on the nature of the program under test,
good utilization of disk I/O, network, memory,
motherboard buses, etc. is part of what you want inside
the test too, and those come with Gaussian-like noise
which may or may not dominate T.

A program that avoids that other big noise is a better
program (all else being equal), so I would tend to
consider the whole.

Thanks for the eloquence/insightfulness of your post!
I'll consider adding chi-squared confidence intervals
in the future (and I'm open to more info, or to
suggestions if another distribution might fit better).
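For what it's worth, a chi-squared confidence interval for
the timing variance could look like the sketch below. It
uses only the Python standard library; the Wilson-Hilferty
approximation to chi-squared quantiles is my choice here,
not anything avgtime does, and it assumes roughly normal
samples, which is a strong assumption for timing data as
discussed above.

```python
import math
import random
import statistics
from statistics import NormalDist

def chi2_quantile(p, k):
    # Wilson-Hilferty approximation to the chi-squared
    # quantile with k degrees of freedom.
    z = NormalDist().inv_cdf(p)
    return k * (1 - 2 / (9 * k) + z * math.sqrt(2 / (9 * k))) ** 3

def variance_ci(samples, alpha=0.05):
    # Two-sided (1 - alpha) CI for the population variance,
    # assuming approximately normal samples.
    n = len(samples)
    s2 = statistics.variance(samples)
    lo = (n - 1) * s2 / chi2_quantile(1 - alpha / 2, n - 1)
    hi = (n - 1) * s2 / chi2_quantile(alpha / 2, n - 1)
    return lo, hi

random.seed(4)
timings = [random.gauss(100.0, 3.0) for _ in range(200)]
lo, hi = variance_ci(timings)
print(f"sample variance = {statistics.variance(timings):.2f}")
print(f"95% CI for variance: ({lo:.2f}, {hi:.2f})")
```

The interval is asymmetric around the sample variance, as
expected from the skew of the chi-squared distribution.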

--jm

More information about the Digitalmars-d-announce mailing list