D compiler benchmarks
Bill Baxter
wbaxter at gmail.com
Sun Mar 8 11:27:51 PDT 2009
On Mon, Mar 9, 2009 at 3:15 AM, Georg Wrede <georg.wrede at iki.fi> wrote:
> Robert Clipsham wrote:
>>
>> Georg Wrede wrote:
>>>
>>> Robert Clipsham wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have set up some benchmarks for dmd, ldc and gdc at
>>>> http://dbench.octarineparrot.com/.
>>>>
>>>> There are currently only six tests, all from
>>>> http://shootout.alioth.debian.org/gp4/d.php. My knowledge of Phobos is not
>>>> great enough to port the others to Tango (I've chosen Tango because ldc does
>>>> not currently support Phobos, so it makes sense to use the library that all
>>>> compilers support). If you would like to contribute new tests or improve on
>>>> the current ones, let me know and I'll include them next time I run them.
>>>>
>>>> All source code can be found at
>>>> http://hg.octarineparrot.com/dbench/file/tip.
>>>>
>>>> Let me know if you have any ideas for how I can improve the benchmarks.
>>>> I currently plan to add compile times, the size of the final executable,
>>>> and memory usage (if anyone knows an easy way to get the memory usage of
>>>> a process in D, let me know :D).
>>>
>>> The first run should not be included in the average.
>>
>> Could you explain your reasoning for this? Personally, I can't see why it
>> shouldn't be included.
>
> Suppose you have run the same program very recently before the test. Then
> the executable will already be in memory, and so will any other files it
> accesses.
>
> This makes execution much faster than a true first run of the program.
>
> If things were deterministic, then you wouldn't run several times and
> average the results, right?
Also, I think standard practice for benchmarks is not to average but to
take the minimum time.
To the extent that timings are not deterministic, it is generally
because of factors outside your program's control -- a virtual memory
page fault kicking in, some other process stealing cycles, etc. Put
another way, there is no way for the measured run time of your program
to come out artificially too low, but there are lots of ways it can
come out too high. You average measurements in other scenarios because
you expect them to form a normal distribution around the true value.
That is not the case for program running times: measurements will
essentially always be higher than the true intrinsic run time of your
program.
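
For illustration, here is a minimal best-of-N harness. It is only a
sketch: it uses present-day Phobos' std.datetime.stopwatch, whereas the
benchmarks in this thread target Tango, so treat the imports and names
as assumptions rather than drop-in code.

import core.time : Duration;
import std.algorithm.comparison : min;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Run fn() `runs` times and keep the *minimum* wall-clock time.
// Noise (page faults, other processes) can only inflate a sample,
// so the smallest sample is the best estimate of the true cost.
Duration bestOf(void delegate() fn, size_t runs)
{
    auto best = Duration.max;
    foreach (_; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        fn();
        sw.stop();
        best = min(best, sw.peek());
    }
    return best;
}

void main()
{
    int sink;  // keeps the loop from being optimized away entirely
    auto t = bestOf({ foreach (i; 0 .. 1_000_000) sink += i; }, 10);
    writeln("best of 10 runs: ", t);
}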
--bb
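
P.S. On the quoted question about reading a process's memory usage: I
don't know of a portable standard-library call for it, but on Linux you
can parse /proc/self/status (or /proc/<pid>/status for a benchmarked
child process). A rough sketch, again against present-day Phobos, with
a helper name of my own invention:

import std.algorithm.searching : startsWith;
import std.stdio : File, writeln;
import std.string : strip;

// Linux-only: report the current process's resident set size,
// e.g. "1234 kB", by scanning /proc/self/status for VmRSS.
string currentRss()
{
    foreach (line; File("/proc/self/status").byLine())
        if (line.startsWith("VmRSS:"))
            return line["VmRSS:".length .. $].strip.idup;
    return "unknown";  // field absent (or not on Linux)
}

void main()
{
    writeln("resident set size: ", currentRss());
}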