D outperformed by C++, what am I doing wrong?

via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Aug 13 06:32:13 PDT 2017


On Sunday, 13 August 2017 at 09:56:44 UTC, Johan Engelen wrote:
> On Sunday, 13 August 2017 at 09:15:48 UTC, amfvcg wrote:
>>
>> Change the parameter for this array size to be taken from 
>> stdin and I assume that these optimizations will go away.
>
> This is paramount for all of the testing, examining, and 
> comparisons that are discussed in this thread.
> Full information is given to the compiler, and you are 
> basically testing the constant folding power of the compilers 
> (not unimportant).

I agree that in general this is not the right way to benchmark.
However, I am interested specifically in the pattern-matching /
constant-folding abilities of the compiler. I would have expected
`sum(iota(1, N + 1))` to be replaced with `(N*(N+1))/2`. LDC already
does this optimization in some cases. I have opened an issue for some
of the remaining cases:
https://github.com/ldc-developers/ldc/issues/2271
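
For reference, here is a small self-contained sketch (the function
names are mine) of the strength reduction I would expect:

```
import std.algorithm.iteration : sum;
import std.range : iota;

// The O(N) version, as written in the benchmark...
ulong sumNaive(ulong n)
{
    return iota(1, n + 1).sum;
}

// ...and the O(1) closed form it could be folded into.
ulong sumClosedForm(ulong n)
{
    return n * (n + 1) / 2;
}

unittest
{
    foreach (ulong n; 0 .. 100)
        assert(sumNaive(n) == sumClosedForm(n));
}
```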

> No runtime calculation is needed for the sum. Your program 
> could be optimized to the following code:
> ```
> void main()
> {
>     import core.time : MonoTime;
>     import std.stdio : writeln;
>
>     MonoTime beg = MonoTime.currTime;
>     MonoTime end = MonoTime.currTime;
>     writeln(end-beg);
>     writeln(50000000);
> }
> ```
> So actually you should be more surprised that the reported time 
> is not equal to near-zero (just the time between two 
> `MonoTime.currTime` calls)!

On Posix, `MonoTime.currTime` is implemented with
clock_gettime(CLOCK_MONOTONIC, ...), which is quite a bit more
involved than simply using the rdtsc instruction on x86. See:
http://linuxmogeb.blogspot.bg/2013/10/how-does-clockgettime-work.html
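
For the curious, here is a rough sketch of what that boils down to,
assuming druntime's core.sys.posix.time bindings (which declare
clock_gettime and CLOCK_MONOTONIC on Linux):

```
version (Posix)
{
    import core.sys.posix.time : clock_gettime, CLOCK_MONOTONIC,
        timespec;

    // Roughly what MonoTime.currTime reduces to on Posix: a single
    // clock_gettime call, converted to a monotonic tick count.
    long monotonicNsecs()
    {
        timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1_000_000_000L + ts.tv_nsec;
    }
}
```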

On Windows, `MonoTime.currTime` uses QueryPerformanceCounter, which
on Windows 7 and later is itself implemented with the rdtsc
instruction, making it quite streamlined. In some testing I did
several months ago, QueryPerformanceCounter had really good latency
and precision (though I have forgotten the exact numbers).
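
Either way, the per-call overhead is easy to estimate yourself; a
minimal sketch that times back-to-back calls:

```
void main()
{
    import core.time : MonoTime;
    import std.stdio : writeln;

    enum iterations = 1_000_000;

    // Time a tight loop of back-to-back currTime calls to estimate
    // the per-call latency of the underlying clock source.
    MonoTime beg = MonoTime.currTime;
    foreach (_; 0 .. iterations)
        cast(void) MonoTime.currTime;
    MonoTime end = MonoTime.currTime;

    writeln("avg per call: ", (end - beg) / iterations);
}
```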

> Instead of `iota(1,1000000)`, you should initialize the array 
> with random numbers with a randomization seed given by the user 
> (e.g. commandline argument or stdin). Then, the program will 
> actually have to do the runtime calculations that I assume you 
> are expecting it to perform.
>

Agreed, though I think Phobos's unpredictableSeed does an OK job
w.r.t. seeding, so unless you need to repeat the benchmark on the
exact same dataset, something like this works well:

T[] generate(T)(size_t size)
{
    import std.algorithm.iteration : map;
    import std.array : array;
    import std.random : uniform;
    import std.range : iota;

    // Build an array of `size` random values; uniform!T() draws from
    // the thread-local default RNG, seeded with unpredictableSeed.
    return size.iota.map!(_ => uniform!T()).array;
}
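
And to address the reproducibility point, a hypothetical driver that
takes an optional seed on the command line (it seeds rndGen, the
default RNG that uniform!T() draws from):

```
void main(string[] args)
{
    import std.conv : to;
    import std.random : rndGen, unpredictableSeed;
    import std.stdio : writeln;

    // A user-supplied seed makes runs repeatable; otherwise fall
    // back to unpredictableSeed.
    const uint seed = args.length > 1 ? args[1].to!uint
                                      : unpredictableSeed;
    rndGen.seed(seed);

    auto data = generate!int(1_000_000);
    writeln("seed: ", seed, ", first elements: ", data[0 .. 5]);
}
```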

