D outperformed by C++, what am I doing wrong?
via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Aug 13 06:32:13 PDT 2017
On Sunday, 13 August 2017 at 09:56:44 UTC, Johan Engelen wrote:
> On Sunday, 13 August 2017 at 09:15:48 UTC, amfvcg wrote:
>>
>> Change the parameter for this array size to be taken from
>> stdin and I assume that these optimizations will go away.
>
> This is paramount for all of the testing, examining, and
> comparisons that are discussed in this thread.
> Full information is given to the compiler, and you are
> basically testing the constant folding power of the compilers
> (not unimportant).
I agree that in general this is not the right way to benchmark. However, I
am interested specifically in the pattern-matching / constant-folding
abilities of the compiler. I would have expected `sum(iota(1, N + 1))` to
be replaced with `(N*(N+1))/2`. LDC already does this optimization in some
cases; I have opened an issue for some of the rest:
https://github.com/ldc-developers/ldc/issues/2271
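
For reference, here is a minimal sketch of the equivalence I have in mind
(the closed form is just the Gauss sum; the function names are mine, for
illustration only):
```
import std.algorithm.iteration : sum;
import std.range : iota;

// What the source expresses: an O(N) reduction over 1 .. N.
ulong sumNaive(ulong n)
{
    return iota(1, n + 1).sum;
}

// What the optimizer could rewrite it to: the O(1) closed form.
ulong sumClosed(ulong n)
{
    return n * (n + 1) / 2;
}

unittest
{
    assert(sumNaive(1_000_000) == sumClosed(1_000_000));
}
```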
> No runtime calculation is needed for the sum. Your program
> could be optimized to the following code:
> ```
> void main()
> {
> MonoTime beg = MonoTime.currTime;
> MonoTime end = MonoTime.currTime;
> writeln(end-beg);
> writeln(50000000);
> }
> ```
> So actually you should be more surprised that the reported time
> is not equal to near-zero (just the time between two
> `MonoTime.currTime` calls)!
On Posix, `MonoTime.currTime` is implemented with
clock_gettime(CLOCK_MONOTONIC, ...), which is quite a bit more involved
than simply using the rdtsc instruction on x86. See:
http://linuxmogeb.blogspot.bg/2013/10/how-does-clockgettime-work.html
On Windows, `MonoTime.currTime` uses QueryPerformanceCounter, which on
Windows 7 and later is based on the rdtsc instruction, making it quite
streamlined. In some testing I did several months ago,
QueryPerformanceCounter had really good latency and precision (though I
forget the exact numbers I got).
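
If anyone wants to gauge that call overhead on their own machine, a rough
sketch (mine, not from the thread) is to time a batch of back-to-back
calls and average:
```
import core.time : MonoTime;
import std.stdio : writeln;

void main()
{
    enum iterations = 1_000_000;
    MonoTime start = MonoTime.currTime;
    MonoTime end = start;
    foreach (i; 0 .. iterations)
        end = MonoTime.currTime;   // each iteration pays one timer-call cost
    // Average cost per MonoTime.currTime call over the whole batch.
    writeln("avg per call: ", (end - start) / iterations);
}
```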
> Instead of `iota(1,1000000)`, you should initialize the array
> with random numbers with a randomization seed given by the user
> (e.g. commandline argument or stdin). Then, the program will
> actually have to do the runtime calculations that I assume you
> are expecting it to perform.
>
Agreed, though I think Phobos's unpredictableSeed does an ok job
w.r.t. seeding, so unless you want to repeat the benchmark on the
exact same dataset, something like this does a good job:
// Build an array of `size` values of type T, each drawn uniformly at random.
T[] generate(T)(size_t size)
{
    import std.algorithm.iteration : map;
    import std.range : array, iota;
    import std.random : uniform;

    return size.iota.map!(_ => uniform!T()).array;
}
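
And if you do want a repeatable run, you can seed the default generator
yourself before calling `generate` above; a sketch (the argument handling
is just for illustration):
```
void main(string[] args)
{
    import std.conv : to;
    import std.random : rndGen, unpredictableSeed;
    import std.stdio : writeln;

    // Seed from the first command-line argument if one is given,
    // otherwise fall back to unpredictableSeed.
    immutable seed = args.length > 1 ? args[1].to!uint : unpredictableSeed;
    rndGen.seed(seed);

    writeln(generate!int(10));
}
```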