Performance

Sat May 31 10:44:23 PDT 2014

On Saturday, 31 May 2014 at 05:12:54 UTC, Marco Leise wrote:
> Run this with: -O3 -frelease -fno-assert -fno-bounds-check -march=native
> This way GCC and LLVM will recognize that you alternately add
> p0 and p1 to the sum and partially unroll the loop, thereby
> removing the condition. It takes 1.4xxxx nanoseconds per step
> on my not-so-new 2.0 GHz notebook, so I assume your PC will
> easily reach parity with your original C++ version.
>
> import std.stdio;
> import core.time;
>
> alias ℕ = size_t;
>
> void main()
> {
> 	run!plus(1_000_000_000);
> }
>
> double plus(ℕ steps)
> {
> 	enum p0 = 0.0045;
> 	enum p1 = 1.00045452 - p0;
>
> 	double sum = 1.346346;
> 	foreach (i; 0 .. steps)
> 		sum += i%2 ? p1 : p0;
> 	return sum;
> }
>
> void run(alias func)(ℕ steps)
> {
> 	auto t1 = TickDuration.currSystemTick;
> 	auto output = func(steps);
> 	auto t2 = TickDuration.currSystemTick;
> 	auto nanotime = 1_000_000_000.0 / steps * (t2 - t1).length
> 		/ TickDuration.ticksPerSec;
> 	writefln("Last: %s", output);
> 	writefln("Time per op: %s", nanotime);
> 	writeln();
> }

Thank you for the help. Which OS is your notebook running? I
compiled your source code with GCC using your settings, and the
run took 3.1xxxx nanoseconds per step. With the DMD compiler the
run took 5.xxxx nanoseconds per step. So I think the problem
could be specific to the Linux versions of the GCC and DMD
compilers.
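
For anyone following along, here is a rough hand-written sketch of
the transformation described in the quoted post: handling two steps
per loop iteration so that the i%2 test disappears. This is only my
reading of what the optimizer is said to do, not actual GCC or LLVM
output. The names plusUnrolled and closeEnough are made up for
illustration, and size_t is written out in place of the ℕ alias.

```d
import std.math : fabs;
import std.stdio;

// Reference version, same logic as the quoted plus().
double plus(size_t steps)
{
    enum p0 = 0.0045;
    enum p1 = 1.00045452 - p0;

    double sum = 1.346346;
    foreach (i; 0 .. steps)
        sum += i % 2 ? p1 : p0;
    return sum;
}

// Hypothetical hand-unrolled version: two steps per iteration,
// so no per-step condition is needed inside the loop.
double plusUnrolled(size_t steps)
{
    enum p0 = 0.0045;
    enum p1 = 1.00045452 - p0;

    double sum = 1.346346;
    size_t i = 0;
    for (; i + 1 < steps; i += 2)
    {
        sum += p0; // even i
        sum += p1; // odd i
    }
    if (i < steps)
        sum += p0; // leftover step when steps is odd (i is even here)
    return sum;
}

// Hypothetical helper: relative comparison, to stay robust against
// differences in intermediate floating-point precision.
bool closeEnough(double a, double b)
{
    return fabs(a - b) <= 1e-9 * fabs(a);
}

void main()
{
    bool ok = true;
    foreach (n; [0, 1, 2, 7, 1000])
        if (!closeEnough(plus(n), plusUnrolled(n)))
            ok = false;
    writeln(ok ? "both versions agree" : "versions differ");
}
```

Timing both functions with the quoted run!func template should show
whether a given compiler already performs this rewrite on its own.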

Thomas