Reduce has dreadful performance?

Thu Jun 18 03:46:16 PDT 2015

On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> On a given machine, the code:
>
> double sequential_loop(const int n, const double delta) {
>   auto sum = 0.0;
>   foreach (immutable i; 1 .. n + 1) {
>     immutable x = (i - 0.5) * delta;
>     sum += 1.0 / (1.0 + x * x);
>   }
>   return 4.0 * delta * sum;
> }
>
> runs in about 6.70s. However the code:
>
> double sequential_reduce(const int n, const double delta) {
>   return 4.0 * delta * reduce!((double t, int i) {
>       immutable x = (i - 0.5) * delta;
>       return t + 1.0 / (1.0 + x * x);
>   })(0.0, iota(1, n + 1));
> }
>
> runs in about 17.03s, whilst:
>
> double sequential_reduce_alt(const int n, const double delta) {
>   return 4.0 * delta * reduce!"a + b"(
>          map!((int i) {
>              immutable x = (i - 0.5) * delta;
>              return 1.0 / (1.0 + x * x);
>          })(iota(1, n + 1)));
> }
>
> takes about 28.02s. Unless I am missing something (very possible), this
> is not going to create a good advert for D as an imperative language
> with declarative (internal iteration) expression.

import std.stdio, std.datetime, std.algorithm, std.range, std.conv;

double sequential_loop(const int n, const double delta) {
   auto sum = 0.0;
   foreach (immutable i; 1 .. n + 1) {
     immutable x = (i - 0.5) * delta;
     sum += 1.0 / (1.0 + x * x);
   }
   return 4.0 * delta * sum;
}

//runs in about 6.70s. However the code:

double sequential_reduce(const int n, const double delta) {
   return 4.0 * delta * reduce!((double t, int i) {
       immutable x = (i - 0.5) * delta;
       return t + 1.0 / (1.0 + x * x);
   })(0.0, iota(1, n + 1));
}

//runs in about 17.03s, whilst:

double sequential_reduce_alt(const int n, const double delta) {
   return 4.0 * delta * reduce!"a + b"(
          map!((int i) {
              immutable x = (i - 0.5) * delta;
              return 1.0 / (1.0 + x * x);
          })(iota(1, n + 1)));
}

void main() {
	auto res = benchmark!(
		{sequential_loop(1000, 10);},
		{sequential_reduce(1000, 10);},
		{sequential_reduce_alt(1000, 10);},
		)(1000);
	writeln(res[].map!(to!Duration));
}

$ dmd -run test.d
[9 ms, 305 μs, and 9 hnsecs, 16 ms, 625 μs, and 3 hnsecs, 27 ms, 417 μs, and 3 hnsecs]

$ dmd -O -inline -release -run test.d
[4 ms, 567 μs, and 6 hnsecs, 4 ms, 853 μs, and 8 hnsecs, 5 ms, 52 μs, and 5 hnsecs]

OS X, DMD64 D Compiler v2.067.1

Looks like you forgot the optimisation troika "-O -inline -release".
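
For what it's worth, here is a minimal sketch of the same computation written as a map/sum pipeline. It assumes std.algorithm's sum is an acceptable stand-in for reduce!"a + b". The name sequential_sum and the values of n and delta are made up for illustration, and I have not timed this variant. Like the versions above, it only makes sense to measure it when built with -O -inline -release.

```d
import std.algorithm : map, sum;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical variant (not from the original post): the same midpoint-rule
// quadrature as sequential_loop, expressed as iota -> map -> sum.
double sequential_sum(const int n, const double delta) {
    return 4.0 * delta * iota(1, n + 1)
        .map!((int i) {
            immutable x = (i - 0.5) * delta;
            return 1.0 / (1.0 + x * x);
        })
        .sum;
}

void main() {
    // Illustrative values: with delta = 1.0 / n the result approximates pi.
    immutable n = 1_000_000;
    immutable delta = 1.0 / n;
    writefln("%.10f", sequential_sum(n, delta));
}
```

With delta = 1.0 / n this is the midpoint rule for the integral of 4 / (1 + x^2) over [0, 1], so the printed value should be close to pi, which gives a quick sanity check that an optimised build still computes the same thing.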

Regards,
Ilya