Simple performance question from a newcomer
dextorious via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Feb 21 06:32:15 PST 2016
I've been vaguely aware of D for many years, but the recent
addition of std.experimental.ndslice finally inspired me to give
it a try, since my main expertise lies in the domain of
scientific computing and I primarily use Python/Julia/C++, where
multidimensional arrays can be handled with a great deal of
expressiveness and flexibility. Before writing anything serious,
I wanted to get a sense for the kind of code I would have to
write to get the best performance for numerical calculations, so
I wrote a trivial summation benchmark. The following code gave me
slightly surprising results:
import std.stdio;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.range;
import std.experimental.ndslice;
void main() {
int N = 1000;
int Q = 20;
int times = 1_000;
double[] res1 = uninitializedArray!(double[])(N);
double[] res2 = uninitializedArray!(double[])(N);
double[] res3 = uninitializedArray!(double[])(N);
auto f = iota(0.0, 1.0, 1.0 / Q / N).sliced(N, Q);
StopWatch sw;
double t0, t1, t2;
sw.start();
foreach (unused; 0..times) {
for (int i=0; i<N; ++i) {
res1[i] = sumtest1(f[i]);
}
}
sw.stop();
t0 = sw.peek().msecs;
sw.reset();
sw.start();
foreach (unused; 0..times) {
for (int i=0; i<N; ++i) {
res2[i] = sumtest2(f[i]);
}
}
sw.stop();
t1 = sw.peek().msecs;
sw.reset();
sw.start();
foreach (unused; 0..times) {
sumtest3(f, res3, N, Q);
}
t2 = sw.peek().msecs;
writeln(t0, " ms");
writeln(t1, " ms");
writeln(t2, " ms");
assert( res1 == res2 );
assert( res2 == res3 );
}
auto sumtest1(Range)(Range range) @safe pure nothrow @nogc {
return range.sum;
}
auto sumtest2(Range)(Range f) @safe pure nothrow @nogc {
double retval = 0.0;
foreach (double f_ ; f) {
retval += f_;
}
return retval;
}
auto sumtest3(Range)(Range f, double[] retval, double N, double
Q) @safe pure nothrow @nogc {
for (int i=0; i<N; ++i) {
for (int j=1; j<Q; ++j) {
retval[i] += f[i,j];
}
}
}
When I compiled it using dmd -release -inline -O -noboundscheck
../src/main.d, I got the following timings:
1268 ms
312 ms
271 ms
I had heard while reading up on the language that in D explicit
loops are generally frowned upon and not necessary for the usual
performance reasons. Nevertheless, the two explicit loop
functions gave me an improvement by a factor of 4+. Furthermore,
the difference between sumtest2 and sumtest3 seems to indicate
that function calls have a significant overhead. I also tried
using f.reduce!((a, b) => a + b) instead of f.sum in sumtest1,
but that yielded even worse performance. I did not try the
GDC/LDC compilers yet, since they don't seem to be up to date on
the standard library and don't include the ndslice package last I
checked.
Now, seeing as how my experience writing D is literally a few
hours, is there anything I did blatantly wrong? Did I miss any
optimizations? Most importantly, can the elegant operator
chaining style be generally made as fast as the explicit loops
we've all been writing for decades?
More information about the Digitalmars-d-learn
mailing list