<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
So I guess pairwise summation is the one to blame here.
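<br>
<br>
To illustrate why it shows up in the profile, here is a minimal sketch of the idea behind pairwise summation next to a plain loop. It is not the Phobos implementation: sumNaive, sumPairwiseSketch and the cutoff of 16 are hypothetical names and values chosen for illustration. The point is only that the pairwise variant recurses, so a single row sum can turn into several function calls.
<br>
<pre>
```d
import std.stdio : writeln;

// Naive left-to-right accumulation: one addition per element, no recursion.
double sumNaive(const(double)[] data) @safe pure nothrow @nogc
{
    double total = 0.0;
    foreach (x; data)
        total += x;
    return total;
}

// Pairwise summation sketch: split the input in halves, sum each half
// recursively, then add the two partial sums.
// The cutoff is an arbitrary illustrative value, not the one Phobos uses.
double sumPairwiseSketch(const(double)[] data) @safe pure nothrow @nogc
{
    enum size_t cutoff = 16;
    if (data.length <= cutoff)
        return sumNaive(data);
    immutable mid = data.length / 2;
    return sumPairwiseSketch(data[0 .. mid]) + sumPairwiseSketch(data[mid .. $]);
}

void main()
{
    // 20 elements, the same row length as Q in the benchmark.
    auto data = new double[](20);
    foreach (i, ref x; data)
        x = i * 0.05;

    writeln(sumNaive(data));
    writeln(sumPairwiseSketch(data));
}
```
</pre>
Pairwise summation trades these extra calls for a smaller rounding error on long inputs; on 20-element rows the call overhead can easily dominate.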
<br>
<br>
<div class="moz-cite-prefix">On 21.2.2016 at 16:56, Daniel Kozak
wrote:<br>
</div>
<blockquote cite="mid:56C9DE38.4010604@gmail.com" type="cite">You
can use -profile to see what is causing it.
<br>
<br>
Num Tree Func Per
<br>
Calls Time Time Call
<br>
<br>
23000000 550799875 550243765 23 pure nothrow
@nogc @safe double std.algorithm.iteration.sumPairwise!(double,
std.experimental.ndslice.slice.Slice!(1uL, std.range.iota!(double,
double, double).iota(double, double,
double).Result).Slice).sumPairwise(std.experimental.ndslice.slice.Slice!(1uL,
std.range.iota!(double, double, double).iota(double, double,
double).Result).Slice)
<br>
<br>
On 21.2.2016 at 15:32, dextorious via Digitalmars-d-learn
wrote:
<br>
<blockquote type="cite">I've been vaguely aware of D for many
years, but the recent addition of std.experimental.ndslice
finally inspired me to give it a try, since my main expertise
lies in the domain of scientific computing and I primarily use
Python/Julia/C++, where multidimensional arrays can be handled
with a great deal of expressiveness and flexibility. Before
writing anything serious, I wanted to get a sense for the kind
of code I would have to write to get the best performance for
numerical calculations, so I wrote a trivial summation
benchmark. The following code gave me slightly surprising
results:
<br>
<br>
import std.stdio;
<br>
import std.array : array;
<br>
import std.algorithm;
<br>
import std.datetime;
<br>
import std.range;
<br>
import std.experimental.ndslice;
<br>
<br>
void main() {
<br>
int N = 1000;
<br>
int Q = 20;
<br>
int times = 1_000;
<br>
double[] res1 = uninitializedArray!(double[])(N);
<br>
double[] res2 = uninitializedArray!(double[])(N);
<br>
double[] res3 = uninitializedArray!(double[])(N);
<br>
auto f = iota(0.0, 1.0, 1.0 / Q / N).sliced(N, Q);
<br>
StopWatch sw;
<br>
double t0, t1, t2;
<br>
sw.start();
<br>
foreach (unused; 0..times) {
<br>
for (int i=0; i<N; ++i) {
<br>
res1[i] = sumtest1(f[i]);
<br>
}
<br>
}
<br>
sw.stop();
<br>
t0 = sw.peek().msecs;
<br>
sw.reset();
<br>
sw.start();
<br>
foreach (unused; 0..times) {
<br>
for (int i=0; i<N; ++i) {
<br>
res2[i] = sumtest2(f[i]);
<br>
}
<br>
}
<br>
sw.stop();
<br>
t1 = sw.peek().msecs;
<br>
sw.reset();
<br>
sw.start();
<br>
foreach (unused; 0..times) {
<br>
sumtest3(f, res3, N, Q);
<br>
}
<br>
sw.stop();
<br>
t2 = sw.peek().msecs;
<br>
writeln(t0, " ms");
<br>
writeln(t1, " ms");
<br>
writeln(t2, " ms");
<br>
assert( res1 == res2 );
<br>
assert( res2 == res3 );
<br>
}
<br>
<br>
auto sumtest1(Range)(Range range) @safe pure nothrow @nogc {
<br>
return range.sum;
<br>
}
<br>
<br>
auto sumtest2(Range)(Range f) @safe pure nothrow @nogc {
<br>
double retval = 0.0;
<br>
foreach (double f_ ; f) {
<br>
retval += f_;
<br>
}
<br>
return retval;
<br>
}
<br>
<br>
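void sumtest3(Range)(Range f, double[] retval, int N, int
Q) @safe pure nothrow @nogc {
<br>
for (int i=0; i<N; ++i) {
<br>
retval[i] = 0.0; // reset: the array starts uninitialized and this function is called repeatedly
<br>
for (int j=0; j<Q; ++j) { // start at 0 so that every column is summed
<br>
retval[i] += f[i,j];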
<br>
}
<br>
}
<br>
}
<br>
<br>
When I compiled it using dmd -release -inline -O -noboundscheck
../src/main.d, I got the following timings:
<br>
1268 ms
<br>
312 ms
<br>
271 ms
<br>
<br>
While reading up on the language, I had heard that explicit
loops are generally frowned upon in D and are not needed for the
usual performance reasons. Nevertheless, the two explicit-loop
functions were more than 4 times faster. Furthermore,
the difference between sumtest2 and sumtest3 seems to indicate
that function calls have a significant overhead. I also tried
using f.reduce!((a, b) => a + b) instead of f.sum in
sumtest1, but that yielded even worse performance. I have not tried
the GDC/LDC compilers yet, since they don't seem to be up to
date on the standard library and didn't include the ndslice
package last I checked.
<br>
<br>
Now, seeing as how my experience writing D is literally a few
hours, is there anything I did blatantly wrong? Did I miss any
optimizations? Most importantly, can the elegant operator
chaining style be generally made as fast as the explicit loops
we've all been writing for decades?
<br>
</blockquote>
<br>
</blockquote>
<br>
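On the question of chained style versus explicit loops, a minimal sketch of the three ways to sum one row, assuming a Phobos version that provides fold in std.algorithm.iteration. rowSum, rowFold and rowLoop are hypothetical helper names, and a plain double[] stands in for one row of f. According to the Phobos documentation, sum uses pairwise summation for random-access floating-point ranges with slicing, while fold with a seed accumulates left to right like the loop.
<br>
<pre>
```d
import std.algorithm.iteration : fold, sum;
import std.stdio : writeln;

// Chained style via Phobos sum (documented to use pairwise summation
// for random-access floating-point ranges with slicing).
double rowSum(double[] row)
{
    return row.sum;
}

// Chained style via fold with an explicit seed: plain left-to-right
// accumulation, starting from 0.0.
double rowFold(double[] row)
{
    return row.fold!((a, b) => a + b)(0.0);
}

// Explicit loop, equivalent to sumtest2 in the quoted benchmark.
double rowLoop(double[] row)
{
    double total = 0.0;
    foreach (x; row)
        total += x;
    return total;
}

void main()
{
    // Hypothetical 20-element row with small integer values,
    // so all three variants add up without rounding error.
    auto row = new double[](20);
    foreach (i, ref x; row)
        x = i;

    writeln(rowSum(row), " ", rowFold(row), " ", rowLoop(row));
}
```
</pre>
Whether rowFold matches rowLoop in speed depends on how well the compiler inlines the lambda, so it is worth timing both with the same flags as the benchmark.
<br>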
</body>
</html>