Simple performance question from a newcomer

Mon Feb 22 07:43:23 PST 2016

First of all, I am pleasantly surprised by the rapid influx of 
helpful responses. The community here seems quite wonderful. In 
the interests of not cluttering the thread too much, since the 
advice given here has many commonalities, I will only try to 
respond once to each type of suggestion.

On Sunday, 21 February 2016 at 16:29:26 UTC, ZombineDev wrote:
> The problem is not with ranges, but with the particular 
> algorithm used for summing. If you look at the docs 
> (http://dlang.org/phobos-prerelease/std_algorithm_iteration.html#.sum) you'll see that if the range has random-access `sum` will use the pair-wise algorithm. About the second and third tests, the problem is with DMD which should not be used when measuring performance (but only for development, because it has fast compile-times).
> ...
> According to `dub --verbose`, my command-line was roughly this:
> ldc2 -ofapp -release -O5 -singleobj -w source/app.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/internal.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/iteration.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/package.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/selection.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/slice.d

It appears that I cannot use the GDC compiler for this particular 
problem due to it using a comparatively older version of the DMD 
frontend (I understand Mir requires >=2.068), but I did manage to 
get LDC working on my system after a bit of work. Since I've been 
using dub to manage my project, I used the default "release" 
build type. I also tried compiling manually with LDC, using the 
-O5 switch you mentioned. These are the results (I increased the 
iteration count to lessen the noise: the array is now 10000x20, 
and each function is run a thousand times):

LDC (manual) flags: -release -enable-inlining -O5 -w -singleobj

           DMD        LDC (dub)   LDC (manual)
sumtest1:  12067 ms   6899 ms     1940 ms
sumtest2:   3076 ms   1349 ms      452 ms
sumtest3:   2526 ms    847 ms      434 ms
sumtest4:   5614 ms   1481 ms      452 ms

The sumtest1, 2 and 3 functions are as given in the first post; 
sumtest4 uses the range.reduce!((a, b) => a + b) approach to 
enforce naive summation. Much to my satisfaction, the 
range.reduce version is now exactly as quick as the traditional 
loop, and while function inlining isn't quite perfect, the 4% 
performance penalty incurred by the 10_000 function calls (or 
whatever inlined form the function finally takes) is quite 
acceptable.
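
A minimal sketch of the summation styles being compared, assuming 
plain double[] input. The function names are hypothetical and are 
not the sumtest functions from the first post; this only 
illustrates naive summation via reduce next to an explicit loop 
and Phobos' `sum`:

```d
import std.algorithm.iteration : reduce, sum;

// Naive left-to-right summation via reduce, seeded with 0.0
// (hypothetical helper, not one of the benchmarked functions).
double naiveSum(double[] values)
{
    return reduce!((a, b) => a + b)(0.0, values);
}

// The same naive summation written as a traditional loop.
double loopSum(double[] values)
{
    double total = 0.0;
    foreach (v; values)
        total += v;
    return total;
}

// Phobos' sum; per its documentation it uses pairwise summation
// for random-access floating-point ranges such as double[].
double phobosSum(double[] values)
{
    return values.sum;
}
```

On small, exactly representable inputs all three return the same 
value; they can differ in rounding error on long arrays, and the 
different algorithm is why `sum` is not a like-for-like comparison 
against a plain loop.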

I do have to wonder, however, about the default settings of dub 
in this case. Having gone through its documentation, I might 
still not have guessed to try the compiler options you provided, 
thereby losing out on a 2-3x performance improvement. What build 
options did you use in your dub.json that it managed to translate 
to the correct compiler switches?