Simple performance question from a newcomer
dextorious via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Feb 22 07:43:23 PST 2016
First of all, I am pleasantly surprised by the rapid influx of
helpful responses. The community here seems quite wonderful. In
the interests of not cluttering the thread too much, since the
advice given here has many commonalities, I will only try to
respond once to each type of suggestion.
On Sunday, 21 February 2016 at 16:29:26 UTC, ZombineDev wrote:
> The problem is not with ranges, but with the particular
> algorithm used for summing. If you look at the docs
> (http://dlang.org/phobos-prerelease/std_algorithm_iteration.html#.sum) you'll see that if the range has random-access `sum` will use the pair-wise algorithm. About the second and third tests, the problem is with DMD which should not be used when measuring performance (but only for development, because it has fast compile-times).
> ...
> According to `dub --verbose`, my command-line was roughly this:
> ldc2 -ofapp -release -O5 -singleobj -w source/app.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/internal.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/iteration.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/package.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/selection.d
> ../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/slice.d
It appears that I cannot use the GDC compiler for this particular
problem due to it using a comparatively older version of the DMD
frontend (I understand Mir requires >=2.068), but I did manage to
get LDC working on my system after a bit of work. Since I've been
using dub to manage my project, I used the default "release"
build type. I also tried compiling manually with LDC, using the
-O5 switch you mentioned. These are the results (I increased the
iteration count to lessen the noise: the array is now 10000x20 and
each function is run a thousand times):
            DMD         LDC (dub)   LDC (-release -enable-inlining -O5 -w -singleobj)
sumtest1:   12067 ms    6899 ms     1940 ms
sumtest2:    3076 ms    1349 ms      452 ms
sumtest3:    2526 ms     847 ms      434 ms
sumtest4:    5614 ms    1481 ms      452 ms
The sumtest1, 2 and 3 functions are as given in the first post,
sumtest4 uses the range.reduce!((a, b) => a + b) approach to
enforce naive summation. Much to my satisfaction, the
range.reduce version is now exactly as quick as the traditional
loop and while function inlining isn't quite perfect, the 4%
performance penalty incurred by the 10_000 function calls (or
whatever inlined form the function finally takes) is quite
acceptable.
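For concreteness, here is a minimal sketch of the naive reduce-based summation next to a hand-written loop. It is not the benchmark code itself: the function names naiveSum and loopSum, the sample data, and the idea of summing one 20-element row at a time are illustrative assumptions.

```d
import std.algorithm.iteration : reduce, sum;
import std.stdio : writeln;

// Naive left-to-right summation in the style of sumtest4: the lambda
// forces a plain running total rather than whatever algorithm `sum` picks.
// (naiveSum is a hypothetical name, not taken from the benchmark.)
double naiveSum(const(double)[] row)
{
    return reduce!((a, b) => a + b)(0.0, row);
}

// The equivalent hand-written loop, for comparison.
// (loopSum is likewise a hypothetical name.)
double loopSum(const(double)[] row)
{
    double total = 0.0;
    foreach (x; row)
        total += x;
    return total;
}

void main()
{
    // Assumed data: a single 20-element row, all elements set to 0.5.
    auto row = new double[](20);
    row[] = 0.5;

    writeln(naiveSum(row));
    writeln(loopSum(row));
    // Phobos `sum`, which the docs say uses pairwise summation
    // for random-access ranges.
    writeln(row.sum);
}
```

With -O5 and inlining enabled, one would hope the compiler reduces naiveSum to much the same machine code as loopSum, which is consistent with the identical timings above.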
I do have to wonder, however, about the default settings of dub
in this case. Having gone through its documentation, I might
still not have guessed to try the compiler options you provided,
thereby losing out on a 2-3x performance improvement. What build
options did you use in your dub.json that it managed to translate
to the correct compiler switches?