Optimize my code =)
Robin
robbepop at web.de
Sat Feb 15 15:06:25 PST 2014
Hiho,
wow, first of all: this community is awesome - so many kind and
interesting answers. =)
With your help I was able to achieve a performance boost for
several different operations.
Some benchmarks:
Allocation of five 10,000 x 10,000 matrices in a row:
Before: ~8.2 seconds
After: ~2.3 seconds (with minimallyInitializedArray!)
Multiplication of two 1,000 x 1,000 matrices:
Before: ~14.8 seconds
After: ~4.3 seconds
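For reference, the allocation trick can be sketched like this (a minimal standalone example, not the actual Matrix code):

```d
import std.array : minimallyInitializedArray;

void main()
{
    enum n = 1_000;

    // new double[](len) default-initializes every element to NaN,
    // which costs an extra pass over the whole allocation.
    auto initialized = new double[](n * n);

    // minimallyInitializedArray skips that pass; the elements are
    // left uninitialized, so each slot must be written before read.
    auto raw = minimallyInitializedArray!(double[])(n * n);
    raw[] = 0.0; // fill immediately, as a matrix constructor would

    assert(initialized.length == raw.length);
}
```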
However, I think there is still a lot of potential for further
optimization. Let me summarize the main changes I have made to the
code so far ...
Matrix is still a class, but I changed it to a final class,
preventing its methods from being virtual. Dimension is now a final
struct (I don't know whether 'final' affects structs in any way,
though ...). This alone gave the multiplication a huge performance
boost.
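To illustrate the idea (a simplified sketch; the field names are my assumption, not the real code): with final on the class, no method can be overridden, so calls need no vtable indirection and can be inlined into the hot loop.

```d
// Simplified sketch; the field names are assumed, not the real code.
final class Matrix(T)
{
    private T[] data;
    private size_t nRows, nCols;

    this(size_t rows, size_t cols)
    {
        nRows = rows;
        nCols = cols;
        data = new T[](rows * cols);
    }

    // With `final` on the class, this call is non-virtual and can
    // be inlined into the multiplication loop.
    T at(size_t r, size_t c) const { return data[r * nCols + c]; }
}

void main()
{
    auto m = new Matrix!int(2, 2); // int elements default to 0
    assert(m.at(1, 1) == 0);
}
```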
When I converted Matrix from a class to a struct, the multiplication
time even dropped from ~4.3 seconds to about 3.6 seconds. However, I
am currently not sure whether I want matrices to be structs (value
types).
Besides that, I tried to add nothrow and pure to every possible
method. However, as it turned out, I was rarely able to add the pure
attribute, because the methods always ended up calling some impure
method (as the compiler pointed out). Most of the time this actually
made sense, and I think the only pure methods now are the
constructors of the Dimension struct.
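For what it's worth, a Dimension along these lines can carry pure and nothrow because it touches nothing but its own fields (the field and method names are my guess at the shape of the code):

```d
// Sketch only; field and method names are assumed, not the real code.
struct Dimension
{
    size_t rows, cols;

    this(size_t r, size_t c) pure nothrow
    {
        rows = r;
        cols = c;
    }

    // Row-major offset: no allocation, no globals, nothing that can
    // throw, so the compiler accepts pure nothrow here.
    size_t offset(size_t row, size_t col) const pure nothrow
    {
        return row * cols + col;
    }
}

void main()
{
    auto d = Dimension(3, 4);
    assert(d.offset(2, 3) == 11); // 2 * 4 + 3
}
```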
What may still speed up things?
In my tests it turned out that the simple code:
auto m1 = Matrix!double.random(1000, 1000, -10, 10, 0.25);
auto m2 = Matrix!double.random(1000, 1000, -10, 10, 0.25);
auto m3 = m1 * m2;
called the normal copy constructor. This is a pity, because it would
be a huge performance win if it could use move semantics instead.
(In the C++ version of the matrix code this scenario would actually
call the move assignment operator for m3, which is much, much faster
than copying.)
However, I haven't figured out yet how to use move semantics in D
with class objects. Or is that only possible with struct value types?
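For structs, at least, a small experiment shows that rvalues are moved and that std.algorithm's move forces a move explicitly (this is a toy type, not the matrix code):

```d
import std.algorithm : move;

struct Buffer
{
    double[] data;
    this(this) { data = data.dup; } // postblit: deep copy on copy
}

void main()
{
    auto a = Buffer(new double[](4));

    // Explicit move: the postblit does not run, and `a` is reset
    // to its .init state because Buffer defines a postblit.
    auto b = move(a);

    assert(a.data is null);
    assert(b.data.length == 4);
}
```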
I have also tried the LDC compiler. However, it gave me some strange
errors; e.g. it didn't like the following code:
ref Matrix transpose() const {
    return new Matrix(this).transposeAssign();
}
And it forced me to change it to:
ref Matrix transpose() const {
    auto m = new Matrix(this);
    m.transposeAssign();
    return m;
}
Which is kind of ugly ... I hope this gets fixed soon, as I assume
it is correct D code, since DMD compiles it just fine.
Some of you suggested that transposing the matrix before the
multiplication must be slower than multiplying without the
transposition. In my earlier Java and C++ programs I already tested
which strategy is faster, and it turned out that cache efficiency
DOES matter, of course. There are also papers explaining why a
transposed matrix multiplication can be faster than one without
transposing.
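The strategy boils down to this (a sketch of the idea, not my actual implementation): once B is stored transposed, both operands are read sequentially in the inner loop, whereas the naive version reads B with a stride of n elements.

```d
// Sketch: multiply A (n x n) by B, where B is passed as its
// transpose bT. Both inner-loop reads are now sequential, so the
// hardware prefetcher helps; the naive version would read B with
// a stride of n doubles on every step.
double[] mulTransposed(const double[] a, const double[] bT, size_t n)
{
    auto c = new double[](n * n);
    foreach (i; 0 .. n)
        foreach (j; 0 .. n)
        {
            double sum = 0.0;
            foreach (k; 0 .. n)
                sum += a[i * n + k] * bT[j * n + k];
            c[i * n + j] = sum;
        }
    return c;
}

void main()
{
    // 2x2 identity times itself stays the identity (and the
    // identity is its own transpose).
    auto e = [1.0, 0.0, 0.0, 1.0];
    assert(mulTransposed(e, e, 2) == e);
}
```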
In the end this is just a test suite, and there is already an even
faster approach to matrix multiplication which runs in O(n^2.8)
instead of the O(n^3) of my current naive solution. And with the
power of OpenCL (or similar) one could even lift the matrix
multiplication onto the GPU - boom.^^ But the current task is to
find generally optimized code for the D language - this thread has
already helped me a lot!
I wasn't aware that D is actually capable of lazy evaluation beyond
the pseudo-lazy short-circuiting of if conditions, which is kind of
cool. However, I still have to read up on it to see whether I can
find a use for it in this code.
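From what I have read so far, the mechanism is the lazy storage class: an argument is only evaluated if the callee actually uses it. A tiny experiment:

```d
int calls;

string expensiveReport()
{
    ++calls; // count how often the argument is really evaluated
    return "big matrix report";
}

// `lazy` wraps the argument in a hidden delegate: it only runs
// when (and each time) `message` is read inside the function.
void logIfBig(size_t size, lazy string message)
{
    if (size > 1_000_000)
    {
        auto m = message; // evaluated only on this path
    }
}

void main()
{
    logIfBig(10, expensiveReport());
    assert(calls == 0); // never evaluated

    logIfBig(2_000_000, expensiveReport());
    assert(calls == 1); // evaluated exactly once
}
```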
SIMD instructions also sound extremely cool, but a little too
complex given that I am fairly new to D and still learning the
basics of the language.
Finally, I want to mention something I found very irritating.
Bearophile correctly pointed out that my pre-conditions for the
index operators were all wrong and corrected my code with some
neat additions:
T opIndex(in size_t row, in size_t col) const nothrow
in {
    assert(row < nRows);
    assert(col < nCols);
} body {
    return data[dim.offset(row, col)];
}
The in and body blocks are cool, as far as I understand what they
are for. However, in my benchmarks they had a clearly noticeable
negative impact on the matrix multiplication, which rose from ~4
seconds with my code to ~8 seconds with his. When I leave out the
in and body blocks and use a single normal block as follows, the
task stays at the 4-second mark, so I suspect the compiler won't
optimize code inside an 'in' block - is that right?
T opIndex(in size_t row, in size_t col) const nothrow {
    assert(row < nRows);
    assert(col < nCols);
    return data[dim.offset(row, col)];
}
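One thing I still want to verify: as far as I know, in-contracts compile down to plain asserts, and dmd's -release switch strips contracts and asserts alike, so in a release build both versions ought to cost the same. Something like this (the class layout below is assumed, not my real code) would be worth re-benchmarking:

```d
// In-contracts lower to asserts; `dmd -release` strips contracts
// and asserts alike, so re-benchmark with:
//
//   dmd -O -inline -release app.d
//
// The layout below is assumed, not the real Matrix code.
final class Matrix(T)
{
    T[] data;
    size_t nRows, nCols;

    this(size_t r, size_t c)
    {
        nRows = r;
        nCols = c;
        data = new T[](r * c);
    }

    T opIndex(in size_t row, in size_t col) const nothrow
    in {
        assert(row < nRows);
        assert(col < nCols);
    } body {
        return data[row * nCols + col];
    }
}

void main()
{
    auto m = new Matrix!int(2, 2); // int elements default to 0
    assert(m[1, 1] == 0);
}
```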
Thanks again for all your helpful comments, and thanks in advance -
I am eagerly looking forward to your future comments! =)
Robin
More information about the Digitalmars-d-learn
mailing list