Optimize my code =)
Robin
robbepop at web.de
Mon Feb 17 13:56:17 PST 2014
Hiho,
thank you for your code improvements and suggestions.
I really like the foreach loop in D, as well as the slight (but
measurable) performance boost it gives over conventional for loops. =)
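Just to show what I was comparing (a tiny standalone sketch, not the
actual benchmark code from the paste):

double sumForeach(const double[] values) pure nothrow {
    double sum = 0.0;
    foreach (v; values)                    // iterate directly over the elements
        sum += v;
    return sum;
}

double sumFor(const double[] values) pure nothrow {
    double sum = 0.0;
    for (size_t i = 0; i < values.length; ++i)
        sum += values[i];
    return sum;
}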
Another success of the changes I made is that I managed to improve
the matrix multiplication performance further, from 3.6 seconds for
two 1000x1000 matrices down to 1.9 seconds, which is already very
close to Java and C++ at about 1.3 - 1.5 seconds.
The key to victory was pointer arithmetic, as I noticed that I had
used it in the C++ implementation, too. xD
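Roughly, the inner loop now works like this (a simplified sketch with
made-up names: 'a' is the left matrix and 'bT' the already-transposed
right one, both square and stored row-major in a plain double[]; the
real code is in the paste linked below):

void multiplyTransposed(const double[] a, const double[] bT,
                        double[] result, size_t n) pure nothrow
{
    foreach (immutable r; 0 .. n) {
        const(double)* rowA = &a[r * n];        // start of row r of a
        foreach (immutable c; 0 .. n) {
            const(double)* rowB = &bT[c * n];   // start of row c of bT (= column c of b)
            double sum = 0.0;
            foreach (immutable k; 0 .. n)
                sum += rowA[k] * rowB[k];       // raw pointer indexing, no slice bounds checks
            result[r * n + c] = sum;
        }
    }
}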
The toString implementation has also improved slightly thanks to the
changes you mentioned: 1.37 secs -> 1.29 secs.
I have also adjusted all operator overloads to the "new style"
- I just hadn't known about that "new style" until now - thanks!
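For anyone reading along who, like me, hadn't seen it: the "new style"
means templated opBinary / opOpAssign instead of the old opAdd /
opAddAssign names. A tiny standalone example (invented Vec type, not
my actual Matrix code):

struct Vec {
    double[] data;

    // new style: one templated opBinary replaces opAdd, opSub, ...
    Vec opBinary(string op)(const Vec other) const
        if (op == "+" || op == "-")
    {
        assert(data.length == other.data.length);
        auto result = Vec(data.dup);
        mixin("result.data[] " ~ op ~ "= other.data[];");
        return result;
    }

    // new style: one templated opOpAssign replaces opAddAssign, opSubAssign, ...
    ref Vec opOpAssign(string op)(const Vec other)
        if (op == "+" || op == "-")
    {
        mixin("data[] " ~ op ~ "= other.data[];");
        return this;
    }
}

unittest {
    auto a = Vec([1.0, 2.0]);
    auto b = Vec([3.0, 4.0]);
    auto c = a + b;   // calls a.opBinary!"+"(b)
    a -= b;           // calls a.opOpAssign!"-"(b)
    assert(c.data == [4.0, 6.0] && a.data == [-2.0, -2.0]);
}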
I will just post the whole code again so that you can see what I
have changed.
Keep in mind that I am still using DMD as the compiler, so performance
may rise further once I switch to another compiler!
All in all I am very happy with the code analysis and its
improvements!
However, there were some strange things that left me very
confused ...
void allocationTest() {
    writeln("allocationTest ...");
    sw.start();
    auto m1 = Matrix!double(10000, 10000);
    { auto m2 = Matrix!double(10000, 10000); }
    { auto m2 = Matrix!double(10000, 10000); }
    { auto m2 = Matrix!double(10000, 10000); }
    //{ auto m2 = Matrix!double(10000, 10000); }
    sw.stop();
    printBenchmarks();
}
This is the most confusing code snippet. I just changed the allocation
of all m1 and m2 from new Matrix!double (on the heap) to Matrix!double
(on the stack), and the performance dropped significantly - the
benchmarked time rose from 2.3 seconds to over 25 seconds!! Now look
at the code above. When I leave it as it is now, the code needs about
2.9 seconds of runtime; however, when I enable the currently
commented-out line, the code takes 14 to 25 seconds longer! Mind
blown ... 0.o This is extremely confusing, because I allocate these
matrices on the stack, and since each is allocated within its own
scoped block, they should release their memory again immediately, so
that no more than two matrices consume memory at the same time. That
just wasn't the case as far as I could tell from my tests.
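Just to make explicit which two variants I am comparing, here is a
stripped-down sketch (assuming the element storage is a plain
GC-allocated array; the real Matrix is in the paste):

struct Matrix(T) {
    T[] data;   // element storage - lives on the GC heap in both variants
    this(size_t rows, size_t cols) { data = new T[rows * cols]; }
}

void example() {
    auto onHeap  = new Matrix!double(100, 100); // the Matrix struct itself on the GC heap
    auto onStack = Matrix!double(100, 100);     // the Matrix struct on the stack,
                                                // but its data array is still heap memory
}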
Another strange thing was that the new opEquals implementation:
bool opEquals(const ref Matrix other) const pure nothrow {
    if (this.dim != other.dim) {
        return false;
    }
    foreach (immutable i; 0 .. this.dim.size) {
        if (this.data[i] != other.data[i]) return false;
    }
    return true;
}
is actually about 20% faster than the one you suggested with the
single line "return (this.dim == other.dim && this.data[] ==
other.data[]);".
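Written out as a full method (pure/nothrow omitted here), the
suggested version I benchmarked against is just the same signature
with that one-line body:

bool opEquals(const ref Matrix other) const {
    return this.dim == other.dim && this.data[] == other.data[];
}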
The last thing I haven't quite understood: I tried to replace

    auto t = Matrix(other).transposeAssign();

in the matrix multiplication algorithm with its shorter and clearer
form

    auto t = other.transpose(); // sorry for the nasty '()', but I like them! :/

However, this gave me wonderful segmentation faults at runtime during
the matrix multiplication ...
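For context, what I would expect such a transpose() to boil down to is
roughly this (my illustration only - the actual implementation is in
the paste):

// Hypothetical shape of a non-mutating transpose() built on transposeAssign():
Matrix transpose() const {
    auto result = Matrix(this);   // copy, then transpose the copy in place
    result.transposeAssign();
    return result;
}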
And here is the complete and improved code:
http://dpaste.dzfl.pl/7f8610efa82b
Thanks in advance for helping me! =)
Robin