Optimize my code =)
Robin
robbepop at web.de
Tue Feb 18 15:36:11 PST 2014
Hiho,
I am happy to see that I could encourage so many people to
discuss this topic - you not only gave me interesting answers
to my questions but also analysed and evaluated features and
performance of D.
I have fixed the allocation performance problem via a custom
destructor which manually notifies the GC that its data is
garbage, so that the GC can free it in the next cycle and no
longer has to determine whether it is still in use. This sped up
the allocation tests enormously and also improved overall runtime
slightly, because memory is now freed contiguously.
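For reference, the destructor trick looks roughly like this (a minimal sketch with made-up field names, assuming the buffer is GC-allocated):

```d
import core.memory : GC;

struct Matrix
{
    double[] data;

    this(size_t rows, size_t cols)
    {
        data = new double[](rows * cols);
    }

    // Tell the GC the buffer is garbage right away, so a collection
    // cycle no longer has to discover that on its own.
    ~this()
    {
        if (data !is null)
        {
            GC.free(data.ptr);
            data = null;
        }
    }
}
```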
@Nick Sabalausky:
Why should I remove .dup from the copy constructor? In my opinion
it is important to keep both matrices (source and copy)
independent from each other. The suggested COW feature for
matrices sounds interesting but also weird; I have to read more
about that.
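The copy I mean is the usual postblit with .dup (sketched here with a made-up field name, not my exact code):

```d
struct Matrix
{
    double[] data;

    // Postblit: after the bitwise copy, duplicate the heap buffer so
    // that source and copy no longer share storage.
    this(this)
    {
        data = data.dup;
    }
}
```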
Move semantics in D are needed and do exist; however, I think
the D compiler isn't as good as C++ compilers at determining
what is movable and what is not.
Another thing, which is hopefully a bug in the current language
implementation of D, is the strange behaviour of the transpose
method, whose only line of code is:
return Matrix(this).transposeAssign();
In C++, for example, this compiles without any problems and
yields correctly transposed matrix copies. This works thanks to
the existence of move semantics in C++, one of the coolest
features since C++11, which has simplified code enormously in
many cases for value types such as D's structs.
I ran several tests to see when the move assignment operator or
the move constructor is called in D, and whenever I expected
them to be called, the copy constructor or the postblit was
called instead, which was frustrating imo.
Perhaps I still haven't quite understood what happens behind the
scenes in D, but it can't be the solution to always copy the
whole data when I could instead move it on the heap.
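A workaround that should avoid the extra copy is to split the one-liner into a named local and rely on NRVO for the return (just a sketch, assuming the Matrix(this) copy constructor and transposeAssign from my code):

```d
Matrix transpose() const
{
    // explicit copy via the copy constructor, then transpose in place
    Matrix result = Matrix(this);
    result.transposeAssign();
    // named return value: the compiler can elide the copy here (NRVO)
    return result;
}
```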
Besides that, one suggestion which came up was that I could merge
the Dimension module into the Matrix module, as they work
together semantically. However, I find it better to have
many smaller code snippets instead of fewer bigger ones, and
that's why I keep them separated.
I also gave scoped imports a try and hoped that they would
reduce the size of my executable and perhaps improve the
performance of my program; neither was true -> confused. Instead
I now have more lines of code and can no longer see at a glance
which dependencies the module as a whole has. So what is the
point of scoped imports?
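To make clear what I tried, a scoped import looks like this (hypothetical module, not my actual code):

```d
module matrix;

// module-level import: visible to the whole module
import std.conv : to;

string describe(size_t rows, size_t cols)
{
    // scoped import: only visible inside this function, and only
    // pulled in when the function itself is actually compiled/used
    import std.format : format;
    return format("%sx%s matrix", rows, cols);
}
```

My understanding so far is that the benefit is mainly laziness and hygiene - in templates the import is only needed on instantiation, and the module namespace stays clean - rather than any binary-size or speed gain.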
The mixin keyword is also nice to have, but it feels similar to a
dirty C macro to me, where text replacement takes place at a
lower level of abstraction (unlike templates). Of course I am
wrong and you will teach me why, but atm I have strange feelings
about implementing code with mixins. In this special case:
perhaps it isn't a wise decision to merge addition with
subtraction, and perhaps I can find faster approaches in which
the two operations differ more, which would require splitting the
methods up again. (Theoretical, but it could be.)
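The merged addition/subtraction I am talking about is a string mixin along these lines (a sketch with made-up field names):

```d
struct Matrix
{
    double[] data;

    // One body for both "+" and "-": the operator token is spliced in
    // as a compile-time string, but the result is still fully
    // type-checked in this context (unlike a C macro's blind text
    // replacement).
    Matrix opBinary(string op)(ref const Matrix rhs) const
        if (op == "+" || op == "-")
    {
        Matrix result;
        result.data = new double[](data.length);
        foreach (i; 0 .. data.length)
            result.data[i] = mixin("data[i] " ~ op ~ " rhs.data[i]");
        return result;
    }
}
```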
Another weird thing is that result ~= text(tabStr, this[r, c])
in the toString method is much slower than the two following
lines of code:
result ~= tabStr;
result ~= to!string(this[r, c]);
Does anybody have an answer to this?
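For completeness, here are the two variants side by side; one plausible explanation (just my guess) is that text() first builds a throwaway temporary string from all its arguments:

```d
import std.conv : text, to;

void appendCell(ref string result, string tabStr, double value)
{
    // slow: text() concatenates its arguments into a fresh temporary
    // string, which is then appended and discarded
    // result ~= text(tabStr, value);

    // faster: append the existing string directly, then convert
    // only the number
    result ~= tabStr;
    result ~= to!string(value);
}
```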
I now think that I know the most important things about how to
write ordinarily (not super) efficient D code - laugh at me if
you want :D - and I will continue extending this benchmarking
library; whenever I feel bad about a certain algorithm's
performance I will knock on this thread again, you know. :P
At the end of my post I just want to summarize the benchmark
history for the matrix multiplication, as I think it is funny:
(all tests for two 1000x1000 matrices!)
- The bloody first matrix implementation (which worked) required
about 39 seconds to finish.
- Then I finally found out the optimization flags for DMD, and
the multiplication performance roughly doubled, to about 14
seconds.
- I created an account here, and thanks to your massive help the
matrix multiplication shortly afterwards required only about 5
seconds, due to better const, pure and nothrow usage.
- Through the shift from class to struct, some optimizations in
memory layout, further usage improvements of const, pure and
nothrow, as well as several array features and foreach loops, the
algorithm's performance rose once again and required about 3.7
seconds.
- The last notable optimization was the implementation based on
pointer arithmetic, which again improved the performance from 3.7
seconds to roughly 2 seconds.
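The pointer-based inner loop from that last step looks roughly like this (a sketch for row-major n x n matrices, not my exact code):

```d
// C = A * B for row-major n x n matrices, walking raw pointers so the
// inner loop avoids repeated index arithmetic and bounds checks.
void multiply(const(double)* a, const(double)* b, double* c, size_t n)
{
    foreach (i; 0 .. n)
    {
        foreach (j; 0 .. n)
        {
            double sum = 0.0;
            const(double)* ap = a + i * n; // start of row i of A
            const(double)* bp = b + j;     // top of column j of B
            foreach (k; 0 .. n)
            {
                sum += *ap++ * *bp;
                bp += n;                   // step down column j
            }
            c[i * n + j] = sum;
        }
    }
}
```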
Given this development, I see that there is still a chance that
we could beat the 1.5 seconds from Java or even the 1.3 seconds
from C++! (Based on my machine: 64-bit Arch Linux, dual-core
2.2 GHz, 4 GB RAM.)
There are still many ways to further improve the performance, for
example by using LDC on certain hardware, by parallelism, and
perhaps by implementing COW with no GC dependencies. And of
course I may be missing many other possible optimization features
of D.
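For the parallelism idea, std.parallelism should make it almost mechanical - a sketch (untested, flat row-major arrays assumed):

```d
import std.parallelism : parallel;
import std.range : iota;

// Distribute the rows of the result matrix across all cores.
void multiplyParallel(const double[] a, const double[] b,
                      double[] c, size_t n)
{
    foreach (i; parallel(iota(n)))
    {
        foreach (j; 0 .. n)
        {
            double sum = 0.0;
            foreach (k; 0 .. n)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```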
I for my part can say that I have learned a lot, and that's the
most important thing above everything else for me here.
Thank you all for this very interesting conversation! You - the
community - are a part of what makes D a great language. =)
Robin
More information about the Digitalmars-d-learn
mailing list