Optimize my code =)
bearophile
bearophileHUGS at lycos.com
Tue Feb 18 16:04:01 PST 2014
Robin:
> the existence of move semantics in C++, which is one of the
> coolest features since C++11 and has improved and simplified
> code enormously in many cases for value types such as structs
> in D.
I guess Andrei doesn't agree with you (and move semantics in
C++11 is quite hard to understand).
> I also gave scoped imports a try, hoping that they would
> reduce the size of my executable and perhaps increase the
> performance of my program. Neither turned out to be true, so I'm confused.
> Instead I now have more lines of code and can no longer see at a
> glance what dependencies the module itself has. So what is the
> point of scoped imports?
Scoped imports in general can't increase performance. Their main
point is to avoid importing modules that are needed only by
templated code. So if you don't instantiate the template, the
linker does less work and the binary is usually smaller (no
ModuleInfo, etc.).
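A minimal sketch of that idiom (toText is a hypothetical name, not a Phobos function). The import of std.conv sits inside the template body. As I understand it, a template body is only semantically analyzed when the template is instantiated, so the import is only processed if some code actually uses toText. How much this shrinks a given binary depends on the compiler and linker, so measure it.

```d
// Hypothetical example: the only user of std.conv in this module is
// the template below, so the import is placed inside it.
string toText(T)(T value)
{
    // Scoped, selective import: processed only when toText is
    // instantiated with some T.
    import std.conv : to;
    return to!string(value);
}

void main()
{
    import std.stdio : writeln;
    writeln(toText(42)); // instantiates toText!int
}
```

The cost Robin noticed is real: the module's dependencies are no longer all listed at the top. The convention in Phobos is to accept that trade mainly for imports used only by templates.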
> Another weird thing is that the result ~= text(tabStr, this[r,
> c]) in the toString method is much slower than the two
> following lines of code:
>
> result ~= tabStr;
> result ~= to!string(this[r, c]);
>
> Does anybody have an answer to this?
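It doesn't look too weird. In the first case text() allocates a
new, larger temporary string holding both pieces, and then that
string is copied again onto result; the two appends skip that
combined temporary. But I don't think matrix printing is likely
to be a bottleneck in your program.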
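If the printing ever does matter, here is a sketch that avoids the temporaries. It assumes the matrix is a plain array of rows, because Robin's actual matrix type isn't shown, and matrixToString is a hypothetical name. It uses std.array.appender as a single growing buffer and std.format.formattedWrite to format each element straight into it, so no intermediate string is built per element.

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical helper: one matrix row per line, each element preceded
// by a tab, all written into a single growing buffer.
string matrixToString(T)(T[][] m)
{
    auto result = appender!string();
    foreach (row; m)
    {
        foreach (x; row)
        {
            result.put('\t');                 // no temporary string
            formattedWrite(result, "%s", x);  // formats x directly into the buffer
        }
        result.put('\n');
    }
    return result.data;
}

void main()
{
    import std.stdio : write;
    write(matrixToString([[1, 2], [3, 4]]));
}
```

The two-append version already avoids the combined temporary. The appender also avoids the separate string that to!string creates for every element.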
> - Then I have finally found out the optimizing commands for the
> DMD
Not finding DMD's optimization switches is a small but common
problem. Perhaps it's worth fixing.
> There are still many ways to further improve the performance.
> For example by using LDC
Here are the latest stable and unstable releases of LDC2; try them:
https://github.com/ldc-developers/ldc/releases/tag/v0.12.1
https://github.com/ldc-developers/ldc/releases/tag/v0.13.0-alpha1
> on certain hardware, parallelism and perhaps by implementing
> COW with no GC dependencies. And of course I may miss many
> other possible optimization features of D.
Matrix multiplication can be improved a lot by tiling the matrix
(or, better, by using a cache-oblivious algorithm), by using
SSE/AVX2, by using multiple cores, etc. As a starting point you
can try std.parallelism. It could speed up your code on 4 cores
with a very limited amount of added code.
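A minimal std.parallelism sketch. It assumes dense matrices stored as arrays of rows, and parallelMul is a hypothetical name, not a Phobos function. The loop over result rows runs on the default task pool. Each iteration writes only its own row, so no locking is needed. The i-k-j loop order walks b row by row, which is friendlier to the cache than the textbook i-j-k order. This is not the tiled or cache-oblivious version mentioned above.

```d
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical helper: naive dense multiply, c = a * b, with the
// rows of c computed in parallel on std.parallelism's default task pool.
double[][] parallelMul(const double[][] a, const double[][] b)
{
    assert(a.length > 0 && b.length > 0, "empty matrix");
    size_t n = a.length;     // rows of a and c
    size_t k = b.length;     // columns of a == rows of b
    size_t m = b[0].length;  // columns of b and c
    auto c = new double[][](n, m);

    // Each iteration owns row i of c, so the worker threads never
    // write to the same memory and no synchronization is needed.
    foreach (i; parallel(iota(n)))
    {
        auto row = c[i];
        row[] = 0.0;
        // i-k-j order: the array operation walks b[p] and row
        // contiguously instead of striding down a column of b.
        foreach (p; 0 .. k)
        {
            immutable aip = a[i][p];
            row[] += aip * b[p][];
        }
    }
    return c;
}

void main()
{
    import std.stdio : writeln;
    auto a = [[1.0, 2.0], [3.0, 4.0]];
    auto b = [[5.0, 6.0], [7.0, 8.0]];
    writeln(parallelMul(a, b));
}
```

On tiny matrices like the one in main, the thread overhead will likely dominate. Any speedup should only show up on large inputs.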
Bye,
bearophile
More information about the Digitalmars-d-learn
mailing list