Slow performance compared to C++, ideas?

Manu turkeyman at gmail.com
Fri May 31 06:31:08 PDT 2013


On 31 May 2013 23:07, finalpatch <fengli at gmail.com> wrote:

> I actually have some experience with C++ template
> meta-programming in HD video codecs. My experience is that it is
> possible for generic code through TMP to match or even beat hand
> written code. Modern C++ compilers are very good, able to
> optimize away most of the temporary variables, resulting in very
> compact object code, provided you can avoid branches and keep the
> arguments const refs as much as possible. A real example is my
> TMP generic codec, which beat the original hand-optimized C/asm version
> (both use SSE intrinsics) by as much as 30% with only a fraction
> of the lines of code. Another example is the Eigen linear algebra
> library, which through template meta-programming is able to match
> the speed of Intel MKL.
>

Just to clarify, I'm not trying to say templates are slow because they're
templates.
There's no reason carefully crafted template code couldn't be identical to
hand crafted code.
What I am saying is that they introduce the possibility for countless
subtle details to get in the way.
If you want maximum performance from templates, you often need to be really
good at expanding the code in your mind, and visualising it all in expanded
context, so you can then reason whether anything is likely to get in the
way of the optimiser or not.
A lot of people don't possess this skill, and for good reason, it's hard!
It usually takes considerable time to optimise template code, and optimised
template code may often only be optimal in the context you tested against.
At some point, depending on the complexity of your code, it might just be
easier/less time consuming to write the code directly.
It's a fine line, but I've seen so much code that takes it WAAAAY too far.
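
As a rough illustration of the kind of detail that only shows up once you
expand the template in your head, here is a C++ sketch. The function names
are made up for this example. When T is float, `data` and `out` both point
to float, so the compiler generally cannot prove they don't overlap, and it
may have to write the running total back to memory on every iteration. The
second version keeps the total in a local, which is free to stay in a
register.

```cpp
#include <cstddef>

// Hypothetical helper: accumulates straight through the output pointer.
// Because *out may alias data[i], the optimiser may be forced to store
// (and reload) the running total on each pass through the loop.
template <typename T>
void sum_into(const T* data, std::size_t n, T* out) {
    *out = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        *out += data[i];
    }
}

// Same result for non-overlapping inputs, but the total lives in a local
// variable, so no aliasing question arises inside the loop.
template <typename T>
void sum_into_local(const T* data, std::size_t n, T* out) {
    T total = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    *out = total;
}
```

Whether a given compiler actually generates different code for the two is
something you'd have to check in the disassembly; the point is only that
the difference is invisible at the call site.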

There's always the unpredictable element too. Imagine a large-ish template
function where one very small detail inside is all that differs between
otherwise identical instantiations.
Let's say 2 routines are generated for int and long; the cost of casting
int -> long and calling the long function in both cases is insignificant,
but using templates, your exe just got bigger, branches less predictable,
icache got more noisy, and there's no way to profile for loss of
performance introduced this way. In fact, the profiler will typically
erroneously lead you to believe your code is FASTER, when the result is code
that may be slower on net.
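
To make the int/long case concrete, here is a minimal C++ sketch. The
function names are hypothetical, and the tiny integer-power routine is only
a stand-in for the "large-ish" function described above. Option A emits one
copy of the body per type it is used with; option B keeps a single body and
has int callers widen their argument and narrow the result.

```cpp
// Option A: a template. Using it with both int and long generates two
// near-identical routines in the executable.
template <typename T>
T ipow_tmpl(T base, unsigned exp) {
    T result = 1;
    while (exp != 0) {
        if (exp & 1u) result *= base;
        exp >>= 1;
        if (exp != 0) base *= base;  // skip the final, unused squaring
    }
    return result;
}

// Option B: one real implementation, written for the wider type.
long ipow_long(long base, unsigned exp) {
    long result = 1;
    while (exp != 0) {
        if (exp & 1u) result *= base;
        exp >>= 1;
        if (exp != 0) base *= base;
    }
    return result;
}

// int callers pay for a widening conversion and a narrowing cast, but
// share the single copy of the code above.
inline int ipow_int(int base, unsigned exp) {
    return static_cast<int>(ipow_long(base, exp));
}
```

For a function this small the compiler will probably inline everything and
the distinction vanishes; the trade-off only starts to matter as the body
grows.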

I'm attracted to D for the power of its templates too, but that attraction
is all about simplicity and readability.
In D, you can do more with less. The goal is not to use more and more
templates, but to make the few templates I use more readable and maintainable.

> D is very strong at TMP, it provides a lot more tools
> specifically designed for TMP that are vastly superior to those of
> C++, which relies on abusing templates. This is actually the main
> reason drawing me to D: TMP in a more pleasant way. IMO one thing
> D needs to address is fewer surprises, e.g. innocent looking code
> like v[] = [x,x,x] shouldn't cause a major performance hit. In C++
> memory allocation is explicit, either operator new or malloc, or
> indirectly through a method call, otherwise the language would
> not do heap allocation for you.


Yeah well... I have a constant inner turmoil with this in D.
I want to believe the GC is the future, but I'm still trying to convince
myself of that (and I think the GC is losing the battle at the moment).
Fortunately you can avoid the GC fairly effectively (if you forego large
parts of phobos!).

But things like the array initialisation are inexcusable. Array literals
should NOT allocate, this desperately needs to be fixed.
We also need scope/escape analysis, so local dynamic arrays can be lowered
onto the stack in self-contained situations.
That's the biggest source of difficult-to-control allocations in my
experience.
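
For comparison, the C++ side of that point can be sketched as below. This
is C++ rather than D, and the counter and function names are made up for
the example. It replaces the global operator new to count heap
allocations, then compares a fixed-size array initialised from three values
with a std::vector initialised the same way. The fixed-size array never
touches the heap; the vector does so only because you asked for a
heap-backed container by name.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical counter, for illustration only (not thread-safe).
static std::size_t g_heap_allocs = 0;

// Replace the global allocation functions so every heap request is counted.
void* operator new(std::size_t n) {
    ++g_heap_allocs;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Fixed-size array: the three values live in the function's stack frame.
std::size_t allocs_for_fixed_array(float x) {
    const std::size_t before = g_heap_allocs;
    std::array<float, 3> v = {x, x, x};
    volatile float sink = v[0] + v[1] + v[2];  // keep v observably used
    (void)sink;
    return g_heap_allocs - before;
}

// std::vector: storage comes from the heap, normally one allocation here
// (an optimiser is allowed to elide it, so don't rely on the exact count).
std::size_t allocs_for_vector(float x) {
    const std::size_t before = g_heap_allocs;
    std::vector<float> v = {x, x, x};
    volatile float sink = v[0] + v[1] + v[2];
    (void)sink;
    return g_heap_allocs - before;
}
```

The same counting trick is a cheap way to catch unexpected allocations in a
hot loop, at least on the C++ side.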

On Friday, 31 May 2013 at 11:51:04 UTC, Manu wrote:
>
>> Assuming that you would hand-write exactly the same code as the template
>> expansion...
>> Typically template expansion leads to countless temporary redundancies,
>> which you expect the compiler to try and optimise away, but it's not always
>> able to do so, especially if there is an if() nearby, or worse, a pointer
>> dereference.
>>
>


More information about the Digitalmars-d mailing list