Proposal: Operator overloading without temporaries

Don Clugston dac at nospam.com.au
Mon Mar 27 00:29:13 PST 2006


Background: Operator overloading, in the form it exists in C++ and 
currently in D, inherently results in sub-optimal code, because it 
always results in unnecessary temporary objects being created.

For example,
X = A - ((B*C) + D)* E;

becomes:
T1 = B * C;
T2 = T1 + D;
T3 = T2 * E;
T4 = A - T3;
X = T4;
Four objects were created, whereas only one was strictly required.
In C++, there are libraries like Blitz++ which use complicated 
expression templates in order to avoid these creating these temporaries, 
and provide performance comparable with FORTRAN. I think D can do much 
better...
Note that temporaries are avoided when using the opXXXAssign() operators 
like +=.

===========
   Proposal
===========
(1) Allow the compiler to assume that b = b + c  can be replaced with b 
+= c. (In C++, operator + and operator += are just symbols, the compiler 
doesn't know that there is any relationship between them).
In the example above, this would allow the compiler to generate:
T1 = B * C;
T1 += D;
T1 *= E;

and we have eliminated two of the three temporaries.
(2). Fill in the gaps in the operator overloading table by introducing
opAddAssign_r, opSubAssign_r, etc.

Just as A.opSubAssign(B)
is the operation  A -= B  or equivalently  A = A - B, similarly

A.opSubAssign_r(B)
would mean
A = B - A.
and would only occur when temporaries are generated in expressions. Like 
-=, it's an operation which can frequently be performed very 
efficiently, but at present the language has no way of expressing it.

Our original example then becomes:

T1 = B.opMul(C);
T1.opAddAssign(D);
T1.opMulAssign(E);
T1.opSubAssign_r(A);
X = T1;
... and all the useless temporaries are gone!

More formally, when the expression tree for an expression is generated:
With a binary operator XXX, operating on left & right nodes:

if (the left node is *not* an original leaf node) {
    // the left node is a temporary, does not need to be preserved.
    // we don't care if the right node is a temporary or not
    look for opXXXAssign().
} else if (the the right node is not an original leaf node) {
    // the right node is a temporary
    look for opXXXAssign_r()
} else {
   // both left and right nodes are leaf nodes, we have to
   // create a temporary
    look for opXXX(), just as it does now.
}

These rules also cope with the situation where temporaries are required:
eg
X = (A*B) + (C*D);
becomes
T1 = A*B;
T2 = C*D;
T1 += T2;
X = T1;

If this were implemented, it would permanently eradicate (for D) the 
most significant advantage which Fortran has managed to retain over 
object-oriented languages. And I really don't think it would be 
difficult to implement, or have negative side-effects.

There are a couple of decisions to be made:
(I) should the compiler use opAdd() and generate a temporary, if 
opAddAssign_r() doesn't exist, to preserve existing behaviour? I think 
the answer to this is YES.
(II) should the compiler use opAdd() and generate a temporary, if 
oppAddAssign() doesn't exist, to preserve existing behaviour? Again, I'm 
inclined to answer YES.
(III) If the code includes +=, and there is an opAdd() but no 
opAddAssign(), should the compiler accept this, and just generate an 
opAdd() followed by an assignment?? This would mean that opAdd() would 
generate the += operation as well as +, while opAddAssign() would be a 
performance enhancement. (It would still be possible to have 
opAddAssign() without opAdd(), to have += but not +, but it would not be 
possible to have + without +=). This would mean that += would be 
*purely* syntactic sugar.

Decision III would be a little more difficult to implement and is of 
less obvious merit, I only mention it as a possibility.

Comments?



More information about the Digitalmars-d mailing list