DMD 1.001 release is broken

Fri Jan 26 02:57:50 PST 2007

Walter Bright wrote:
> Frits van Bommel wrote:
>> Here's one that fails (tested on DMD v1.00, v1.002):
>>
>> -----
[snip]
>> -----
>>
>> The problem here is return value/parameter aliasing. It initializes 
>> the return value before even looking at the parameter...
> 
> I believe this fails with C++, too. But I don't think there's any 
> solution to it but completely disable NRVO. There's no reasonable way 
> for the compiler to detect that the d's are aliased. But NRVO is too 

Maybe not detect, but it can disable NRVO if there's an inout parameter 
with the same (or in case of Objects, related) type as the return value. 
Not sure if that's a good idea though, since it may not be needed in the 
majority of cases.
[later] Maybe it could just automatically create the temp variable if 
the place to put the return value is also passed by reference to the 
function? Then you'd still get the benefit of NRVO for calls that don't 
do that and you get the benefit of working code for calls that do ;).
Of course, that only solves _that_ particular aliasing problem...

> valuable an optimization. So instead I'll argue that this is akin to:
>     i = i++;
> i.e. it's an order-of-evaluation issue that should be avoided.

I disagree. Function calls are a well-established operation to serialize 
operations (C++'s sequence points). Conceptually, the assignment happens 
after the return and optimizations should not affect that.
IMHO, at least. (And it would seem "In G++'s Humble Opinion" as well, 
see below)

> NRVO aliasing can be avoided by using a temporary:
>     Data ret2 = ret;
>     return ret2;
> instead of:
>     return ret;
> Passing d by value instead of by ref will do the same thing. I assume 
> that the reason inout is used here is to pass by reference to gain 
> efficiency. Disabling NRVO will lose more than every bit of efficiency 
> gained by passing by reference - so you might as well not pass it by 
> reference, but by value. (NRVO eliminates two copies of the struct, not 
> one. Replacing inout with in adds one copy. So NRVO is still one copy 
> ahead <g>.)

As I mentioned above, I believe this temporary could be created 
automatically when necessary.

Now, about your statement:
 > I believe this fails with C++, too.

Not with g++[1].
I even tried calling the D function from C++ (since the g++-compiled 
translation happens to read both values before writing to them) and 
*that* worked fine, so it would seem g++ fixes it in the caller:
-----
urxae at urxae:~/tmp$ cat callfrob.cpp
#include <iostream>
#include <cassert>

struct Data
{
     int x, y;

     /// To make size > 8 so NRVO is used.
     /// Program runs correctly with this line commented out:
     char filler;
};

extern "C" Data frob(Data& d);

extern"C" void callfrob() {
     using namespace std;
     Data d; d.x = 1; d.y = 2;
     d = frob(d);
     cout << "C++:" << endl << d.x << endl << d.y << endl;
     assert(d.x == 3 && d.y == -1);
}
urxae at urxae:~/tmp$ g++ -c callfrob.cpp -o callfrob.o
urxae at urxae:~/tmp$ cat frob.d
import std.stdio;

struct Data
{
     int x, y;

     /// To make size > 8 so NRVO is used.
     /// Program runs correctly with this line commented out:
     byte filler;
}

extern(C) Data frob(inout Data d)
{
     Data ret;
     ret.y = d.x - d.y;
     ret.x = d.x + d.y;
     return ret;
}

extern(C) void callfrob();

void main() {
     Data d; d.x = 1; d.y = 2;
     d = frob(d);
     writefln("D:");
     writefln(d.x);
     writefln(d.y);
     callfrob();
}
urxae at urxae:~/tmp$ dmd frob.d callfrob.o -L-lstdc++ -offrob
gcc frob.o callfrob.o -o frob -m32 -lphobos -lpthread -lm -Xlinker 
-lstdc++ -Xlinker -L/home/urxae/opt/dmd/lib
urxae at urxae:~/tmp$ ./frob
D:
0
0
C++:
3
-1
-----

The relevant part of callfrob.o's 'objdump -Sr':
-----
   7a:   c7 45 ec 01 00 00 00    movl   $0x1,0xffffffec(%ebp)
   81:   c7 45 f0 02 00 00 00    movl   $0x2,0xfffffff0(%ebp)
   88:   8d 55 d8                lea    0xffffffd8(%ebp),%edx
   8b:   8d 45 ec                lea    0xffffffec(%ebp),%eax
   8e:   89 44 24 04             mov    %eax,0x4(%esp)
   92:   89 14 24                mov    %edx,(%esp)
   95:   e8 fc ff ff ff          call   96 <callfrob+0x24>
                         96: R_386_PC32  frob
   9a:   83 ec 04                sub    $0x4,%esp
   9d:   8b 45 d8                mov    0xffffffd8(%ebp),%eax
   a0:   89 45 ec                mov    %eax,0xffffffec(%ebp)
   a3:   8b 45 dc                mov    0xffffffdc(%ebp),%eax
   a6:   89 45 f0                mov    %eax,0xfffffff0(%ebp)
   a9:   8b 45 e0                mov    0xffffffe0(%ebp),%eax
   ac:   89 45 f4                mov    %eax,0xfffffff4(%ebp)
-----

So it would seem g++ in this situation indeed creates a temporary 
"variable" on the stack, uses it for the return value, and then copies 
it over the actual variable.

Presumably it does this for a reason. Either it's required by the C++ 
standard or they thought it'd be polite to DWIM in this situation...

And no, it doesn't do that if I change the line with the call to 'Data e 
= frob(d)' (and the output to show e.x & e.y):
-----
   7a:   c7 45 ec 01 00 00 00    movl   $0x1,0xffffffec(%ebp)
   81:   c7 45 f0 02 00 00 00    movl   $0x2,0xfffffff0(%ebp)
   88:   8d 55 e0                lea    0xffffffe0(%ebp),%edx
   8b:   8d 45 ec                lea    0xffffffec(%ebp),%eax
   8e:   89 44 24 04             mov    %eax,0x4(%esp)
   92:   89 14 24                mov    %edx,(%esp)
   95:   e8 fc ff ff ff          call   96 <callfrob+0x24>
                         96: R_386_PC32  frob
   9a:   83 ec 04                sub    $0x4,%esp
   9d:   8b 75 e4                mov    0xffffffe4(%ebp),%esi
   a0:   8b 5d e0                mov    0xffffffe0(%ebp),%ebx
-----

Those last two instructions were included to show the absence of the 
copy instructions, the first disassembly had similar ones:
-----
   af:   8b 75 f0                mov    0xfffffff0(%ebp),%esi
   b2:   8b 5d ec                mov    0xffffffec(%ebp),%ebx
-----

(Presumably, they're preparations for the output calls)

And by the way, I also compiled the C++ with -O2 and with -O3. It still 
performed the copy, just more efficiently ;):
-----
   93:   e8 fc ff ff ff          call   94 <callfrob+0x24>
                         94: R_386_PC32  frob
   98:   8b 5d d8                mov    0xffffffd8(%ebp),%ebx
   9b:   8b 75 dc                mov    0xffffffdc(%ebp),%esi
   9e:   8b 45 e0                mov    0xffffffe0(%ebp),%eax
   a1:   89 5d ec                mov    %ebx,0xffffffec(%ebp)
   a4:   89 75 f0                mov    %esi,0xfffffff0(%ebp)
   a7:   83 ec 04                sub    $0x4,%esp
   aa:   89 45 f4                mov    %eax,0xfffffff4(%ebp)
-----

Versions used to test:
-----
urxae at urxae:~/tmp$ g++ --version | head -n 1
g++ (GCC) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
urxae at urxae:~/tmp$ dmd | head -n 1
Digital Mars D Compiler v1.002
-----

[1]: g++ is the only C++ compiler I have. If it doesn't work in DMC, I 
humbly suggest you fix that one too :P.