[Issue 2278] New: Guarantee alignment of stack-allocated variables on x86

d-bugmail at puremagic.com d-bugmail at puremagic.com
Mon Aug 11 08:17:21 PDT 2008


http://d.puremagic.com/issues/show_bug.cgi?id=2278

           Summary: Guarantee alignment of stack-allocated variables on x86
           Product: D
           Version: 1.034
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla at digitalmars.com
        ReportedBy: clugdbug at yahoo.com.au


Use of SSE instructions in 32-bit Windows is problematic, since Windows and the
C calling convention only align the stack to 4 bytes, not 8.
It's too late for C and C++ to fix this problem. But D still has a chance, with
a simple addition to the ABI...

Insert the following line into the spec:
D functions must be called with a stack aligned to an 8 byte boundary.

And how to implement this:
(1) whenever a D function is called, insert a 'push EBP'/'pop EBP' around it,
if the count of (pushed arguments + pushed registers so far in this function)
is odd. Note that this applies to invoking a delegate, too.
(EBP is the best register to use, since it's guaranteed to be preserved, and
it's almost certainly been used recently. On Intel CPUs this means it won't
cause a register read stall).
(2) if local variables are created, make sure that the frame allocates an even
number of DWORDs. (Create an unused local int, if necessary).
(3) extern() functions need stack alignment code at the top of them, since they
could be called from other languages, with wrong stack alignment. Here's an
example.
---
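void main()
{
    asm {
        naked;
        push EBP;                // EBP is callee-saved: keep the caller's value
        mov EBP, ESP;            // remember the unaligned stack pointer
        and ESP, 0xFFFF_FFC0;    // align to a 64 byte boundary.
        call alignedmain;        // the real body, now running on an aligned stack
        mov ESP, EBP;            // restore the original stack pointer
        pop EBP;
        ret;
    }
}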
---
(4) alloca() also needs to ensure that it allocates an even number of DWORDs.
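
The arithmetic behind steps (1), (2) and (4) can be sketched in D as follows.
This is only an illustration, not compiler code: the helper names
needsPaddingPush and roundUpToEvenDwords are hypothetical, and it assumes that
every pushed argument and register occupies exactly one DWORD (4 bytes).
---
/// Step (1): a padding 'push EBP' / 'pop EBP' pair is wanted when the
/// number of DWORDs pushed so far (arguments + registers) is odd.
/// Assumption: each argument and each register takes one DWORD.
bool needsPaddingPush(size_t pushedArgs, size_t pushedRegs)
{
    return ((pushedArgs + pushedRegs) & 1) != 0;
}

/// Steps (2) and (4): round a byte count up to an even number of DWORDs,
/// i.e. to the next multiple of 8 bytes.
size_t roundUpToEvenDwords(size_t bytes)
{
    return (bytes + 7) & ~cast(size_t) 7;
}
---
For instance, a frame that needs 12 bytes of locals would be grown to 16, and a
call that pushes two arguments after one saved register would get the padding
push.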

Note that a clever compiler could play games with the frame pointer to
eliminate the (tiny -- approx 1.5 cycles) overhead of (1) in almost all cases.
(e.g., by converting one of the 'push reg's into 'mov [EBP+xx], reg').

The important thing to note about this solution (compared to using step (3)
everywhere) is that it has lower overhead, and means that the innermost
functions, which are most likely to need stack alignment, don't need to
manually align it. Also note that when there's an even number of parameters,
the overhead is _zero_.
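
To see whether a given local actually lands on an 8 byte boundary, its address
can be tested directly. A minimal sketch, assuming a hypothetical helper named
isAligned and a power-of-two alignment:
---
/// True if `p` lies on a multiple of `alignment`.
/// Assumption: `alignment` is a power of two.
bool isAligned(const(void)* p, size_t alignment)
{
    return (cast(size_t) p & (alignment - 1)) == 0;
}
---
With the proposed ABI rule in place, a check such as isAligned(&x, 8) on a
local double x would be expected to hold in every D function; without it, the
result depends on how the function happened to be called.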

