[Issue 13474] Discard excess precision for float and double (x87)

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Mon Nov 7 01:32:38 PST 2016


https://issues.dlang.org/show_bug.cgi?id=13474

Walter Bright <bugzilla at digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla at digitalmars.com

--- Comment #22 from Walter Bright <bugzilla at digitalmars.com> ---
This boils down to the following code:

 double foo(double x, double t, double s, double c) {
    double y = x - t;
    c += y + s;
    return s + c;
 }

When optimized, the body becomes:

    return s + (c + ((x - t) + s));

Or, in x87 instructions:

       fld     qword ptr 01Ch[ESP]
       fld     qword ptr 0Ch[ESP]
       fxch    ST(1)
       fsub    qword ptr 014h[ESP]
       fadd    qword ptr 0Ch[ESP]
       fadd    qword ptr 4[ESP]
       fstp    qword ptr 4[ESP]
       fadd    qword ptr 4[ESP]
       ret     020h

The algorithm relies on the result of (x - t) being rounded to double precision.
The only way to get the x87 to do that is to actually store the value to memory.
But the compiler optimizes away that store, because it is substantially slower.
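
As a rough illustration of why the rounding matters, the sketch below compares
a difference rounded to double with the same difference kept at `real`
precision. The values of x and t are made up for the example, and it assumes a
target where `real` is the 80-bit x87 extended format; where `real` is no wider
than double, both comparisons would give the same answer.

```d
import std.stdio;

void main()
{
    // Hypothetical inputs: t is far below double's resolution near 1.0.
    double x = 1.0;
    double t = 0x1p-60;

    // Intended to be rounded to double, where the tiny term is lost.
    // On an x87 target the compiler may keep this in a register instead,
    // which is the behaviour described above.
    double rounded = x - t;

    // Explicitly kept at extended precision, where the tiny term can survive.
    real extended = cast(real) x - cast(real) t;

    writefln("rounded  == 1.0: %s", rounded == 1.0);
    writefln("extended == 1.0: %s", extended == 1.0L);
}
```

If the two lines print different answers, the value of y seen by the rest of
the algorithm depends on whether it was stored to memory or left in a register.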

The 64-bit code does not have this problem, because the generated code looks like:

       push    RBP
       mov     RBP,RSP
       movsd   XMM4,XMM0
       movsd   XMM5,XMM1
       subsd   XMM3,XMM2
       addsd   XMM3,XMM5
       addsd   XMM4,XMM3
       movsd   XMM0,XMM5
       addsd   XMM0,XMM4
       pop     RBP
       ret

It's doing the same optimization, but each intermediate result is rounded to
double because the scalar SSE2 instructions (subsd, addsd) operate on 64-bit
doubles.

Note that the following targets generate x87 code, not XMM code:

    Win32, Linux32, FreeBSD32

because it is not guaranteed that the target has XMM registers. I suspect we
don't really care about the floating point performance on those targets, but we
do care that the code gives expected results.

So I propose that the fix is to disable optimizing away the assignment to y for
x87 code gen targets.
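
A source-level analogue of this fix might look like the sketch below. It
assumes a druntime version that provides `core.math.toPrec`, which is not
mentioned in this comment and may not exist in the release under discussion.
The proposal above is a change to the code generator, not to user code.

```d
import core.math : toPrec;

double foo(double x, double t, double s, double c)
{
    // Assumption: toPrec!double forces the difference to be rounded to
    // double precision even if it was computed in a wider x87 register.
    double y = toPrec!double(x - t);
    c += y + s;
    return s + c;
}
```

With the rounding made explicit, the result should no longer depend on whether
the optimizer keeps y in a register.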
