Question/request/bug(?) re. floating-point in dmd

Wed Oct 23 23:12:01 PDT 2013

On Wednesday, 23 October 2013 at 15:44:54 UTC, Apollo Hogan wrote:
> For example, the appended code produces the following output 
> when compiled (DMD32 D Compiler v2.063.2 under WinXP/cygwin) 
> with no optimization:
>
> immutable(pair)(1.1, -2.03288e-20)
> pair(1, 0.1)
> pair(1.1, -8.32667e-17)
>
> and the following results when compiled with optimization (-O):
>
> immutable(pair)(1.1, -2.03288e-20)
> pair(1, 0.1)
> pair(1.1, 0)
>
> The desired result would be:
>
> immutable(pair)(1.1, -8.32667e-17)
> pair(1, 0.1)
> pair(1.1, -8.32667e-17)
>
> Cheers,
> --Apollo
>
> import std.stdio;
> struct pair { double hi, lo; }
> pair normalize(pair q)
> {
>   double h = q.hi + q.lo;
>   double l = q.lo + (q.hi - h);
>   return pair(h,l);
> }
> void main()
> {
>   immutable static pair spn = normalize(pair(1.0,0.1));
>   writeln(spn);
>   writeln(pair(1.0,0.1));
>   writeln(normalize(pair(1.0,0.1)));
> }

I can replicate it here. Here is an objdump diff of normalize:

Optimized:

08076bdc <_D6fptest9normalizeFS6fptest4pairZS6fptest4pair>:
  8076bdc:   55          push   %ebp
  8076bdd:   8b ec       mov    %esp,%ebp
  8076bdf:   83 ec 10    sub    $0x10,%esp
  8076be2:   dd 45 08    fldl   0x8(%ebp)
  8076be5:   d9 c0       fld    %st(0)
  8076be7:   89 c1       mov    %eax,%ecx
  8076be9:   dc 45 10    faddl  0x10(%ebp)
  8076bec:   dd 55 f0    fstl   -0x10(%ebp)
  8076bef:   de e9       fsubrp %st,%st(1)
  8076bf1:   dd 45 f0    fldl   -0x10(%ebp)
  8076bf4:   d9 c9       fxch   %st(1)
  8076bf6:   dc 45 10    faddl  0x10(%ebp)
  8076bf9:   dd 5d f8    fstpl  -0x8(%ebp)
  8076bfc:   dd 45 f8    fldl   -0x8(%ebp)
  8076bff:   d9 c9       fxch   %st(1)
  8076c01:   dd 19       fstpl  (%ecx)
  8076c03:   dd 59 08    fstpl  0x8(%ecx)
  8076c06:   8b e5       mov    %ebp,%esp
  8076c08:   5d          pop    %ebp
  8076c09:   c2 10 00    ret    $0x10

Unoptimized:

08076bdc <_D6fptest9normalizeFS6fptest4pairZS6fptest4pair>:
  8076bdc:   55          push   %ebp
  8076bdd:   8b ec       mov    %esp,%ebp
  8076bdf:   83 ec 14    sub    $0x14,%esp
  8076be2:   dd 45 08    fldl   0x8(%ebp)
  8076be5:   dc 45 10    faddl  0x10(%ebp)
  8076be8:   dd 5d ec    fstpl  -0x14(%ebp)
  8076beb:   dd 45 08    fldl   0x8(%ebp)
  8076bee:   dc 65 ec    fsubl  -0x14(%ebp)
  8076bf1:   dc 45 10    faddl  0x10(%ebp)
  8076bf4:   dd 5d f4    fstpl  -0xc(%ebp)
  8076bf7:   dd 45 ec    fldl   -0x14(%ebp)
  8076bfa:   dd 18       fstpl  (%eax)
  8076bfc:   dd 45 f4    fldl   -0xc(%ebp)
  8076bff:   dd 58 08    fstpl  0x8(%eax)
  8076c02:   c9          leave
  8076c03:   c2 10 00    ret    $0x10
  8076c06:   90          nop
  8076c07:   90          nop
  8076c08:   90          nop
  8076c09:   90          nop
  8076c0a:   90          nop
  8076c0b:   90          nop

I cannot see any significant difference. The fadd-fsub-fadd 
sequence seems to be the same in both cases.
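
One detail that might still be worth checking is how h is stored. The optimized listing uses fstl (store without popping) and then subtracts from the value still on the x87 stack, while the unoptimized listing uses fstpl and subtracts the stored double from memory. Below is a rough sketch for probing whether that matters. It assumes real is wider than double on the target; normalizeWide is a hypothetical name, not something from the original code, and the results will depend on compiler and flags.

```d
import std.stdio;

struct pair { double hi, lo; }

// Original routine: both intermediates are declared as double.
pair normalize(pair q)
{
    double h = q.hi + q.lo;
    double l = q.lo + (q.hi - h);
    return pair(h, l);
}

// Hypothetical variant: h is held as real for the subtraction and
// narrowed to double only on return.
pair normalizeWide(pair q)
{
    real h = cast(real) q.hi + q.lo;
    real l = q.lo + (q.hi - h);
    return pair(cast(double) h, cast(double) l);
}

void main()
{
    auto a = normalize(pair(1.0, 0.1));
    auto b = normalizeWide(pair(1.0, 0.1));
    // %a prints a hex float, so the exact bits are visible instead of
    // writeln's rounded default formatting.
    writefln("double: hi=%a lo=%a", a.hi, a.lo);
    writefln("wide:   hi=%a lo=%a", b.hi, b.lo);
}
```

If the two lines differ in lo the same way the -O and non-optimized builds do, that would point at the intermediate rounding of h rather than at the instruction sequence itself.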