Why don't we switch to C-like floating-point arithmetic instead of automatic expansion to reals?

Seb via Digitalmars-d digitalmars-d at puremagic.com
Wed Aug 3 16:00:11 PDT 2016


There was a recent discussion on Phobos about D's floating point 
behavior [1]. I think Ilya summarized quite elegantly our problem:

> We need to argue with @WalterBright to switch to C-like 
> floating-point arithmetic instead of automatic expansion to 
> reals (this is so horrible that it may kill D for science 
> forever; @wilzbach can tell a story about Tinflex, I can tell 
> one about precise and KBN summations, which do not work 
> correctly with 32-bit DMD). D has had a lot of long discussions 
> about math and the GC. We are moving towards a great D without 
> the GC now. Well, I hope we will also get D with normal FP 
> arithmetic. But how many years do we need to change 
> @WalterBright's opinion on this problem to the common one?
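
For context: compensated summation such as KBN keeps a running 
correction term computed from the rounding error of each addition. 
That only works if every intermediate is actually rounded to the 
declared precision; if the compiler keeps some values in 80-bit x87 
registers and spills others to 64-bit memory, the recovered error no 
longer matches the real one. A minimal sketch of the technique (my 
own names, not mir code):

```
import std.math : fabs;
import std.stdio : writefln;

// Neumaier's (KBN) compensated summation. The (sum - t) + x step
// recovers the bits lost when rounding sum + x to double, but only
// if t really was rounded to double rather than kept at 80 bits.
double kbnSum(in double[] xs)
{
     double sum = 0.0, c = 0.0;
     foreach (x; xs)
     {
          immutable t = sum + x;
          if (fabs(sum) >= fabs(x))
               c += (sum - t) + x; // low-order bits of x were lost
          else
               c += (x - t) + sum; // low-order bits of sum were lost
          sum = t;
     }
     return sum + c;
}

void main()
{
     // naive summation yields 0 here; the compensated sum recovers 2
     writefln("%g", kbnSum([1.0, 1e100, 1.0, -1e100]));
}
```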

Here's my reply & experience with this; maybe it helps to show the 
problem.
I started to document all the bumps with FP math that I ran into on 
our mir wiki [2]. While some of these are expected, others are 
horrible, cruel & mean a constant fight against the compiler.
FP behavior that differs depending on (1) the target, (2) the 
compiler, or (3) the optimization level is very hard to work with. 
Wasn't the entire point of D to get this right and avoid weird, 
unproductive corner cases across architectures?
The problem with Tinflex and a lot of other scientific math is 
that you need reproducible, predictable behavior. For example, the 
Tinflex algorithm is quite sensitive as it (1) uses pow and exp, 
so errors add up quickly, and (2) keeps an ordered heap of 
remaining FP values, which means that, due to the FP magic 
explained below, I get totally different resulting values 
depending on the architecture. Note that the ordering itself is 
well defined for equality, e.g. the tuples (mymath(1.5), 100) and 
(mymath(1.5), 200) need to result in the same ordering. I don't 
want my program to fail just because I compiled for 32-bit, but 
maybe code examples show more than words.
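
To make the ordering requirement concrete, here is a toy version 
(`mymath` is a stand-in for the real Tinflex computation, not 
actual library code):

```
import std.algorithm : sort;
import std.stdio : writeln;
import std.typecons : tuple;

double mymath(double x) { return -1 / x; }

void main()
{
     // both keys come from the same expression, so they must compare
     // equal and the order has to be decided by the second field,
     // on every target and at every optimization level
     auto entries = [tuple(mymath(1.5), 200), tuple(mymath(1.5), 100)];
     entries.sort();
     writeln(entries); // [(key, 100), (key, 200)] expected everywhere
}
```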

Consider the following program; it fails on 32-bit :/

```
alias S = double; // same for float

S fun(S)(in S x)
{
     return -1 / x;
}

void main()
{
     S i = fun!S(3);
     assert(i == S(-1) / 3); // this line passes
     assert(fun!S(3) == S(-1) / 3); // error on 32-bit

     // just to be clear, the following variants don't work either on 32-bit
     assert(fun!S(3) == S(-1.0 / 3));
     assert(fun!S(3) == cast(S) S(-1) / 3);
     assert(fun!S(3) == S(S(-1) / 3));
     assert(S(fun!S(3)) == S(-1) / 3);
     assert(cast(S) fun!S(3) == S(-1) / 3);
}
```
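
The workaround we use in tests today is approximate comparison 
instead of bit-for-bit equality; it's a band-aid, not 
reproducibility. A sketch with std.math.feqrel, which counts the 
leading significand bits on which two values agree:

```
import std.math : feqrel;

alias S = double;

S fun(S)(in S x)
{
     return -1 / x;
}

void main()
{
     // tolerate a couple of ulps instead of demanding exact equality,
     // so the extra x87 precision on 32-bit doesn't break the test
     assert(feqrel(fun!S(3), S(-1) / 3) >= S.mant_dig - 2);
}
```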

Maybe it's easier to see why this behavior is tricky when we look 
at it in action. For example, for the following program DMD for 
x86_64 yields the same result for f1 and f2, whereas for x86_32 it 
yields different ones.

```
import std.stdio;

alias S = float;
float shL = 0x1.1b95eep-4; // ~0.069235
float shR = 0x1.9c7cfep-7; // ~0.012588

F fun(F)(F x)
{
     return 1.0 + x * 2;
}

S f1()
{
     S r = fun(shR);
     S l = fun(shL);
     return (r - l);
}

S f2()
{
     return (fun(shR) - fun(shL));
}

void main()
{
     writefln("f1: %a", f1()); //  -0x1.d00cap-4
     writefln("f2: %a", f2());  // on 32-bit: -0x1.d00c9cp-4
     assert(f1 == f2); // fails on 32-bit
}
```
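
For what it's worth, redoing f2's arithmetic explicitly in 80-bit 
real and rounding to float only once seems to reproduce the x86_32 
result; that matches my understanding that the 32-bit codegen keeps 
everything in x87 registers and rounds only when the value finally 
leaves the FPU:

```
import std.stdio : writefln;

float shL = 0x1.1b95eep-4;
float shR = 0x1.9c7cfep-7;

float f2viaReal()
{
     real r = 1.0L + shR * 2; // computed and kept at 80-bit precision
     real l = 1.0L + shL * 2;
     return r - l;            // single rounding: real -> float
}

void main()
{
     writefln("%a", f2viaReal()); // should match f2's 32-bit value
}
```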

To make matters worse, std.math yields different results than 
compiler/assembly intrinsics. Note that in this example importing 
std.math.pow adds about 1K instructions to the generated assembly, 
whereas llvm_pow boils down to a call to powf. The performance of 
powf is also a lot better: I measured [3] that e.g. std.math.pow 
takes ~1.5x as long for both LDC and DMD. If you need to run this 
very often, that cost isn't acceptable.

```
void main()
{
     alias S = float;
     S s1 = 0x1.24c92ep+5;
     S s2 = -0x1.1c71c8p+0;

     import std.math : std_pow = pow;
     import core.stdc.stdio : printf;
     import core.stdc.math: powf;

     printf("std: %a\n", std_pow(s1, s2)); // 0x1.2c155ap-6
     printf("pow: %a\n", s1 ^^ s2); // 0x1.2c155ap-6
     printf("powf: %a\n", powf(s1, s2)); // 0x1.2c1558p-6

     version(LDC)
     {
         import ldc.intrinsics : llvm_pow;
         printf("ldc: %a\n", llvm_pow(s1, s2)); // 0x1.2c1558p-6
     }
}
```
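
I didn't paste the full harness from [3], but a condensed sketch of 
such a measurement looks roughly like this (benchmark lives in 
std.datetime.stopwatch in recent D versions; absolute numbers will 
of course vary by machine and flags):

```
import core.stdc.math : powf;
import std.datetime.stopwatch : benchmark;
import std.math : pow;
import std.stdio : writeln;

__gshared float s1 = 0x1.24c92ep+5;
__gshared float s2 = -0x1.1c71c8p+0;
__gshared float sink = 0; // keeps the calls from being optimized away

void viaStd()  { sink += pow(s1, s2); }
void viaLibc() { sink += powf(s1, s2); }

void main()
{
     auto r = benchmark!(viaStd, viaLibc)(10_000_000);
     writeln("std.math.pow: ", r[0]);
     writeln("libc powf:    ", r[1]);
}
```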

I excluded the discrepancies in FP arithmetic between Windows 
and Linux/macOS as it's hopefully just a bug [4].

[1] https://github.com/dlang/phobos/pull/3217
[2] https://github.com/libmir/mir/wiki/Floating-point-issues
[3] https://github.com/wilzbach/d-benchmarks
[4] https://issues.dlang.org/show_bug.cgi?id=16344

