Why don't we switch to C-like floating point arithmetic instead of automatic expansion to reals?
Seb via Digitalmars-d
digitalmars-d at puremagic.com
Wed Aug 3 16:00:11 PDT 2016
There was a recent discussion on a Phobos pull request about D's
floating point behavior [1]. I think Ilya summarized our problem
quite elegantly:
> We need to argue with @WalterBright to switch to C-like
> floating point arithmetic instead of automatic expansion to
> reals (this is so horrible that it may kill D for science
> forever; @wilzbach can tell a story about Tinflex, I can tell
> one about precise and KBN summations, which do not work
> correctly with 32-bit DMD). D has had a lot of long discussions
> about math and the GC. We are moving towards a great D without
> the GC now. Well, I hope we will also get D with normal FP
> arithmetic. But how many years do we need to change
> @WalterBright's opinion on this problem to the common view?
Here's my reply and my experience with this; maybe it helps to
illustrate the problem.
I started documenting all the bumps with FP math that I ran into
on our mir wiki [2]. While some of these are expected, some are
horrible, cruel & yield a constant fight against the compiler.
FP behavior that differs depending on (1) the target, (2) the
compiler, or (3) the optimization level is very hard to work
with. Wasn't the entire point of D to get this right and avoid
weird, unproductive corner cases across architectures?
The problem with Tinflex and a lot of other scientific math is
that you need reproducible, predictable behavior. For example,
the Tinflex algorithm is quite sensitive because it (1) uses pow
and exp, so errors add up quickly, and (2) keeps an ordered heap
of the remaining FP values, which means that, due to the FP
magic explained below, I get totally different resulting values
depending on the architecture. Note that the ordering itself is
well defined for equality, e.g. the tuples (mymath(1.5), 100)
and (mymath(1.5), 200) need to result in the same ordering. I
don't want my program to fail just because I compiled for
32-bit, but maybe code examples show more than words.
Consider the following program; it fails on 32-bit :/
```
alias S = double; // same for float

S fun(S)(in S x)
{
    return -1 / x;
}

void main()
{
    S i = fun!S(3);
    assert(i == S(-1) / 3); // this line passes
    assert(fun!S(3) == S(-1) / 3); // error on 32-bit
    // just to be clear, the following variants don't work on 32-bit either
    assert(fun!S(3) == S(-1.0 / 3));
    assert(fun!S(3) == cast(S) S(-1) / 3);
    assert(fun!S(3) == S(S(-1) / 3));
    assert(S(fun!S(3)) == S(-1) / 3);
    assert(cast(S) fun!S(3) == S(-1) / 3);
}
```
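The passing assert on `i` already hints at the cause: on 32-bit
DMD the result of `fun` presumably comes back in an x87 register
at 80-bit `real` precision, while storing it into a variable of
type S forces a round to S. A minimal workaround sketch under
that assumption (the variable names are mine), not a fix for the
underlying problem:
```
// force both operands through an lvalue of type S so each one is
// rounded to S before the comparison (cf. the passing assert on `i`)
S lhs = fun!S(3);
S rhs = S(-1) / 3;
assert(lhs == rhs); // both sides now carry identically rounded values
```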
Maybe it's easier to see why this behavior is tricky when we look
at it in action. For example, with the following program DMD for
x86_64 yields the same result for f1 and f2, whereas x86_32
yields different ones.
```
import std.stdio;

alias S = float;
float shL = 0x1.1b95eep-4; // 0.069235
float shR = 0x1.9c7cfep-7; // 0.012588

F fun(F)(F x)
{
    return 1.0 + x * 2;
}

S f1()
{
    S r = fun(shR);
    S l = fun(shL);
    return (r - l);
}

S f2()
{
    return (fun(shR) - fun(shL));
}

void main()
{
    writefln("f1: %a", f1()); // -0x1.d00cap-4
    writefln("f2: %a", f2()); // on 32-bit: -0x1.d00c9cp-4
    assert(f1 == f2); // fails on 32-bit
}
```
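If bit-exact equality can't be salvaged, one escape hatch is to
compare with a tolerance instead. A sketch using std.math.feqrel,
which returns the number of matching significand bits (float has
24), replacing the failing assert in main above; the threshold of
20 bits is an arbitrary pick of mine, and note this only relaxes
the check, it does not make the heap ordering in Tinflex
deterministic:
```
import std.math : feqrel;

// f1() and f2() differ only in the last few bits on 32-bit, so
// requiring ~20 of float's 24 significand bits to match passes
// there while still catching genuinely wrong results
assert(feqrel(f1(), f2()) >= 20);
```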
To make matters worse, std.math yields different results than
compiler/assembly intrinsics. Note that in this example importing
std.math.pow adds about 1K instructions to the generated
assembly, whereas llvm_pow boils down to the powf assembly
routine. Of course the performance of powf is a lot better; I
measured [3] that e.g. std.math.pow takes ~1.5x as long for both
LDC and DMD. If you need to run this very often, that cost isn't
acceptable.
```
void main()
{
    alias S = float;
    S s1 = 0x1.24c92ep+5;
    S s2 = -0x1.1c71c8p+0;

    import std.math : std_pow = pow;
    import core.stdc.stdio : printf;
    import core.stdc.math : powf;

    printf("std:  %a\n", std_pow(s1, s2)); // 0x1.2c155ap-6
    printf("pow:  %a\n", s1 ^^ s2);        // 0x1.2c155ap-6
    printf("powf: %a\n", powf(s1, s2));    // 0x1.2c1558p-6

    version(LDC)
    {
        import ldc.intrinsics : llvm_pow;
        printf("ldc:  %a\n", llvm_pow(s1, s2)); // 0x1.2c1558p-6
    }
}
```
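The ~1.5x figure comes from the proper benchmarks in [3]; as a
rough illustration, here is a minimal timing sketch of my own
(the iteration count, the sink accumulator, and timing via
core.time.MonoTime are my choices, not how [3] measures):
```
import core.stdc.math : powf;
import core.time : MonoTime;
import std.math : std_pow = pow;
import std.stdio : writeln;

void main()
{
    enum n = 1_000_000; // arbitrary iteration count
    float s1 = 0x1.24c92ep+5;
    float s2 = -0x1.1c71c8p+0;
    float sink = 0; // accumulate results so the calls aren't elided

    auto t0 = MonoTime.currTime;
    foreach (i; 0 .. n) sink += std_pow(s1, s2);
    auto t1 = MonoTime.currTime;
    foreach (i; 0 .. n) sink += powf(s1, s2);
    auto t2 = MonoTime.currTime;

    writeln("std.math.pow: ", t1 - t0);
    writeln("powf:         ", t2 - t1);
    writeln(sink); // print the accumulator to keep it live
}
```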
I excluded the discrepancies in FP arithmetic between Windows
and Linux/macOS, as those are hopefully just a bug [4].
[1] https://github.com/dlang/phobos/pull/3217
[2] https://github.com/libmir/mir/wiki/Floating-point-issues
[3] https://github.com/wilzbach/d-benchmarks
[4] https://issues.dlang.org/show_bug.cgi?id=16344