Differences in results when using the same function in CTFE and Runtime
Timon Gehr
timon.gehr at gmx.ch
Sun Aug 18 12:57:41 UTC 2024
On 8/17/24 18:33, Quirin Schroll wrote:
> The normal use case for floating-point isn't perfectly reproducible
> results between different optimization levels.
I would imagine the vast majority of FLOPs nowadays are used in HPC and
AI workloads. Reproducibility is at least a plus, particularly in a
research context.
> However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs.
No, they are not bugs; it's the same kind of badly designed
specification. According to the specification, you can get differences
between RT and RT when running the exact same function. Of course you
will get differences between CTFE and RT.
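For illustration, here is a minimal sketch of the kind of CTFE/RT
divergence I mean. Whether the two results actually differ is
implementation- and target-dependent (e.g., DMD's CTFE may evaluate
`double` intermediates at 80-bit precision, while the runtime code uses
64-bit SSE arithmetic):

```d
double residue()
{
    double one = 1.0;
    double tiny = 1e-18; // smaller than half an ulp of 1.0 in double
    return (one + tiny) - one;
}

enum ctfeResult = residue(); // forces compile-time evaluation

void main()
{
    import std.stdio : writeln;
    double rtResult = residue(); // ordinary run-time evaluation
    // Prints two different values if CTFE kept extra precision.
    writeln(ctfeResult, " vs ", rtResult);
}
```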
> The reason for that is probably because Walter didn't like that other
> languages nailed down floating-point operations
Probably. C famously nails down floating-point operations, just like it
nails down all the other types. D is really well-known for all of its
unportable built-in data types, because Walter really does not like
nailing things down and this is not one of D's selling points. /s
Anyway, at least LDC is sane about this at runtime by default. Otherwise
I would have to switch languages for use cases involving floating point,
which would probably just make me abandon D in the long run.
> so that you'd get both less precise results *and* worse performance.
Imagine just manually using the data type that is most suitable for your
use case.
> That would for example be
> the case on an 80387 coprocessor, and (here's where my knowledge ends)
Then your knowledge may be rather out of date. I get the x87
shenanigans, but that's just not very relevant anymore. I am not
targeting 32-bit x86 with anything nowadays.
> probably also true for basically all hardware today if you consider
> `float` specifically. I know of no hardware that supports single
> precision but not double precision. Giving you double precision instead
> of single is at least basically free and possibly even a performance
> boost, while also giving you more precision.
It's nonsense. If I want double, I ask for double. Also, it's definitely
not true that going to double instead of single precision will boost
your performance on a modern machine. If you are lucky it will not slow
you down, but if the code can be auto-vectorized (or you are vectorizing
manually), you are looking at at least a 2x slowdown, since a vector
register holds twice as many singles as doubles.
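A sketch of that lane-count argument, assuming an x86-64 target where
`core.simd` provides `float4`/`double2`: a 128-bit register holds four
singles but only two doubles, so each vector operation covers half as
many elements in double precision.

```d
import core.simd;

// Accumulates 4 floats per vector add.
float4 sumLanesF(const(float4)[] chunks)
{
    float4 acc = 0;
    foreach (c; chunks)
        acc += c;
    return acc;
}

// Accumulates only 2 doubles per vector add: half the elements per op.
double2 sumLanesD(const(double2)[] chunks)
{
    double2 acc = 0;
    foreach (c; chunks)
        acc += c;
    return acc;
}
```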
>
> An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account.
I.e., do not try to implement this at all with the built-in
floating-point types. It's impossible.
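To make it concrete, here is textbook Kahan summation; nothing
D-specific is assumed. The compensation step `(t - sum) - y` is
algebraically zero, so an optimizer that is allowed to treat floating
point as real arithmetic can legally delete it, which is exactly why the
algorithm cannot be implemented reliably on top of such semantics:

```d
double kahanSum(const(double)[] xs)
{
    double sum = 0.0;
    double c = 0.0; // running compensation for lost low-order bits
    foreach (x; xs)
    {
        double y = x - c;
        double t = sum + y;
        c = (t - sum) - y; // algebraically zero, numerically not
        sum = t;
    }
    return sum;
}

unittest
{
    // Holds under strict IEEE double semantics; the point above is
    // that D's spec does not actually guarantee it.
    double[] xs = [1.0, 1e-16, 1e-16, 1e-16];
    assert(kahanSum(xs) > 1.0); // naive summation would yield exactly 1.0
}
```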
> This is exactly like in C++: signed integer overflow is undefined, not because it's undefined on the hardware, but because it allows for optimizations.
If you have to resort to invoking insane C++ precedent in order to
defend a point, you have lost the debate. Anyway, it is not at all the
same (triggered by overflow vs triggered by default, undefined behavior
vs wrong result), and also, in D, signed overflow is actually defined
behavior.
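For reference, the wrapping behavior D defines, in contrast to C++'s
undefined behavior:

```d
void main()
{
    int x = int.max;
    // Defined in D: signed integral arithmetic wraps (two's complement).
    assert(x + 1 == int.min);
}
```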
> D could easily add specific functions to `core.math` that specify operations as specifically IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer.
It does not do that. Anyway, I would expect that to go to std.numeric.
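Purely as a hypothetical sketch of that proposal: the intrinsics
`ieeeAdd`/`ieeeSub` do not exist in `core.math`; here, non-inlined
helpers merely stand in for them as a crude optimization barrier, which
is an assumption, not a guarantee.

```d
pragma(inline, false)
double ieeeAdd(double a, double b) { return a + b; }

pragma(inline, false)
double ieeeSub(double a, double b) { return a - b; }

// A Phobos-style wrapper that routes every operation through the
// helpers above, so each step is rounded to double exactly once.
struct Ieee
{
    double value;

    Ieee opBinary(string op)(Ieee rhs) const
        if (op == "+" || op == "-")
    {
        static if (op == "+")
            return Ieee(ieeeAdd(value, rhs.value));
        else
            return Ieee(ieeeSub(value, rhs.value));
    }
}
```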
> You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort while the core-language types are guaranteed to be unoptimized.
You say "unoptimized", I hear "not broken".
Anyway, clearly the default should be the variant with fewer pitfalls.
If you really want to add some sort of flexible-precision data type, why
not, but there should be a compiler flag to disable it.
> Such a type would have to be compiler-recognized, i.e. it would end up being a built-in type.
I have no desire at all to suffer from irreproducible behavior because
some dependency tried to max out on some benchmark that is irrelevant to
me. I also have no desire at all to suffer from an unnecessary
performance penalty just to recover reproducible behavior that is
exposed directly by the hardware.
Of course, then there's the issue that libc math functions are not fully
precise and have differences between implementations, but at least there
seems to be some movement on that front, and this is easy to work around
given that the built-in operations are sane.