Differences in results when using the same function in CTFE and Runtime
Quirin Schroll
qs.il.paperinik at gmail.com
Mon Aug 19 10:15:40 UTC 2024
On Sunday, 18 August 2024 at 12:57:41 UTC, Timon Gehr wrote:
> On 8/17/24 18:33, Quirin Schroll wrote:
>>
> The normal use case for floating-point isn't perfectly
> reproducible results between different optimization levels.
>
> I would imagine the vast majority of FLOPs nowadays are used in
> HPC and AI workloads. Reproducibility is at least a plus,
> particularly in a research context.
>
>> However, differences between CTFE and RT are indeed
>> unacceptable for core-language operations. Those are bugs.
>
> No, they are not bugs, it's just the same kind of badly
> designed specification. According to the specification, you can
> get differences between RT and RT when running the exact same
> function. Of course you will get differences between CTFE and
> RT.
>
>> The reason for that is probably because Walter didn't like
>> that other languages nailed down floating-point operations
>
> Probably. C famously nails down floating-point operations, just
> like it nails down all the other types. D is really well-known
> for all of its unportable built-in data types, because Walter
> really does not like nailing things down and this is not one of
> D's selling points. /s
>
> Anyway, at least LDC is sane on this at runtime by default.
> Otherwise I would have to switch language for use cases
> involving floating point, which would probably just make me
> abandon D in the long run.
>
>> so that you'd get both less precise results *and* worse
>> performance.
>
> Imagine just manually using the data type that is most suitable
> for your use case.
>
>> That would for example be the case on an 80387 coprocessor,
>> and (here's where my knowledge ends)
>
> Then your knowledge may be rather out of date. I get the x87
> shenanigans, but that's just not very relevant anymore. I am
> not targeting 32-bit x86 with anything nowadays.
>
>> probably also true for basically all hardware today if you
>> consider `float` specifically. I know of no hardware, that
>> supports single precision, but not double precision. Giving
>> you double precision instead of single is at least basically
>> free and possibly even a performance boost, while also giving
>> you more precision.
>
> It's nonsense. If I want double, I ask for double. Also, it's
> definitely not true that going to double instead of single
> precision will boost your performance on a modern machine. If
> you are lucky it will not slow you down, but if the code can be
> auto-vectorized (or you are vectorizing manually), you are
> looking at least at a 2x slowdown.
>
>>
>> An algorithm like Kahan summation must be implemented in a way
>> that takes those optimizations into account.
>
> I.e., do not try to implement this at all with the built-in
> floating-point types. It's impossible.
>
>> This is exactly like in C++, signed integer overflow is
>> undefined, not because it's undefined on the hardware, but
>> because it allows for optimizations.
>
> If you have to resort to invoking insane C++ precedent in order
> to defend a point, you have lost the debate. Anyway, it is not
> at all the same (triggered by overflow vs triggered by default,
> undefined behavior vs wrong result), and also, in D, signed
> overflow is actually defined behavior.
>
>> D could easily add specific functions to `core.math` that
>> specify operations as specifically IEEE-754 conforming. Using
>> those, Phobos could give you types that are specified to
>> produce results as specified by IEEE-754, with no interference
>> by the optimizer.
>
> It does not do that. Anyway, I would expect that to go to
> std.numeric.
>
>> You can't actually do the reverse, i.e. provide a type in
>> Phobos that allows for optimizations of that sort but the
>> core-language types are guaranteed to be unoptimized.
>
> You say "unoptimized", I hear "not broken".
>
> Anyway, clearly the default should be the variant with less
> pitfalls. If you really want to add some sort of
> flexible-precision data types, why not, but there should be a
> compiler flag to disable it.
>
>> Such a type would have to be compiler-recognized, i.e. it
>> would end up being a built-in type.
>
> I have no desire at all to suffer from irreproducible behavior
> because some dependency tried to max out on some irrelevant to
> me benchmark. I also have no desire at all to suffer from an
> unnecessary performance penalty just to recover reproducible
> behavior that is exposed directly by the hardware.
>
> Of course, then there's the issue that libc math functions are
> not fully precise and have differences between implementations,
> but at least there seems to be some movement on that front, and
> this is easy to work around given that the built-in operations
> are sane.
I think you got me wrong here on a key aspect: I’m not trying to
argue, but to explain what the D spec means and to outline
possible rationales for why it is the way it is. For example, my
x87 knowledge isn’t “outdated”; x87 coprocessors didn’t change.
They’re just not practically relevant anymore, but they could
have influenced *why* Walter specified D as he did. I totally
agree with you that in the context of modern hardware, D’s float
spec makes little sense.
Then I tried to outline a compromise between Walter’s repeatedly
stated opinion and the desires of many D community members,
including you.
I’m somewhere in the middle, really on neither side. I know
enough about floating-point arithmetic to avoid it as much as I
can, and that’s not due to how languages implement it, but to how
it works in practice. Personally, I wouldn’t even care much if D
removed floating-point types entirely. I just see (saw?, years
ago) great potential in D and want it to succeed, and I believe
having reproducible results would be a big win.
One benefit of optimization-immune types (calling them `float32`
and `float64` for now) would be this: if you have one context in
which you want good, fast algebraic results (where fused
multiply-add is welcome) and another context in which
reproducibility is required, you could use `double` in the first
and `float64` in the second, while passing `-ffast-math`. Maybe a
`pragma` on one of those kinds of functions would be better; I
don’t know. The two approaches aren’t mutually exclusive. LDC
already has `@fastmath`, so maybe it can just become official and
be part of the D spec?
More information about the Digitalmars-d mailing list