Differences in results when using the same function in CTFE and Runtime

Quirin Schroll qs.il.paperinik at gmail.com
Mon Aug 19 10:15:40 UTC 2024


On Sunday, 18 August 2024 at 12:57:41 UTC, Timon Gehr wrote:
> On 8/17/24 18:33, Quirin Schroll wrote:
>> 
> The normal use case for floating-point isn't perfectly 
> reproducible results between different optimization levels.
>
> I would imagine the vast majority of FLOPs nowadays are used in 
> HPC and AI workloads. Reproducibility is at least a plus, 
> particularly in a research context.
>
>> However, differences between CTFE and RT are indeed 
>> unacceptable for core-language operations. Those are bugs.
>
> No, they are not bugs; it's just the same kind of badly 
> designed specification.
> get differences between RT and RT when running the exact same 
> function. Of course you will get differences between CTFE and 
> RT.
>
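
To make the RT-vs-RT point concrete: the spec lets an 
implementation keep intermediates at higher precision or 
contract `a * b + c` into a fused multiply-add, so the same 
source can print different results at different optimization 
levels. A minimal sketch (the outcome is target- and 
flag-dependent, so take the comments as one possible scenario):

    double f(double a, double b, double c)
    {
        // The spec allows this to be evaluated as fma(a, b, c),
        // with 80-bit x87 intermediates, or as two separate
        // roundings, which are potentially different results.
        return a * b + c;
    }

    void main()
    {
        import std.stdio : writeln;
        // With two roundings this prints 0, because the double
        // product 0.1 * 10.0 rounds to exactly 1.0; with FMA
        // contraction it prints roughly 5.55e-17, the
        // representation error of 0.1 scaled by 10.
        writeln(f(0.1, 10.0, -1.0));
    }
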
>> The reason for that is probably because Walter didn't like 
>> that other languages nailed down floating-point operations
>
> Probably. C famously nails down floating-point operations, just 
> like it nails down all the other types. D is really well-known 
> for all of its unportable built-in data types, because Walter 
> really does not like nailing things down and this is not one of 
> D's selling points. /s
>
> Anyway, at least LDC is sane on this at runtime by default. 
> Otherwise I would have to switch language for use cases 
> involving floating point, which would probably just make me 
> abandon D in the long run.
>
>> so that you'd get both less precise results *and* worse 
>> performance.
>
> Imagine just manually using the data type that is most suitable 
> for your use case.
>
>> That would for example be the case on an 80387 coprocessor, 
>> and (here's where my knowledge ends)
>
> Then your knowledge may be rather out of date. I get the x87 
> shenanigans, but that's just not very relevant anymore. I am 
> not targeting 32-bit x86 with anything nowadays.
>
>> probably also true for basically all hardware today if you 
>> consider `float` specifically. I know of no hardware that 
>> supports single precision but not double precision. Giving 
>> you double precision instead of single is essentially free, 
>> possibly even a performance boost, while also giving you 
>> more precision.
>
> It's nonsense. If I want double, I ask for double. Also, it's 
> definitely not true that going to double instead of single 
> precision will boost your performance on a modern machine. If 
> you are lucky it will not slow you down, but if the code can be 
> auto-vectorized (or you are vectorizing manually), you are 
> looking at a 2x slowdown at minimum.
>
>> 
>> An algorithm like Kahan summation must be implemented in a way 
>> that takes those optimizations into account.
>
> I.e., do not try to implement this at all with the built-in 
> floating-point types. It's impossible.
>
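
For readers following along, this is the shape of the algorithm 
in question; a minimal sketch of Kahan summation. Under 
reassociation (e.g. `-ffast-math`), the term `(t - sum) - y` is 
algebraically zero, so an optimizer that may treat floating 
point as associative can delete the compensation entirely and 
silently turn this into naive summation:

    double kahanSum(const(double)[] xs)
    {
        double sum = 0.0;
        double c = 0.0; // compensation for lost low-order bits
        foreach (x; xs)
        {
            immutable y = x - c;   // apply previous correction
            immutable t = sum + y; // low bits of y are lost here
            c = (t - sum) - y;     // recover exactly what was lost
            sum = t;
        }
        return sum;
    }
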
>> This is exactly like in C++: signed integer overflow is 
>> undefined, not because it's undefined on the hardware, but 
>> because it allows for optimizations.
>
> If you have to resort to invoking insane C++ precedent in order 
> to defend a point, you have lost the debate. Anyway, it is not 
> at all the same (triggered by overflow vs triggered by default, 
> undefined behavior vs wrong result), and also, in D, signed 
> overflow is actually defined behavior.
>
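
(For what it’s worth, I agree on that last point: D defines 
signed integral arithmetic to wrap modulo 2^n, two’s-complement 
style, so something like

    void main()
    {
        import std.stdio : writeln;
        int x = int.max;
        writeln(x + 1 == int.min); // prints true; defined in D,
                                   // undefined behavior in C++
    }

is well-defined in D.)
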
>> D could easily add functions to `core.math` that specify 
>> operations as strictly IEEE-754 conforming. Using those, 
>> Phobos could provide types that are guaranteed to produce 
>> results as specified by IEEE-754, with no interference from 
>> the optimizer.
>
> It does not do that. Anyway, I would expect that to go to 
> std.numeric.
>
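
What I had in mind looks roughly like this. It’s a hypothetical 
sketch: `strictAdd` and `Ieee754Double` don’t exist anywhere, 
and the intrinsic would need compiler support to be exempt from 
reassociation, contraction, and widening:

    // Hypothetical intrinsic: exactly one IEEE-754 double
    // addition, round-to-nearest, opaque to the optimizer.
    double strictAdd(double a, double b);

    // A library wrapper built on such intrinsics; as you say,
    // std.numeric would be the natural home for it.
    struct Ieee754Double
    {
        double value;
        Ieee754Double opBinary(string op : "+")(Ieee754Double rhs)
        {
            return Ieee754Double(strictAdd(value, rhs.value));
        }
    }
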
>> You can't actually do the reverse, i.e., provide a type in 
>> Phobos that allows for optimizations of that sort while the 
>> core-language types are guaranteed to be unoptimized.
>
> You say "unoptimized", I hear "not broken".
>
> Anyway, clearly the default should be the variant with fewer 
> pitfalls. If you really want to add some sort of 
> flexible-precision data type, why not, but there should be a 
> compiler flag to disable it.
>
>> Such a type would have to be compiler-recognized, i.e. it 
>> would end up being a built-in type.
>
> I have no desire at all to suffer from irreproducible behavior 
> because some dependency tried to max out on some irrelevant to 
> me benchmark. I also have no desire at all to suffer from an 
> unnecessary performance penalty just to recover reproducible 
> behavior that is exposed directly by the hardware.
>
> Of course, then there's the issue that libc math functions are 
> not fully precise and have differences between implementations, 
> but at least there seems to be some movement on that front, and 
> this is easy to work around given that the built-in operations 
> are sane.

I think you got me wrong here on a key aspect: I’m not trying to 
argue, but to explain what the D spec means and to outline 
possible rationales for why it is the way it is. For example, my 
x87 knowledge isn’t “outdated”; x87 coprocessors didn’t change. 
They’re just not practically relevant anymore, but they could 
have influenced *why* Walter specified D as he did. I totally 
agree with you that in the context of modern hardware, D’s float 
spec makes little sense.

Then I tried to outline a compromise between Walter’s repeatedly 
stated opinion and the desires of many D community members, 
including you.

I’m somewhere in the middle, on neither side. I know enough about 
floating-point arithmetic to avoid it as much as I can, and not 
because of how languages implement it, but because of how it 
works in practice. Personally, I wouldn’t even care much if D 
removed floating-point types entirely. I just see (saw?, years 
ago) great potential in D and want it to succeed, and I believe 
having reproducible results would be a big win.

One benefit of optimization-immune types (calling them `float32` 
and `float64` for now) is that if you have both a context in 
which you want fast algebraic results (where fused multiply-add 
is welcome) and a context in which reproducibility is required, 
you could use `double` in the first context and `float64` in the 
second, while passing `-ffast-math`. Maybe a `pragma` on one kind 
of those functions would be better; I don’t know. They’re not 
mutually exclusive. LDC already has `@fastmath`, so maybe it can 
just become official and be part of the D spec?
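
To make that concrete, here is a hypothetical sketch. `float64` 
doesn’t exist, so below it’s a stand-in alias; the point is only 
how the two kinds of context could coexist in one module 
compiled with `-ffast-math`:

    // Hypothetical: a real float64 would have double's layout,
    // but the optimizer would be forbidden from reassociating,
    // contracting, or widening operations on it, -ffast-math
    // or not.
    alias float64 = double; // stand-in so the sketch compiles

    // Fast context: reassociation and fused multiply-add welcome.
    double energy(const(double)[] xs)
    {
        double acc = 0;
        foreach (x; xs) acc += x * x;
        return acc;
    }

    // Reproducible context: meant to be bit-identical across
    // builds and targets.
    float64 checksum(const(float64)[] xs)
    {
        float64 acc = 0;
        foreach (x; xs) acc += x;
        return acc;
    }

LDC’s `@fastmath` is the same idea with the default flipped: 
strict by default, fast per function on request.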

