Differences in results when using the same function in CTFE and Runtime

IchorDev zxinsworld at gmail.com
Tue Aug 13 20:10:17 UTC 2024


On Tuesday, 13 August 2024 at 11:02:07 UTC, Quirin Schroll wrote:
> I honestly don’t know how JRE did implement `double` operations 
> on e.g. the Intel 80486

Probably in software, but modern x86 CPUs can just use the 
hardware, so the difference isn’t so meaningful anymore.

> if I try using `gcc -mfpmath=387 -O3` and add some `double` 
> values, intermediate results are stored and loaded again. Very 
> inefficient. If I use `long double`, that does not happen.

Who cares? In a situation where we must reach the same result on 
every platform (cross-platform determinism), the performance can 
suffer a bit. You are just avoiding my question by making 
excuses. Do you make sure your program’s data is passed around 
exclusively via data registers? Do you only write branchless 
code? Does your entire program fit inside the CPU cache? No, 
because we make performance sacrifices to achieve the desired 
outcome. The idea that the only valid way of coding something is 
the way that compromises on the integrity of the output in 
favour of performance is only a step away from programming 
nihilism.
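
To make the trade-off concrete, here’s a minimal, hypothetical 
sketch (mine, not from anyone in this thread): forcing every 
intermediate back to double, which is what those stores and 
loads under `-mfpmath=387` do, can give a different answer than 
letting the x87 carry intermediates in its 80-bit registers. 
Whether the extra precision actually shows up depends on the 
compiler and target; on x86 with an 80-bit `real` I’d expect 
something like:

```d
import std.stdio : writefln;

void main()
{
    double one  = 1.0;
    double tiny = 0x1p-60; // 2^-60, less than half an ulp of 1.0

    // Intermediate rounded back to double: 1.0 + 2^-60 rounds to
    // 1.0, so subtracting 1.0 leaves 0.0. This is the rounding
    // that the stores/loads enforce.
    double strict = cast(double)(one + tiny) - one;

    // Intermediate held at 80-bit extended precision (64
    // significand bits): 1 + 2^-60 is exactly representable, so
    // the tiny part survives.
    real extended = (cast(real)one + tiny) - one;

    writefln("rounded:  %a", strict);   // 0x0p+0
    writefln("extended: %a", extended); // 0x1p-60 on x86
}
```

The point isn’t that one answer is better; it’s that they 
differ, which is exactly what ruins cross-platform determinism 
unless everyone rounds the same way.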

> You probably meant the x86 with its 80-bit format, which is 
> still noteworthy

Yes, because it’s just referred to as ‘extended precision’; it 
doesn’t have a proper name because it’s an unstandardised 
atrocity.
https://en.wikipedia.org/wiki/Extended_precision#x86_extended_precision_format

> However, at least the POWER9 family supports 128-bit IEEE-754 
> quadruple-precision floats. IIUC, RISC-V also supports 
> them.

binary128 is obviously not the same as x86’s ‘extended 
precision’. You will not get cross-platform deterministic results 
from using them interchangeably; you will get misery.
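
To make that concrete, a minimal sketch built on my own 
assumptions about target mappings (nothing here is guaranteed by 
the spec): D’s `real` is the x87 80-bit type on x86, might map 
to binary128 on a target that has it in hardware, and is plain 
binary64 elsewhere, so the same source rounds differently 
depending on where you build it.

```d
import std.stdio : writefln;

void main()
{
    // 64 significand bits for x87 extended precision, 113 for
    // binary128, 53 for binary64.
    writefln("real.mant_dig = %s", real.mant_dig);

    real one  = 1.0;
    real tiny = 0x1p-80L; // 2^-80
    real r = (one + tiny) - one;

    // Prints 0x0p+0 with a 53- or 64-bit significand, 0x1p-80
    // with binary128.
    writefln("%a", r);
}
```

Same source, different answers: that’s the misery.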

