Always false float comparisons

Mon May 16 01:46:58 PDT 2016

On Monday, 16 May 2016 at 08:10:02 UTC, Walter Bright wrote:
> IEEE floats do not specify precision of intermediate results. A 
> C/C++ compiler can be fully IEEE compliant and yet legitimately 
> have increased precision for intermediate results.

IEEE 754-2008 provides language designers with features that 
enable predictable, bit-accurate computations for ordinary 
floating point operations. Only a subset of functions is 
implementation-defined.

This also has the advantage that you can do bit-level 
optimizations... including proving asserts to hold at compile 
time and "assume assert" optimizations that you are fond of ;-).
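
To make the compile-time point concrete, here is a minimal C++ sketch (C++11 or later). It assumes the target's float is IEEE binary32 and that the compiler folds constants in that format; the names kTwoTo24 and kSum are mine, purely for illustration.

```cpp
#include <cassert>
#include <limits>

// Assumption: float is IEEE 754 binary32 on this target.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE 754");
static_assert(std::numeric_limits<float>::digits == 24,
              "expected a 24-bit significand");

// Initializing a float variable rounds the sum to binary32.
// 2^24 + 1 is not representable, so it rounds back to 2^24 (ties-to-even).
constexpr float kTwoTo24 = 16777216.0f;
constexpr float kSum = kTwoTo24 + 1.0f;

// This assert is proven at compile time; no runtime check is emitted.
static_assert(kSum == kTwoTo24, "2^24 + 1 must round to 2^24 in binary32");
```

If intermediate precision were allowed to vary, a check like the last one could not be decided reliably at compile time.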

But all the C/C++ compilers I have used support reliable coercion 
to 32 bit floats.
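
For illustration, this is roughly what I mean by coercion, as a C++ sketch. The helpers to_f32 and mul_add_f32 are hypothetical names of mine, and the volatile store is just one blunt way to force the rounding; on a target where FLT_EVAL_METHOD is 0 a plain float assignment already does the job.

```cpp
#include <cassert>
#include <cfloat>

// True when the implementation evaluates float expressions in float
// itself (no excess precision), as reported by <cfloat>.
constexpr bool kEvaluatesInDeclaredType = (FLT_EVAL_METHOD == 0);

// Hypothetical helper: push a value through a 32-bit float store so that
// any excess intermediate precision is discarded.
inline float to_f32(float x) {
    volatile float stored = x;  // the volatile store forces rounding to binary32
    return stored;
}

// Every intermediate result is explicitly coerced to a 32-bit float.
// Rounding the product first also keeps it from being fused with the add.
inline float mul_add_f32(float a, float b, float c) {
    return to_f32(to_f32(a * b) + c);
}
```

Calling to_f32(16777216.0f + 1.0f) should then give 16777216.0f regardless of how wide the intermediate sum was.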

> I posted several links here pointing out this behavior in VC++ 
> and g++. If your C++ numerics code didn't have a problem with 
> it, it's likely you wrote the code in such a way that more 
> accurate answers were not wrong.

I use clang++ only for production, and I don't really care how I 
wrote my code. What I do know is that in performance-optimized 
code for 32-bit floats and SIMD I rely on unit testing with 
guaranteed 32-bit floats. I absolutely do not want unit tests to 
execute with higher precision. I want them to break if the 32-bit 
float computation fails. If I cannot be sure of this, I risk 
having libraries that enter infinite loops in production code.
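
A toy C++ sketch of a loop whose termination depends on the precision actually used (the names HalvingResult and halve_until_absorbed are hypothetical, and the volatile store stands in for "guaranteed 32-bit"):

```cpp
#include <cassert>

struct HalvingResult {
    int steps;   // number of halvings performed
    float eps;   // first eps for which 1 + eps rounds to 1
};

// Halve eps until 1 + eps rounds to exactly 1 in binary32.
inline HalvingResult halve_until_absorbed() {
    float eps = 1.0f;
    int steps = 0;
    for (;;) {
        volatile float sum = 1.0f + eps;  // force rounding to a 32-bit float
        if (sum == 1.0f) break;           // eps has been absorbed
        eps *= 0.5f;
        ++steps;
    }
    return {steps, eps};
}
```

With strict 32-bit floats and the default rounding mode this exits after 24 halvings with eps equal to 2^-24. If the sum were silently kept in a wider format, the exit condition would trigger later, so a unit test run at higher precision would be exercising a different loop than the one shipped.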

Keep in mind that even "simple" algorithms can get complex when 
you write for high performance, e.g. sound processing. So having 
a predictable outcome is very much desirable. Such code also 
doesn't gain much from compiler optimizations...

> FP behavior has complex trade-offs with speed, accuracy, 
> compatibility, and size. There are no easy, obvious answers.

Well, but randomly increasing precision is always a bad idea when 
you deal with related computations, like time series. It is 
better to have consistent noise/bias than random noise/bias. 
Also, in the case of audio processing it is not unheard of to 
exploit the 24 bit mantissa and the specifics of the IEEE 32 bit 
floating point format.
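
One example of leaning on the 24-bit mantissa, sketched in C++: signed 24-bit PCM samples survive a round trip through a 32-bit float without loss. The helper names and the 2^23 scale factor are my own choices for illustration, not taken from any particular audio library.

```cpp
#include <cassert>
#include <cstdint>

// Assumed convention: signed 24-bit PCM in [-2^23, 2^23 - 1]
// mapped to floats in [-1, 1).
constexpr float kPcm24Scale = 8388608.0f;  // 2^23

// Exact: |sample| <= 2^23 fits in the 24-bit significand, and dividing
// by a power of two only changes the exponent.
inline float pcm24_to_float(std::int32_t sample) {
    return static_cast<float>(sample) / kPcm24Scale;
}

// Exact inverse for any value produced by pcm24_to_float.
inline std::int32_t float_to_pcm24(float x) {
    return static_cast<std::int32_t>(x * kPcm24Scale);
}
```

This only holds if the arithmetic really is binary32 with no surprises, which is the kind of guarantee I am asking for.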

Now, I don't object to having a "real" type that works the way 
you want. What I object to is having float and double act that 
way. Or rather, not having strict ieee32 and ieee64 types.