float.max + 1.0 does not overflow

Wed Dec 27 14:14:42 UTC 2017

On Wednesday, 27 December 2017 at 13:40:28 UTC, rumbu wrote:
> Is that normal?
>
> use std.math;
> float f = float.max;
> f += 1.0;
> assert(IeeeFlags.overflow) //failure
> assert(f == float.inf) //failure, f is in fact float.max
>
> On the contrary, float.max + float.max will overflow. The 
> behavior is the same for double and real.

This is actually correct floating point behavior. Consider the 
following program:

import std.stdio;
void main()
{
    float nextRepresentableToMax = float.max;
    // step to the next smaller representable float by decrementing its bit pattern
    (*cast(int*)&nextRepresentableToMax)--;
    writefln("%f", float.max - nextRepresentableToMax);
}

It computes the difference between float.max and the next smaller 
representable floating point number. The difference printed 
by the program (which is 2^104) is:
20282409603651670423947251286016.0

As you might notice, this is significantly bigger than 1.0. 
Floating point operations work like this: they perform the 
operation and then round to the nearest representable number in 
floating point. Adding 1.0 to float.max moves the exact result by 
far less than half of that gap, so rounding to the nearest 
representable number just gives you back float.max. If you 
however add float.max and float.max, the exact result lies far 
beyond the largest finite value, so it rounds to float.infinity 
(written float.inf in your example).
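
To see both cases side by side, here is a minimal sketch. The 
helper names gapBelowMax and addToMax are made up for this 
example, and it assumes the compiler stores the sum as a 32-bit 
float (as DMD does on x86_64) and that std.math.nextDown is 
available in your Phobos version:

```d
import std.math : nextDown;
import std.stdio : writefln;

// Hypothetical helper: distance between float.max and the next
// smaller representable float.
float gapBelowMax()
{
    return float.max - nextDown(float.max);
}

// Hypothetical helper: adds x to float.max and stores the result
// in a float, which forces rounding to float precision.
float addToMax(float x)
{
    float sum = float.max;
    sum += x;
    return sum;
}

void main()
{
    writefln("gap below float.max: %f", gapBelowMax());
    // 1.0 is far below half the gap, so the sum rounds back to float.max
    writefln("float.max + 1.0 == float.max: %s",
             addToMax(1.0f) == float.max);
    // the exact sum is far beyond the finite range, so it rounds to infinity
    writefln("float.max + float.max == float.infinity: %s",
             addToMax(float.max) == float.infinity);
}
```

I would expect both comparisons to print true, matching what you 
observed.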

When trying to understand how floating point works I would highly 
recommend that you read these articles (oldest first): 
https://randomascii.wordpress.com/category/floating-point/

Kind Regards
Benjamin Thaut