[Issue 360] Compile-time floating-point calculations are sometimes inconsistent

Walter Bright newshound at digitalmars.com
Sat Sep 23 20:04:19 PDT 2006


Don Clugston wrote:
> Walter Bright wrote:
>> A static variable's value can change, so it can't be constant folded. 
>> To have it participate in constant folding, it needs to be declared as 
>> const.
> But if it's const, then it's not float precision! I want both!

You can always use hex float constants. I know they're not pretty, but 
the point of them is to be able to specify exact floating point bit 
patterns. There are no rounding errors with them.
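
For instance (a minimal sketch, not part of the original message):

    void main()
    {
        // The decimal literal 0.2 has no exact binary representation, so the
        // compiler has to round it. A hex float literal spells out the exact
        // bit pattern, with no decimal-to-binary conversion step involved.
        float approx = 0.2f;            // rounded by the compiler
        float exact  = 0x1.99999Ap-3f;  // the nearest float to 0.2, written exactly
        assert(approx == exact);        // same bits either way
    }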

>> 1) It's the way C, C++, and Fortran work. Changing the promotion rules 
>> would mean that, when translating solid, reliable libraries from those 
>> languages to D, one would have to be very, very careful.
> 
> That's very important. Still, those languages don't have implicit type 
> deduction. Also, none of those languages guarantee accuracy of 
> decimal->binary conversions, so there's always some error in decimal 
> constants. Incidentally, I recently read that GCC uses something like 
> 160 bits for constant folding, so it's always going to give results that 
> are different to those on other compilers.
> 
> Why doesn't D behave like C with respect to 'f' suffixes?
> (Ie, do the conversion, then truncate it to float precision).
> Actually, I can't imagine many cases where you'd actually want a 'float' 
> constant instead of a 'real' one.

A float constant would be desirable to keep the calculation all floats 
for speed reasons. Beyond speed, I can't think of many reasons one 
would want the reduced precision.
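
A minimal sketch of what I mean (illustrative names, not from the 
original message):

    // With a float-typed constant, an expression over floats can stay in
    // single precision, which matters when crunching large float arrays.
    const float scale = 2.5f;

    void scaleAll(float[] xs)
    {
        foreach (ref x; xs)
            x *= scale;     // float * float; nothing forces a widening to real
    }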

>> 2) Float and double are expected to be implemented in hardware. Longer 
>> precisions are often not available. I wanted to make it practical for 
>> a D implementation on those machines to provide a software long 
>> precision floating point type, rather than just making real==double. 
>> Such a type would be very slow compared with double.
> 
> Interesting. I thought that 'real' was supposed to be the highest 
> accuracy fast floating point type, and would therefore be either 64, 80, 
> or 128 bits. So it could also be a double-double?
> For me, the huge benefit of the 'real' type is that it guarantees that 
> optimisation won't change the results. In C, using doubles, it's quite 
> unpredictable when a temporary will be 80 bits, and when it will be 64 
> bits. In D, if you stick to real, you're guaranteed that nothing weird 
> will happen. I'd hate to lose that.

I don't see how one would lose that if real were done in software.
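
For reference, a minimal sketch of the guarantee Don describes 
(illustration only, not from the original message):

    // With every operand and temporary declared real, there is no narrower
    // type for intermediate results to be rounded to, so the answer shouldn't
    // depend on which temporaries the optimiser keeps in registers.
    real dot3(real[3] a, real[3] b)
    {
        real sum = 0.0L;
        foreach (i; 0 .. 3)
            sum += a[i] * b[i];   // every product and partial sum stays real
        return sum;
    }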

>> 3) Real, even in hardware, is significantly slower than double. Doing 
>> constant folding at max precision at compile time won't affect runtime 
>> performance, so it is 'free'.
> 
> In this case, the initial issue remains: in order to write code which 
> maintains accuracy regardless of machine precision, it is sometimes 
> necessary to specify the precision that should be used for constants.
> The original code was an example where weird things happened because
> that wasn't respected.

Weird things always happen with floating point. It's just a matter of 
where one chooses to let the seams show (you pointed out where the 
seams show in C with temporary precision). I've seen a lot of cases 
where people were surprised that 0.2f (or similar) was rounded off at 
all, and got caught by the roundoff error.
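
A minimal illustration of that surprise (not from the original 
message; the printed digits are approximate):

    import std.stdio;

    void main()
    {
        // 0.2 has no finite binary expansion, so the stored float is already
        // an approximation before any arithmetic is done with it.
        float f = 0.2f;
        writefln("%.12f", f);   // roughly 0.200000002980, not 0.2 exactly
    }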

I used to work in mechanical engineering, where a lot of numerical 
calculation was done. Accumulated roundoff error was a huge problem, 
and a lot of (most?) engineers didn't understand it. They were using 
calculators for long chains of calculation, rounding off after each 
step instead of carrying the full calculator precision, and were 
mystified by getting answers at the end that were way off.

It's that experience (and also college, where we were taught never to 
round off anything but the final answer) that led to the D design 
decision to internally carry constants around at full precision, 
regardless of type.
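
A minimal sketch of the effect (made-up computation, added for 
illustration):

    import std.stdio;

    void main()
    {
        // Rounding to float after every step, versus carrying extended
        // precision throughout and rounding only at the end.
        float rounded = 0.0f;
        real  carried = 0.0L;
        foreach (i; 1 .. 1_000_001)
        {
            rounded += 1.0f / i;   // typically rounded to float at each step
            carried += 1.0L / i;   // kept at real precision throughout
        }
        writefln("rounded each step: %.10f", rounded);
        writefln("carried in full:   %.10f", carried);  // the two typically differ
    }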

Deliberately reduced precision is something that only experts would 
want, and only for special cases. So it's reasonable that that would be 
harder to do (i.e. using hex float constants).

P.S. I also did some digital electronic design work long ago. The 
cardinal rule there was that, since TTL devices kept getting faster 
and old, slower TTL parts became unavailable, one designed so that 
swapping in a faster chip would not cause the system to fail. Hence 
the rule that increasing the precision of a calculation should not 
cause the program to fail <g>.


