floating point comparison basics

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Dec 3 15:15:53 PST 2013


On Tue, Dec 03, 2013 at 11:03:48PM +0100, ed wrote:
> Hi All,
> 
> I'm learning programming and chose D because it is the best :D But,
> I've hit floating point numbers and I'm stuck on some of the basics.
> 
> What is the proper way to do floating point comparisons, in
> particular I need to check if a value is zero?

The first rule of floating-point comparisons is that you never use ==.
Well, not *literally* never (there are some cases where it's useful),
but you should never use == by default, and every time you do, you'd
better have a good reason for it.

As for why, see below.


> For example, given "real x = someCalculatingFunction();" how do I
> check if x is zero in a robust way?
> 
> if(x == 0.0) {} // <-- Will this work as expected?

Most likely it will not, unless you explicitly set x to 0.0 somewhere.
If x is the result of some complex computation, it will most likely not
be *exactly* 0.0. The usual way to compare floats is to write:

	// abs comes from std.math; Epsilon is a tolerance you define
	if (abs(x - y) < Epsilon)
	{
		// x and y are approximately equal
	}

for some small-enough value of Epsilon, chosen relative to the magnitude
of the values you are comparing.  Or, in your case:

	if (abs(x) < Epsilon)
	{
		// x is approximately zero
	}
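
To make the two checks above concrete, here is a minimal sketch. The
helper names nearlyEqual and nearlyZero and the tolerance 1e-9 are my
own assumptions for illustration, not anything from Phobos; pick a
tolerance that suits your data.

```d
import std.math : fabs;
import std.stdio : writeln;

// Hypothetical helpers; names and signatures are illustrative only.
bool nearlyEqual(real x, real y, real epsilon)
{
    return fabs(x - y) < epsilon;
}

bool nearlyZero(real x, real epsilon)
{
    return fabs(x) < epsilon;
}

void main()
{
    // Add 0.1 ten times, then subtract 1.0. Mathematically this is zero,
    // but accumulated round-off may leave a tiny nonzero residue.
    real x = 0.0;
    foreach (i; 0 .. 10)
        x += 0.1;
    x -= 1.0;

    enum real epsilon = 1e-9; // assumed tolerance, not a universal constant

    writeln("x == 0.0             : ", x == 0.0);
    writeln("nearlyZero(x)        : ", nearlyZero(x, epsilon));
    writeln("nearlyEqual(x + 1, 1): ", nearlyEqual(x + 1.0, 1.0, epsilon));
}
```

Whether the == line prints true or false can depend on the compiler and
the precision it uses for intermediates, which is exactly why you should
not rely on it. The two tolerance checks do not have that problem.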

So the first thing to know about floating-point is that it's only an
approximation, and because of that, (1) it is NOT the same as the real
numbers in mathematics, and (2) operations on floating-point values do
not always follow the same rules as mathematics.

For example:

	float a = 1.0 / 5.0;
	assert(a == 0.2);  // <--- this will FAIL

This is because 1/5 in binary has a non-terminating digit expansion
(much like 1/3 in decimal has a non-terminating digit expansion
0.3333...). Since we can only store a finite number of digits in a
float, the digits have to be rounded off past a certain point, and that
introduces a slight round-off error. Here the quotient is rounded to
float precision when it is stored in a, whereas the literal 0.2 is a
double, which keeps more digits of the expansion. The two round-off
errors differ, so a == 0.2 fails to hold.

Another gotcha is that (x+y)+z is not always the same as x+(y+z), unlike
in mathematics.  If x is a very large number relative to y, (x+y) could
be rounded to just x, so writing (x+y)+z becomes the same as writing
x+z; but if z is of intermediate magnitude, then (y+z) could be a value
different from z, and so x+(y+z) will produce a different answer than
(x+y)+z.
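
Here is a small sketch of the same effect. The values are my own choice,
picked so that the small term gets swallowed inside (y+z) rather than
inside (x+y); the function names are hypothetical.

```d
import std.stdio : writeln;

// Same three operands, different grouping.
double sumLeftFirst(double x, double y, double z)
{
    return (x + y) + z;
}

double sumRightFirst(double x, double y, double z)
{
    return x + (y + z);
}

void main()
{
    double x = 1e30;
    double y = -1e30;
    double z = 1.0;

    // x + y is exactly 0, so adding z afterwards gives 1.
    writeln("(x + y) + z = ", sumLeftFirst(x, y, z));

    // z is far too small to change y, so y + z is still -1e30,
    // and adding x gives 0.
    writeln("x + (y + z) = ", sumRightFirst(x, y, z));
}
```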


[...]
> PS: I have read this great article and the links it provides:
> http://dlang.org/d-floating-point.html
> 
> Most of it makes sense but I'm struggling to tie it all together
> when it comes time to apply it.

Two rules of thumb with floating-point values:

(1) Never use == unless you have a good reason for it (and if you don't
know what constitutes a good reason, you don't have one, so don't use
it). Instead, compare abs(a-b) with a small constant value,
conventionally named Epsilon, that represents "close enough" for your
purposes (and this will differ depending on what you're trying to do in
your program).

(2) Don't assume that floating-point operations behave the same way as
mathematical operators. For example, x*x - y*y and (x+y)*(x-y) are the
same thing in math, but in floating-point arithmetic, the former is
vulnerable to catastrophic cancellation (which may produce wildly
inaccurate results when x and y are close in magnitude), whereas the
latter is usually far more accurate. When in doubt, consult
well-researched resources on floating-point arithmetic, such as:

	http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
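
As a rough illustration of rule (2), the sketch below evaluates both
forms for two large values that differ by 1. The function names and the
inputs are my own assumptions, chosen so that x, y, x+y and x-y are all
exactly representable in a double while x*x and y*y are not.

```d
import std.stdio : writefln;

// Naive form: both squares are rounded before the subtraction, and the
// subtraction then exposes that rounding error.
double squaresNaive(double x, double y)
{
    return x * x - y * y;
}

// Factored form: the sum and the difference are computed first.
double squaresFactored(double x, double y)
{
    return (x + y) * (x - y);
}

void main()
{
    double y = 1e15;
    double x = y + 1.0; // exactly representable in a double

    // The exact mathematical answer is 2e15 + 1 = 2000000000000001.
    writefln("naive    : %.1f", squaresNaive(x, y));
    writefln("factored : %.1f", squaresFactored(x, y));
}
```

With these inputs the factored form gives the exact answer, because
every intermediate value fits in a double. The naive form cannot,
because the squares are around 1e30, where neighbouring representable
values are much further apart than 1. How far off it lands depends on
the precision your compiler uses for intermediates.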


T

-- 
Truth, Sir, is a cow which will give [skeptics] no more milk, and so they are gone to milk the bull. -- Sam. Johnson


More information about the Digitalmars-d-learn mailing list