Sorting floating-point values, and NaN

Tue Nov 12 12:03:36 PST 2013

On 11/12/13 9:59 AM, Vladimir Panteleev wrote:
> On Tuesday, 12 November 2013 at 16:48:01 UTC, Andrei Alexandrescu wrote:
>> On 11/12/13 8:00 AM, Vladimir Panteleev wrote:
>>> 1) NaN values are accepted as input almost anywhere where doubles are.
>>> std.conv recognizes the string "nan" and converts it to double.nan,
>>
>> They are still singular values. Code has no business comparing them
>> unwittingly.
>>
>>> 2) It is unreasonable to expect every D user out there dealing with
>>> floating point numbers to be forced to check every source of floating
>>> point values in their program against NaNs.
>>
>> I find that perfectly reasonable. How do you mean that?
>
> Both of the above hinge on the relative obscurity of NaNs and the
> problems they could cause. People who are not specifically familiar with
> NaNs and how they interact with other floating-point values will treat
> floating-point values as "just numbers", and expect them to compare and
> sort in the same way as integers.

I think that's their problem, not ours.

>> The language allows NaN floating point number with the semantics of
>> IEEE 754. We hardly have any leeway in the matter unless we want to
>> irate a lot of people for no good reason.
>
>> It's a judgment call, not a cut and dried decision.
>
> Agreed - however, I think there might be more than one way to deal with
> this.
>
> 1) As above, introduce a "less" function which guarantees transitivity
> for basic types, and use it in examples throughout.

This is a bit much.

> 2) Improve documentation and visibility of this problem. For example,
> add this to the documentation of sort:
>
> "The predicate must be transitive (`a<b && b<c` implies `a<c`) in order
> for `sort` to behave properly - otherwise, the program may fail on
> certain inputs (but not others) when not compiled in release mode. Note
> that `a<b` is not transitive for floating points, because the expression
> will always be `false` when either `a` or `b` is `NaN`."

Great idea, just file an enh request or just write the pull request.

>> We should simply add an assert to isSorted.
>
> How would this be implemented, anyway? I stumbled upon this problem with
> this code:
>
> enum sortPred = q{a.hits > b.hits || (a.hits == b.hits && a.value <
> b.value)};
> all.sort!sortPred();
>
> (I know about multiSort, but it currently seems to be buggy - I do have
> a test case, but it has a few million entries and I need to reduce it.)
>
>> if (a[i] < a[i + 1]) { assert(!(a[i + 1] < a[i])); ... }
>> else { assert(a[i + 1] >= a[i]); ... }
>
> Sort et al don't know how to check "a>=b"... and they can't use "a<b"
> either, because !(a<b) && !(b<a) implies a==b.

It really implies "a and b are in the same equivalence class". Building 
on that notion, we can figure what's wrong with NaNs and how to assert 
against their failings.

Consider we define equiv(a, b) as (a, b) => !less(a, b) && !less(b, a).

If at least one of the numbers is NaN, all comparisons return false. 
That puts NaN in the same equivalence class with all numbers, including 
numbers that are deemed in distinct equivalence classes. That's where 
NaNs mess things up, and that's what we need to assert. Consider three 
numbers a, b, and c. Then the following must pass:

assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

("<=" on Booleans is actually implication.)

That test will fail with NaNs and should be part of isSorted, sort etc.

We should also check such as:

assert(!less(a, a));
assert(less(a, b) <= !less(b, a));

These should be checked in sort only; isSorted does not need these.

Andrei