Sorting floating-point values, and NaN

Tue Nov 12 01:33:26 PST 2013

On Tuesday, 12 November 2013 at 08:54:35 UTC, tn wrote:
> On Tuesday, 12 November 2013 at 07:33:57 UTC, qznc wrote:
>> On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir 
>> Panteleev wrote:
>>> double[] arr = [1, 2, 3, double.nan, 1, 2];
>>> assert(arr.isSorted);
>>> arr.sort();
>>>
>>> This code will fail with an assert error, but not the one on 
>>> the second line. Rather, it will fail inside std.range, when 
>>> sort calls assumeSorted, which checks elements in an order 
>>> different than sort and isSorted.
>>>
>>> Here's a case where the odd way NaN interacts with generic 
>>> code messes things up in an ugly way.
>>>
>>> This is concerning. It's very easy to overlook this while 
>>> writing an application, and it can become a hidden 
>>> vulnerability.
>>>
>>> We can't fix the operators... but, perhaps, we could define a 
>>> safe default comparison predicate (e.g. std.algorithm.less) 
>>> for use with sorting and related tasks? Aside from bitwise 
>>> comparison, is it even possible to efficiently compare 
>>> floating-point values in a way suitable for sorting?
>>
>> You cannot sort NaNs. A comparison with NaN must always 
>> evaluate to false.
>>
>> One could change isSorted to check for false instead of true, 
>> so it would accept NaN. That would probably break too much 
>> code, though.
>>
>>  arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])
>
> But that is exactly what it does.
>
> https://github.com/D-Programming-Language/phobos/blob/8372da44f66f20c67b69b4b8fb39233ce42d493c/std/algorithm.d#L10021
>
> And that is the reason the second line "assert(arr.isSorted);" 
> passes, while it should fail as the array is clearly not 
> sorted. Instead modifying
>
>   !(arr[n+1] < arr[n])  =>  arr[n] <= arr[n+1]
>
> would make the function work correctly (return false if the 
> range contains nans), but then the template parameter "less" 
> should be replaced by "lessOrEqual". An alternative would be to 
> check for nans explicitly.

Actually, you could also use "!>=" instead of "<" so that the 
compasison would become:

   !(arr[n+1] !>= arr[n])

This is also correct for floating point numbers. However, it 
might be problematic for user defined types that also implement 
something similar to nan (eg. bigfloat). I could not find any 
documentation on how the unordered comparison operators (<>, 
!<>=, !<=, !<, !>=, !>, !<>) translate into opCmp calls.