Sorting floating-point values, and NaN

Tue Nov 12 08:00:45 PST 2013

On Tuesday, 12 November 2013 at 08:10:16 UTC, Jonathan M Davis 
wrote:
> On Tuesday, November 12, 2013 05:41:55 Vladimir Panteleev wrote:
>> double[] arr = [1, 2, 3, double.nan, 1, 2];
>> assert(arr.isSorted);
>> arr.sort();
>> 
>> This code will fail with an assert error, but not the one on 
>> the
>> second line. Rather, it will fail inside std.range, when sort
>> calls assumeSorted, which checks elements in an order different
>> than sort and isSorted.
>> 
>> Here's a case where the odd way NaN interacts with generic code
>> messes things up in an ugly way.
>> 
>> This is concerning. It's very easy to overlook this while 
>> writing
>> an application, and it can become a hidden vulnerability.
>> 
>> We can't fix the operators... but, perhaps, we could define a
>> safe default comparison predicate (e.g. std.algorithm.less) for
>> use with sorting and related tasks? Aside from bitwise
>> comparison, is it even possible to efficiently compare
>> floating-point values in a way suitable for sorting?
>
> The check could be special-cased for floating point values so 
> that it takes NaN
> into account, but NaN pretty much screws with everything. Once 
> it's there,
> pretty much anything involving math or comparisons is going to 
> go bad. So, you
> could argue that it's a bug in the program if sort is passed a 
> NaN and that
> the assertion in sort is therefore perfectly valid. You can't 
> really sort
> anything with NaN in it anyway.

Here is the problem:

1) NaN values are accepted as input almost anywhere where doubles 
are. std.conv recognizes the string "nan" and converts it to 
double.nan,
2) It is unreasonable to expect every D user out there dealing 
with floating point numbers to be forced to check every source of 
floating point values in their program against NaNs.
3) If the program is compiled in debug mode, it will crash 
quasi-randomly, since the assumeSorted assert does not check 
every pair.

For these reasons, this problem can become a sort of time bomb in 
their application: even with 100% test coverage, one NaN in the 
wrong place inputted by a malicious end user can cause a 
situation the program's authors have not foreseen.

It can be argued that it is a flaw in the language in that it 
allowed such a situation to arise in the first place.

P.S. Please enable RFC 2646 (format=flowed) support in your email 
client. It is emitting unreflowable text, which causes jagged 
quotes as seen above.