[Issue 3843] New: Signed lengths (and other built-in values)

d-bugmail at puremagic.com d-bugmail at puremagic.com
Tue Feb 23 16:58:41 PST 2010


http://d.puremagic.com/issues/show_bug.cgi?id=3843

           Summary: Signed lengths (and other built-in values)
           Product: D
           Version: 2.040
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody at puremagic.com
        ReportedBy: bearophile_hugs at eml.cc


--- Comment #0 from bearophile_hugs at eml.cc 2010-02-23 16:58:38 PST ---
This is Java code coming from Wikipedia:
http://en.wikipedia.org/wiki/Selection_Sort#Code


void selectionSort(int[] a) 
{
    for (int i = 0; i < a.length - 1; i++)
    {
        int min = i;
        for (int j = i + 1; j < a.length; j++)
        {
            if (a[j] < a[min])
            {
                min = j;
            }
        }
        if (i != min) 
        {
            int swap = a[i];
            a[i] = a[min];
            a[min] = swap;
        }
    }
}


You can translate it in correct Python in a short time (I have added a
postcondition too, plus doctests):

def selection_sort(arr):
    """
    >>> items = []
    >>> selection_sort(items)
    >>> items
    []
    >>> items = [5, 3, 1]
    >>> selection_sort(items)
    >>> items
    [1, 3, 5]
    """
    for i in xrange(0, len(arr)-1):
        min = i
        for j in xrange(i + 1, len(arr)):
            if arr[j] < arr[min]:
                min = j
        if i != min:
            arr[i], arr[min] = arr[min], arr[i]

    if __debug__:
        # postcondition that it's not increasing
        for i, el in enumerate(arr[:-1]):
            assert el <= arr[i + 1], "'arr' not correctly sorted."

if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests done.\n"


This is a possible translation in D2 with the postcondition:

import std.stdio: writeln;

void selectionSort(T)(T[] arr)
out {
    // assert that it's not increasing
    foreach (i, el; arr[0 .. $-1])
        assert (el <= arr[i + 1],
                "selectionSort: 'arr' not correctly sorted.");
}
body {
    for (int i = 0; i < arr.length - 1; i++) {
        T min = i;
        for (int j = i + 1; j < arr.length; j++)
            if (arr[j] < arr[min])
                min = j;
        if (i != min)  {
            T aux = arr[i];
            arr[i] = arr[min];
            arr[min] = aux;
        }
    }
}

void main() {
    int[] items;
    writeln(items);
    selectionSort(items);
    writeln(items);
}


But unlike the Python and Java code, that D2 code contains two bugs (or more),
that Python, Java and C# programmers may need some time to spot.

The first bug is in this line:
for (int i = 0; i < arr.length - 1; i++) {

This bug is caused by the combination of the following factors:
- arr.length is unsigned, so 0-1 is not -1, it's not less than 0.
- Currently with DMD there's no way to ask the code to perform integral
overflow tests at runtime, so it can't catch it and give a nice error message
at runtime.
- The compiler isn't even telling me that i < arr.length performs a comparison
between a signed and unsigned number, that can hide a bug (I think GCC can
raise a warning there).

I can show other examples of D code that contain similar bugs caused by mixing
signed and unsigned values.

Unsigned integers are useful when:
- You need a bit array, for example to implement a bloom filter, a bitvector, a
bit set, when you want to do SWAR, when you need bit arrays to deal with
hardware, etc.
- When you really need the full range of numbers 0 .. 2^n, this happens but
it's uncommon for the numbers of 32 or more bits (it's more common for numbers
less than 32 bits long).

In most other situations using unsigned numbers is unsafe (because D can
silently cast signed values to unsigned ones, and because it doesn't perform
run time overflow tests) and it's better to use signed values. So for example
array indices and lengths are (I think) better to be signed (ptrdiff_t), as
almost everything else.

If you mix signed and unsigned arithmetic to index an array or to measure its
length you will often introduce bugs in the code.

Note 1: C# has unsigned numbers, but I think its array lengths are signed.

Note 2: on 64 bit operating systems arrays can be longer than 2^31, and even
now an int is not able to store the full range of a 32 bit ptrdiff_t. So code
like:
int len = arr.length;
Is unsafe now and will be even more unsafe on 64 bit systems where arrays can
and will be longer. It's better to invent ways to avoid such kind bugs as much
as possible.

Note 3: The second bug in that D2 selectionSort is in this line:
foreach (i, el; arr[0 .. $-1])
If arr is empty, that produces an out of bound error in nonrelease mode (a
runtime error is better than nothing).
In Python a slice of an empty array is empty (this is not a design overlook or
mistake, and it's something Python designers wanted).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------


More information about the Digitalmars-d-bugs mailing list