Built-in unsafety in D

bearophile bearophileHUGS at lycos.com
Fri Mar 12 05:46:39 PST 2010


This is a follow-up to this thread, and to other older threads on this topic:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=104965

This is a nice article written in 2005 by Thomas Guest, "Built-in Type Safety?":
http://www.artima.com/cppsource/typesafetyP.html

It shows some bugs that are common in C++ code and that I really hope D will help avoid. It's 2010, so it's about time. (Note: the C# v4 language gives ways to avoid them all.)

For me, having a way to avoid most of those bugs is more important than:
- Having a good operator overload system;
- Having a way to break/ignore circular imports;
- Having actors;
- Having transitive immutability;
- Having true closures;
- Having good data structures in the standard library;
- Having efficient literal arrays;
- Having fast associative arrays, built-in or in a library;
- Having an efficient dynamic array append;
- Changing fixed-size array semantics so they are returned by value;
- etc.


This is a compressed version of the function shown near the top of that article:

// Writes the new signal value to flash only when it moves
// outside the tolerance band around the stored value.
void signalUpdate(Signal update, Signal & stored) {
    int const tolerance = 10;
    if ((update > stored + tolerance) || (update < stored - tolerance)) {
        flashWriteSignal(update);
        stored = update;
    }
}


The bug was caused by:
typedef unsigned Signal;

Quoting the article:

So, the expression in signalsDifferent(10, 10, 20) evaluates:

    10u > 10u + 20 || 10u < 10u - 20

Now, when you subtract an int from an unsigned, both values are promoted to unsigned and the result is unsigned. So, 10u - 20 is a very big unsigned number. Our expression is therefore equivalent to:

    false || true

which is of course true.
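
D inherits these C promotion rules, so the same trap exists there. A minimal sketch in D, showing the wraparound directly:

import std.stdio;

void main() {
    uint update = 10;
    uint stored = 10;
    int tolerance = 20;
    // uint - int: both operands are promoted to uint, so the
    // subtraction wraps around instead of producing -10.
    writeln(stored - tolerance);            // 4294967286
    writeln(update < stored - tolerance);   // true: the bug
}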


They originally wrote very bad unittests, so that's part of the cause of their problem. But C++ too is flawed: this part of the design of C was maybe OK in 1970, but in 2010 it is unacceptable. This is one of the few cases where breaking compatibility with C can be acceptable (and I think that breaking C compatibility for this purpose is more important than breaking it to improve the semantics of fixed-size arrays, as was recently done).

Common Lisp has taught us that many functions in a program don't need maximal performance, so using efficient (usually not heap-allocated) multi-precision integers in them does not slow a program down significantly, but it can avoid many bugs related to integral values. In Lisp, fixnums are usually a performance optimization you apply in selected performance-critical functions; using fixnums everywhere in a program is (correctly) seen as premature optimization.
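
D can already approximate this style with a library type; a minimal sketch, assuming Phobos's std.bigint module and its BigInt struct:

import std.bigint;
import std.stdio;

void main() {
    // BigInt trades some speed for freedom from overflow,
    // much like Lisp's default integers.
    BigInt f = 1;
    foreach (i; 2 .. 31)
        f *= i;      // 30! overflows even ulong
    writeln(f);      // prints 265252859812191058636308480000000
}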

Even if D doesn't want to go the Common Lisp way, and wants to keep using C-style fixed-size bit fields to represent integral values, I feel that having optional runtime overflow errors for integral values can help locate many of those bugs during the development of a program (there can be two compiler switches: one that enables those runtime errors only for signed integral values, and one that enables them for both signed and unsigned integral values).
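
Until such a switch exists, the effect can be approximated by hand. A minimal sketch of a checked signed addition (checkedAdd is my name for illustration, not a proposal):

import std.exception;

// Performs a + b, but raises a runtime error on overflow
// instead of silently wrapping around.
int checkedAdd(int a, int b) {
    long r = cast(long)a + b;    // widen so the true sum always fits
    enforce(r >= int.min && r <= int.max, "integer overflow");
    return cast(int)r;
}

unittest {
    assert(checkedAdd(1, 2) == 3);
    assertThrown(checkedAdd(int.max, 1));
}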

If you don't believe me, take a C# compiler, switch on the overflow errors, and write a medium-sized program: you will see the compiler and runtime happily catch several of your integral-related bugs.

Notes:
- In D, saner and stricter promotion rules between signed and unsigned values will help too, but they can't replace overflow errors.
- Adding a Sint (safe integral value) struct to the standard library is not a solution, because in practice almost no one will use it.
- Avoiding the use of unsigned values wherever possible in the language and in the standard library helps too. I don't understand why the length attribute of arrays and the array indexes are unsigned in D (in C# they are signed, even though C# allows the user to use unsigned types); so far I think it's a bad design choice that I'd like to see changed as soon as possible (a sketch of the kind of bug this invites follows this list).
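
A minimal sketch of the classic trap, valid D:

import std.stdio;

void main() {
    int[] arr;                // empty, so arr.length == 0
    // arr.length is unsigned: 0 - 1 wraps around to the maximum
    // value, so the condition holds and the body runs out of bounds.
    for (size_t i = 0; i < arr.length - 1; i++)
        writeln(arr[i]);      // range violation on the first pass
}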

-----------------

The second problem shown in that article has a simpler and less disruptive solution: named arguments will be something useful to have in D. But this is an additive change, so I think there is no need to rush it; it can wait.

-----------------

The third problem shown in that article is related to the use of booleans to represent an input value for a function. Such use of a boolean is indeed unclear at the call site:

void textRender(std::string const & text,
                Rectangle const & region,
                bool wrap = false,
                bool bold = false,
                bool justify = false,
                int first_line_indent = 0);

textRender(text,
           full_screen, 
           true); // wrap text


In Python I have seen that named arguments help a lot with this problem, because you use a name that makes the purpose of the boolean clear.

Alternatively, another possible solution is shown in this wish from the D Wish List, "Inline enum declaration":
http://all-technology.com/eigenpolls/dwishlist/index.php?it=76

That page contains:
void ShowWindow( enum{Show,Hide} sw ) { ... }
* self-documenting
* better than using "bool" (what's true/false?)
* no dummy types (otherwise enum showwindow_t {...})

On the surface it looks cute, but I don't like that solution a lot, because it's a locally defined type: you can't store its values elsewhere, you can't build the arguments of the function somewhere before passing them to it, etc. (The conventional alternative is sketched below.)
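
The workable variant is the "dummy type" the wish page mentions: a normal named enum, which can be stored and passed around freely. A minimal sketch (the names Wrap and textRender are mine, for illustration):

enum Wrap : bool { no = false, yes = true }

void textRender(string text, Wrap wrap = Wrap.no) {
    // ...
}

void main() {
    textRender("hello", Wrap.yes);    // self-documenting at the call site
    Wrap w = Wrap.yes;                // unlike a local enum, it can be stored
    textRender("world", w);
}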

So I think named arguments are enough to solve most of this third problem too. But named arguments can be added later, for example in D2.5 or D3. D2 contains enough bugs now; I think it's better to remove some of them before adding other _additive_ features. Changing the way integral values are managed, on the other hand, is a breaking change, so it can't be postponed to D3.

Bye,
bearophile


