null and type safety

Wed Nov 5 23:46:28 PST 2008

Walter Bright:
> For example, you should also be able to create a ranged int that can 
> only contain values from n to m:
> RangedInt!(N, M) i;

I presume you meant something like this:
Ranged!(int, N, M) i;

That has to work (assert and when possible statically assert) in situations like the following ones too:

i = long.max; // static err
i = ulong.max; // static err

Ranged!(ulong, 0, ulong.max) ul, r1, r2;
ul = ulong.max + 10; // static err
r1 = ulong.max / 2 + 100;
r2 = ulong.max / 2 + 100;
r1 + r2 // runtime err

As you have discussed about recently, for a compiler designer choosing what goes in the language and what to keep out of it and into libs is very important. Time ago I have read about a special Scheme *compiler* that is very small and very "pluggable", so it's very small, and a large part of the language is implemented by external libs, even a *static* type system, and several other things that usually are assumed part of the compiler itself.

It's not just a matter of creating a very flexible compiler core: even if you somehow are able to create it, then there are other problems to be solved: designing languages is quite hard, so you can't expect an average programmer to be a good designer like that (that's also why AST macros may lead to some troubles). So if you push things out of the language, you probably have to put them into standard libs, so normal programmers can use a standard and well designed version of them. Otherwise it leads to a lot of troubles that I don't list now.

How can we establish if ranged integral values have to be outside the compiler or inside? We can list requirements, etc. Generally the more things are pushed into the compiler, the more complex it becomes, slower to improve and mantain, and such features can also become more rigid (this can be seen very well with D unittests and ddoc. I think that eventually D unittests and ddoc may have to be removed to the language, and put into the standard library, and the language itself may have to grow some features (some more reflection? Maybe like Flectioned?) that allow the standard library code to define them with a handy & short syntax anyway. This to both reduce compiler complexity, allow more evolving capabilities to that functionalities, and allow the community of D programmers to improve them).

Some features of ranged integrals:
- Making integrals ranged has some different purposes, the main one is to avoid a class of runtime bugs, another purpose is to shorten some code a little. The final purpose is to have release code that has zero speed penalty compared to the D code of today. Some of those bugs are controlled by runtime code and other of them can probably be avoided at compile time, by the type system. The compiler can also avoid putting some runtime controls where it infers some values are into certain values. The code inside contracts (I mean of the contract programming) can be also used by the compiler to infer where it can remove more of those runtime controls.
- A short handy syntax is important enough, because for such ranges to become part of the D culture they may have to be handy, short, etc. If D has some features that most D programmers don't use, then they become less useful.
- Probably to avoid integral-related bugs the compiler and the runtime have to control all integral values used by the program, because letting the programmer use few of them in special points is probably a way to not see them used much. For the same purpose such controls probably need to be on (activated) by default, like array range controls.
- Recently I have shown a possible syntax to disable/enable some controls locally, with a syntax like:
safe(stack, ranges, arrays) {...}
unsafe(ranges) {...}
I think such syntax is better than the syntax used by ObjectPascal for such purposes.
- Once and where disabled such controls must have to cost zero at runtime, because D is designed to be quick. I presume the SafeD language has to keep them always activated (that's why having the compiler remove some of them automatically or using the contracts is useful).
- From the links I have shown here recently you can learn how much common is such class of integral-related bugs. And generally you can't talk about a "Safe-D" if you can't sum two integral values reliably :-)
- Range types of integers/chars/enums are useful, but also are useful subranges, that are essentially subtypes specialized for just this purpose. So if:
typedef int:1..6 TyDice;
typedef TyDice:1..3 TyHalfDice;
then a function that takes a TyDice automatically accepts a TyHalfDice too. Note that Haskell type system is so powerful that it allows the programmer to define such semantics, subtypes, etc. But D type system is quite more "primitive", so some of such things may be need to be cabled instead of being user-defined (by programmers that know a lot of type theory, of course).
- So I think that while unittests and ddoc may be better out of the compiler, range types may be better into it (note that I use unittests and ddoc _all the time_, all my programs use them heavily, I like them. I am not saying this because I don't like unittests and documentation strings).

Bye,
bearophile