Safer casts

Janice Caron caron800 at googlemail.com
Sat May 10 03:19:37 PDT 2008


On 10/05/2008, Yigal Chripun <yigal100 at gmail.com> wrote:
>  suppose the bit layout of a is illegal utf-32 encoding. would you prefer
>  D allowed storing such an illegal value in a dchar?

I have to say yes.

It's a question of levels. Higher-level code should never store
invalid UTF-32 in dchars or dstrings, but lower-level code must be
able to work with them.

For example, I wrote std.encoding. It has a function
isValidCodePoint() which takes a dchar and tells you whether or not it
contains a valid value. It also has a function, sanitize(), which
takes possibly invalid UTF as input and emits guaranteed-valid UTF as
output. And it has a function, safeDecode(), which takes possibly
invalid UTF as input, removes the first UTF sequence, whether valid or
malformed, and returns either the decoded character or the constant
INVALID_SEQUENCE, which is (cast(dchar)(-1)).
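A minimal sketch of some of those pieces in use, written against
std.encoding as it exists in present-day Phobos (the 2008 signatures
may have differed). It shows that a dchar can hold a value that
isValidCodePoint() rejects, that INVALID_SEQUENCE is itself such a
value, and that sanitize() turns malformed UTF-8 into valid UTF-8:

```d
import std.encoding : INVALID_SEQUENCE, isValid, isValidCodePoint, sanitize;

void main()
{
    // A dchar can carry a bit pattern that is not a valid code point.
    dchar surrogate = cast(dchar) 0xD800;   // a UTF-16 surrogate value
    assert(!isValidCodePoint(surrogate));
    assert(isValidCodePoint('\u20AC'));     // EURO SIGN is fine

    // The error marker is itself an out-of-range dchar.
    assert(INVALID_SEQUENCE == cast(dchar)(-1));
    assert(!isValidCodePoint(INVALID_SEQUENCE));

    // Bytes 2 and 3 of the UTF-8 encoding of U+20AC: two stray continuation bytes.
    string broken = "\u20AC"[1 .. $];
    assert(!isValid(broken));

    // sanitize() returns UTF-8 that is guaranteed to be valid.
    string clean = sanitize(broken);
    assert(isValid(clean));
}
```

Note that the low-level function has to be able to hold, and return,
exactly the kind of dchar that higher-level code must never see.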

The lowest-level code has to be allowed not merely to get /close/ to
the metal, but to actually turn the nuts and bolts. That's what it
means to be a systems programming language; without that ability, no
low-level code could be written without resorting to assembler.

So yes,

    int n = anything;
    dchar c = cast!(dchar)n;

must always succeed. The exclamation mark means "I know what I'm
doing", which is exactly why it should be used with caution.
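To make the intended semantics concrete, here is a sketch using a
hypothetical helper, uncheckedToDchar(), as a stand-in for the proposed
cast!(dchar) (which is not existing D syntax). It relies on the fact
that a plain int-to-dchar cast in present-day D does no validity check:

```d
import std.encoding : isValidCodePoint;

/// Hypothetical stand-in for the proposed cast!(dchar): reinterpret the
/// 32 bits of an int as a dchar, with no validity check and no failure path.
dchar uncheckedToDchar(int n) @safe pure nothrow @nogc
{
    return cast(dchar) n;   // integral cast: the bits are stored as-is
}

void main()
{
    dchar c = uncheckedToDchar(-1);     // 0xFFFF_FFFF, not a code point
    assert(cast(uint) c == 0xFFFF_FFFF);
    assert(!isValidCodePoint(c));

    dchar d = uncheckedToDchar(0x41);   // an ordinary value round-trips
    assert(d == 'A');
}
```

Being nothrow is the point: the caller takes responsibility for the
value, so the conversion has no way to fail.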


>  IMO, a strongly typed language (like D) must enforce at all times that
>  its variables are valid. I do not want D to allow storing illegal values
>  like that. that must be an error.

Consider this:

    string s = "\u20AC"; /* s contains exactly one Unicode character */
    string t = s[1..2];

Do you want to ban slicing?

Do you want slicing always to invoke a call to std.encoding.isValid(),
just to make sure the slice is valid? If so, you must see that
std.encoding itself needs to be allowed to do low-level stuff.
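A sketch of what the slicing example actually produces, assuming
present-day Phobos: the euro sign is three UTF-8 code units, the slice
holds only the middle one, and slicing itself performs no check.
Validity is only detected when something like std.encoding.isValid()
is called explicitly:

```d
import std.encoding : isValid;

void main()
{
    string s = "\u20AC";     // EURO SIGN: UTF-8 bytes E2 82 AC
    assert(s.length == 3);   // length counts code units, not characters

    string t = s[1 .. 2];    // the lone continuation byte 0x82
    assert(cast(ubyte) t[0] == 0x82);

    // Slicing did not validate anything; validation is a separate step.
    assert(isValid(s));
    assert(!isValid(t));
}
```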

Higher level code is ultimately written in terms of lower level code,
so you can't ban the lower level code.

However, I would be more than happy with:

    int n;
    dchar c = cast(dchar)n; /* may throw */
    dchar d = cast!(dchar)n; /* always succeeds */
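One way to picture the split, as a sketch only: the hypothetical
checkedToDchar() below shows the kind of test a throwing cast(dchar)
might perform (accept Unicode scalar values, i.e. 0 to 0x10FFFF minus
the surrogates U+D800 to U+DFFF), while a plain cast in today's D plays
the role of the always-succeeding cast!(dchar):

```d
import std.exception : assertThrown;

/// Hypothetical checked conversion: accepts only Unicode scalar values
/// (0..0x10FFFF, excluding surrogates U+D800..U+DFFF) and throws otherwise.
dchar checkedToDchar(int n) @safe
{
    if (n < 0 || n > 0x10FFFF || (n >= 0xD800 && n <= 0xDFFF))
        throw new Exception("not a Unicode scalar value");
    return cast(dchar) n;
}

void main()
{
    assert(checkedToDchar(0x20AC) == '\u20AC');
    assertThrown(checkedToDchar(0xD800));   // surrogate: rejected
    assertThrown(checkedToDchar(-1));       // out of range: rejected

    // The unchecked path (a plain cast today) stores the bits regardless.
    dchar raw = cast(dchar) 0xD800;
    assert(cast(uint) raw == 0xD800);
}
```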



More information about the Digitalmars-d mailing list