Safer casts
Janice Caron
caron800 at googlemail.com
Sat May 10 03:19:37 PDT 2008
On 10/05/2008, Yigal Chripun <yigal100 at gmail.com> wrote:
> suppose the bit layout of a is illegal utf-32 encoding. would you prefer
> D allowed storing such an illegal value in a dchar?
I have to say yes.
It's a question of levels. Higher-level code should never store
invalid UTF-32 in dchars or dstrings. But lower-level code must be
able to work with them.
For example, I wrote std.encoding. It has a function,
isValidCodePoint(), which takes a dchar and tells you whether or not it
holds a valid Unicode code point. It also has a function, sanitize(),
which takes possibly invalid UTF as input and emits guaranteed valid
UTF as output. And it has a function, safeDecode(), which takes
possibly invalid UTF as input, removes the first UTF sequence
regardless of whether it is valid or malformed, and returns either the
decoded character or the constant INVALID_SEQUENCE, which is
cast(dchar)(-1).
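A minimal sketch of some of those std.encoding routines in use, assuming a current Phobos in which isValid, isValidCodePoint, sanitize and the INVALID_SEQUENCE constant are still exported (details may differ from the 2008 version). The input "abc\x82xyz" is a made-up example containing a stray UTF-8 continuation byte; safeDecode is left out to keep the sketch short.

```d
import std.encoding : INVALID_SEQUENCE, isValid, isValidCodePoint, sanitize;
import std.stdio : writeln;

void main()
{
    // U+20AC (the euro sign) is a valid code point.
    assert(isValidCodePoint('\u20AC'));

    // A lone surrogate is not, and neither is INVALID_SEQUENCE,
    // whose bit pattern is cast(dchar)(-1), i.e. 0xFFFFFFFF.
    assert(!isValidCodePoint(cast(dchar) 0xD800));
    assert(INVALID_SEQUENCE == cast(dchar)(-1));
    assert(!isValidCodePoint(INVALID_SEQUENCE));

    // Hypothetical input with a stray continuation byte (0x82).
    string raw = "abc\x82xyz";
    assert(!isValid(raw));

    // sanitize() replaces the malformed sequence, so the output is valid
    // UTF-8 while the well-formed text around it survives.
    string clean = sanitize(raw);
    assert(isValid(clean));
    assert(clean[0 .. 3] == "abc" && clean[$ - 3 .. $] == "xyz");

    writeln("sanitized: ", clean);
}
```

The point of the sketch is the split in responsibilities: the low-level routines must accept and inspect invalid data so that higher-level code only ever sees valid UTF.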
The lowest-level code has to be allowed not merely to get /close/ to
the metal, but actually to turn the nuts and bolts. That's what it
means to be a systems programming language: without that ability, no
low-level code could ever be written without resorting to assembler.
So yes,
int n = anything;
dchar c = cast!(dchar)n;
must always succeed. The exclamation mark means "I know what I'm
doing", which is exactly why it should be used with caution.
> IMO, a strongly typed language (like D) must enforce at all times that
> its variables are valid. I do not want D to allow storing illegal values
> like that. that must be an error.
Consider this:
string s = "\u20AC"; /* s contains exactly one Unicode character */
string t = s[1..2]; /* t is a lone continuation byte: invalid UTF-8 */
Do you want to ban slicing?
Do you want every slice to invoke std.encoding.isValid(),
just to make sure the slice is valid? If so, you must see that
std.encoding itself needs to be allowed to do low-level work.
Higher level code is ultimately written in terms of lower level code,
so you can't ban the lower level code.
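To make the slicing point concrete, here is a small sketch, assuming a current Phobos where std.utf.validate throws UTFException on malformed input. It shows that the euro sign occupies three UTF-8 code units (E2 82 AC), that s[1..2] is a lone 0x82 byte, and that detecting this means scanning the slice, which is the cost a checked slice would pay every time.

```d
import std.exception : assertThrown;
import std.stdio : writeln;
import std.utf : UTFException, validate;

void main()
{
    string s = "\u20AC";   // one character, three UTF-8 code units: E2 82 AC
    assert(s.length == 3);

    string t = s[1 .. 2];  // slicing works on code units, not characters
    assert(t.length == 1 && t[0] == 0x82);

    // validate() walks the whole slice and throws on malformed UTF-8.
    assertThrown!UTFException(validate(t));
    validate(s);           // the original string is well formed

    writeln("slice holds a lone continuation byte: ", cast(ubyte) t[0]);
}
```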
However, I would be more than happy with:
int n;
dchar c = cast(dchar)n; /* may throw */
dchar d = cast!(dchar)n; /* always succeeds */
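Since cast!(dchar) is proposed syntax that D does not have, the two behaviours can only be approximated with ordinary functions. In this sketch, toDcharChecked and toDcharUnchecked are hypothetical names standing in for cast(dchar)n and cast!(dchar)n respectively; validating with std.encoding.isValidCodePoint and throwing via std.exception.enforce is an assumed policy, not something the proposal spells out.

```d
import std.encoding : isValidCodePoint;
import std.exception : assertThrown, enforce;
import std.stdio : writeln;

// Hypothetical stand-in for the proposed checked cast(dchar)n:
// throws if n is not a valid Unicode code point.
dchar toDcharChecked(int n)
{
    dchar c = cast(dchar) n;   // reinterpret the bits; -1 becomes 0xFFFFFFFF
    enforce(isValidCodePoint(c), "not a valid Unicode code point");
    return c;
}

// Hypothetical stand-in for the proposed cast!(dchar)n:
// never checks and never throws ("I know what I'm doing").
dchar toDcharUnchecked(int n) pure nothrow @nogc
{
    return cast(dchar) n;
}

void main()
{
    assert(toDcharChecked(0x20AC) == '\u20AC');
    assertThrown(toDcharChecked(-1));       // 0xFFFFFFFF is out of range
    assertThrown(toDcharChecked(0xD800));   // surrogates are not code points

    // The unchecked form always succeeds, even for INVALID_SEQUENCE's bits.
    assert(toDcharUnchecked(-1) == cast(dchar) 0xFFFF_FFFF);
    writeln("ok");
}
```

Keeping the unchecked form free of validation is what lets routines like safeDecode produce a sentinel such as INVALID_SEQUENCE at all.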