suggestion of type: ustring
Jonathan M Davis
jmdavisProg at gmx.com
Sun Mar 20 06:14:03 PDT 2011
> D's string is supposed to be UTF-8 encoded; however, the following code
> compiles and runs without error:
>
> string s = "\xff"; // s is invalid
> writeln(s);
> fileStream.writeLine(s);
>
> To make sure that only valid UTF-8 strings are used in the system,
> validation is needed everywhere, e.g.
>
> string cut3bytes(string s)
> in {validate(s);}
> out(result) {validate(result);}
> body {return s.length > 3 ? s[0..3] : s;}
>
> I think it would be better if D had a ustring type to do all the validation,
> e.g.
>
> ustring s = "\xFF"; // compile error
>
> char[] c = [0xFF];
> ustring s = c.idup; // throw UtfException
>
> ustring s1 = "\xc2\xa2";
> ustring s2 = s1[0..1]; // throw UtfException
>
> So the above example can be simplified to:
>
> ustring cut3bytes(ustring s)
> {return s.length > 3 ? s[0..3] : s;}
It would be prohibitively expensive to be constantly validating strings. You
validate them at the point that they're created, and then you generally don't
worry about it. Some functions do check that
a string is properly encoded, but most don't. If you want a string type that
actually validates on every operation, feel free to define a struct which
holds a string internally and has all of the appropriate overloaded operators
so that it's a range of dchar and whatnot. But you're going to have a hard
time convincing folks that such a type should be in Phobos, and there's no way
that it would make it into the language itself.
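As a rough sketch of what such a wrapper might look like: the name ValidatedString is hypothetical and nothing like it exists in Phobos. It assumes std.utf.validate for the checks and std.utf.decode for iterating by dchar. It validates once on construction and again on every slice.

```d
import std.stdio : writeln;
import std.utf : decode, validate, UTFException;

// Hypothetical wrapper (not in Phobos): a string that is checked when it is
// constructed and re-checked whenever it is sliced.
struct ValidatedString
{
    private string data;

    this(string s)
    {
        validate(s); // throws UTFException on malformed UTF-8
        data = s;
    }

    // Length in code units (bytes), like string.length.
    @property size_t length() const { return data.length; }

    // Slicing re-validates, so cutting through the middle of a multi-byte
    // sequence throws instead of silently producing a broken string.
    ValidatedString opSlice(size_t from, size_t to)
    {
        return ValidatedString(data[from .. to]);
    }

    // Input-range primitives that iterate by code point (dchar).
    @property bool empty() const { return data.length == 0; }

    @property dchar front() const
    {
        string s = data;
        size_t i = 0;
        return decode(s, i); // decodes the first code point
    }

    void popFront()
    {
        string s = data;
        size_t i = 0;
        decode(s, i); // advances i past one code point
        data = data[i .. $];
    }

    string toString() const { return data; }
}

void main()
{
    auto s1 = ValidatedString("\xc2\xa2"); // U+00A2, two code units
    writeln(s1.length);                  // 2

    try
    {
        auto s2 = s1[0 .. 1]; // cuts the two-byte sequence in half
        writeln("not reached");
    }
    catch (UTFException e)
    {
        writeln("slice rejected");
    }

    // Iterating yields code points, not bytes.
    size_t n = 0;
    foreach (dchar c; ValidatedString("h\u00e9llo"))
        ++n;
    writeln(n); // 5
}
```

Note that every slice pays for a full validate of the resulting string, which is exactly the per-operation overhead that makes this approach expensive.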
And honestly, how often do you have to worry about invalid strings? As long as
you check them when they're created, you won't generally have problems with
invalid strings, and it's a lot less expensive than constantly checking their
validity whenever you do anything with them.
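A minimal sketch of checking once at the point of creation with std.utf.validate. The helper fromRawBytes is hypothetical. It stands in for wherever raw bytes enter the program, such as a file, a socket, or a C API. After that point the result is used as an ordinary string with no further checks.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: validates raw bytes once, at the boundary where they
// enter the program, and hands back an immutable string.
string fromRawBytes(const(ubyte)[] raw)
{
    auto s = cast(string) raw.idup;
    validate(s); // throws UTFException if s is not well-formed UTF-8
    return s;
}

void main()
{
    // 0xC2 0xA2 is the two-byte UTF-8 encoding of U+00A2 (cent sign).
    string ok = fromRawBytes([0xC2, 0xA2]);
    writeln(ok.length); // 2

    try
    {
        fromRawBytes([0xFF]); // 0xFF never appears in valid UTF-8
        writeln("not reached");
    }
    catch (UTFException e)
    {
        writeln("rejected invalid input");
    }
}
```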
- Jonathan M Davis
More information about the Digitalmars-d
mailing list