suggestion of type: ustring

ZY Zhou rinick at GgMmAaIiLl.com
Sun Mar 20 08:19:25 PDT 2011


> It would be prohibitively expensive to be constantly validating strings.

No, it would be much cheaper, since there are only two cases where validation
is needed:

1) When you convert char[] to ustring. In this case full validation is
necessary.
2) When you split (slice) a ustring. Since the ustring is already guaranteed to
be valid, the validation only needs to check 2 bytes of data (the start and the
end of the slice), which is much cheaper than validating the entire string.

After that, no other function needs to worry about invalid UTF-8 strings: as
long as the parameter is a ustring, no validation is needed.
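
To make the two cases concrete, here is a rough sketch of how such a type could
look as a library struct. This is only an illustration under assumptions:
UString, isBoundary and toString are hypothetical names, not anything in
Phobos, and I am assuming std.utf.validate and the exception spelled
UTFException as in current Phobos (older releases spell it UtfException).

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical wrapper type, not part of Phobos.
struct UString
{
    private string data; // invariant: always valid UTF-8

    // Case 1: one full validation when converting from raw chars.
    this(const(char)[] raw)
    {
        validate(raw); // throws UTFException on invalid UTF-8
        data = raw.idup;
    }

    // Case 2: slicing only inspects the bytes at the two cut points.
    UString opSlice(size_t lo, size_t hi) const
    {
        if (lo > hi || hi > data.length)
            throw new UTFException("slice out of range");
        if (!isBoundary(lo) || !isBoundary(hi))
            throw new UTFException("slice splits a code point");
        UString r;
        r.data = data[lo .. hi]; // no re-validation of the contents
        return r;
    }

    // A cut is legal at the end of the string, or in front of any byte
    // that is not a continuation byte (bit pattern 10xxxxxx).
    private bool isBoundary(size_t i) const
    {
        return i == data.length || (data[i] & 0xC0) != 0x80;
    }

    size_t length() const { return data.length; }
    string toString() const { return data; }
}

unittest
{
    auto s = UString("\xc2\xa2");               // valid two-byte sequence
    assert(s[0 .. 2].toString == "\xc2\xa2");   // cut on boundaries: fine
    assertThrown!UTFException(s[0 .. 1]);       // cut inside the code point
    assertThrown!UTFException(UString("\xff")); // rejected at construction
}
```

Every function that takes a UString can then skip validation entirely, because
the only ways to obtain one are the constructor and opSlice.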

== Quote from Jonathan M Davis (jmdavisProg at gmx.com)'s article
> > D's string is supposed to be utf8 encoded, however, the following code
> > compiles and runs with no error:
> >   string s = "\xff"; // s is invalid
> >   writeln(s);
> >   fileStream.writeLine(s);
> > In order to make sure only valid utf8 string is used in the system,
> > validating is needed everywhere, e.g.
> >   string cut3bytes(string s)
> >   in {validate(s);}
> >   out(result) {validate(result);}
> >   body {return s.length > 3 ? s[0..3] : s;}
> > I think it will be better if D has a ustring type to do all the validating
> > job. e.g.
> >   ustring s = "\xFF";  // compile error
> >   char[] c = [0xFF];
> >   ustring s = c.idup;  // throw UtfException
> >   ustring s1 = "\xc2\xa2";
> >   ustring s2 = s1[0..1];  // throw UtfException
> > So the above example can be simplified to:
> >   ustring cut3bytes(ustring s)
> >   {return s.length > 3 ? s[0..3] : s;}
> It would be prohibitively expensive to be constantly validating strings.

