suggestion of type: ustring
Jesse Phillips
jessekphillips+D at gmail.com
Sun Mar 20 09:12:49 PDT 2011
ZY Zhou Wrote:
> > It would be prohibitively expensive to be constantly validating strings.
>
> No, it would be much much cheaper, since there are only 2 cases the validating is
> needed
>
> 1) when you convert char[] to ustring, in this case, the validating is necessary
> 2) when you use split on ustring. but since ustring is guaranteed to be valid, the
> validating only need to check 2 bytes of data (start and end), much cheaper than
> validating the entire string.
>
> after that, all the other functions will no longer need to worry about invalid
> utf8 string, as long as the parameter is ustring, no validating is needed.
Honestly, so far the only time I have had problems processing UTF was when someone stuck a stupid BOM[1] at the beginning of the file.
Question: what is so hard about inserting validity checks[2] into your own code at exactly the points you describe? That way you don't have to put them in the contracts of all your functions.
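Something along these lines is what I have in mind. It is only a rough sketch: `checked` is a made-up helper name, not anything in Phobos, and the BOM handling is just my assumption about what you would want at the boundary. The only library call is std.utf.validate[2], which in current Phobos throws a UTFException on malformed input.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper (not part of Phobos): validate raw input once,
/// where it enters the program, and return an ordinary string.
string checked(const(char)[] raw)
{
    // Assumption: drop a leading UTF-8 BOM (bytes EF BB BF) if present.
    if (raw.length >= 3 && raw[0] == 0xEF && raw[1] == 0xBB && raw[2] == 0xBF)
        raw = raw[3 .. $];

    validate(raw);      // throws on malformed UTF-8
    return raw.idup;    // callers can treat the result as already checked
}

void main()
{
    // "\uFEFF" is encoded as the three BOM bytes in a UTF-8 string.
    assert(checked("\uFEFFhello") == "hello");

    // 0xFF can never appear in well-formed UTF-8.
    immutable(ubyte)[] bad = [0xFF, 0xFE];
    bool rejected = false;
    try
        checked(cast(const(char)[]) bad);
    catch (UTFException e)
        rejected = true;
    assert(rejected);
}
```

Everything downstream then takes plain strings and assumes they were checked on the way in, which is the same convention ustring would enforce, just without a new type.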
1. http://en.wikipedia.org/wiki/Byte_order_mark
2. http://digitalmars.com/d/2.0/phobos/std_utf.html#validate