suggestion of type: ustring

Jesse Phillips jessekphillips+D at gmail.com
Sun Mar 20 09:12:49 PDT 2011


ZY Zhou Wrote:

> > It would be prohibitively expensive to be constantly validating strings.
> 
> No, it would be much much cheaper, since there are only 2 cases the validating is
> needed
> 
> 1) when you convert char[] to ustring, in this case, the validating is necessary
> 2) when you use split on ustring. but since ustring is guaranteed to be valid, the
> validating only need to check 2 bytes of data (start and end), much cheaper than
> validating the entire string.
> 
> after that, all the other functions will no longer need to worry about invalid
> utf8 string, as long as the parameter is ustring, no validating is needed.

Honestly, so far the only time I have had problems processing UTF was when someone stuck a stupid BOM[1] at the beginning of the file.

Question: what is so hard about inserting validity checks[2] into your code just as you have described? That way you don't have to put them in the contracts of all your functions.
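
To make that concrete, here is a rough sketch of what I mean, not a finished design. It only relies on std.utf.validate[2], which throws a UTFException on malformed input. The names checkedUtf8 and countSpaces are made up for illustration; the idea is simply to validate once where the data enters your code and let everything downstream assume it is valid.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: validate once at the boundary, then return
// an immutable copy that the rest of the program can trust.
string checkedUtf8(const(char)[] raw)
{
    validate(raw); // throws UTFException if raw is not valid UTF-8
    return raw.idup;
}

// Hypothetical downstream function: no contract and no re-validation,
// it assumes its caller obtained the string through checkedUtf8.
size_t countSpaces(string s)
{
    size_t n;
    foreach (char c; s)
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void main()
{
    // Valid input passes through unchanged.
    auto good = checkedUtf8("héllo wörld");
    assert(countSpaces(good) == 1);

    // 0xFF can never appear in well-formed UTF-8, so this is rejected.
    char[] bad = ['a', cast(char) 0xFF, 'b'];
    bool rejected = false;
    try
    {
        checkedUtf8(bad);
    }
    catch (UTFException e)
    {
        rejected = true;
    }
    assert(rejected);
}
```

The cost is one pass over the data at the point of entry, which is the same cost the char[] to ustring conversion would have to pay anyway.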

1. http://en.wikipedia.org/wiki/Byte_order_mark
2. http://digitalmars.com/d/2.0/phobos/std_utf.html#validate


More information about the Digitalmars-d mailing list