string need to be robust

spir denis.spir at gmail.com
Sun Mar 13 05:50:17 PDT 2011


On 03/13/2011 01:25 PM, ZY Zhou wrote:
>> but I think that it's completely unreasonable to expect
>> all of the string-based and/or range-based functions to be able to handle
>> invalid unicode.
> As I explained in the first mail, if the utf8 parser converts all invalid
> utf8 bytes to low surrogate code points (0x80~0xFF -> 0xDC80~0xDCFF), other
> string-related functions will still work fine, and you can also handle these
> errors if you want:
>
> string s = "\xa0";
> foreach(dchar d; s) {
>    if (isValidUnicode(d)) {
>      process(d);
>    } else {
>      handleError(d);
>    }
> }

This is not a good idea, imo. Surrogate values /are/ invalid code points. (For
those wondering, they are a range of /code unit/ values that utf16 uses to
encode code points > 0xFFFF.) They should never appear in a dchar[] string,
and a char[] string of code units should never encode a value in the
surrogate range.
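
To make the point concrete, here is a minimal sketch in D. The names
inSurrogateRange and toSurrogatePair are made up for illustration and are not
Phobos functions. It only shows that the proposed mapping lands inside the
range that utf16 already reserves for surrogate pairs:

```d
import std.stdio : writefln;

// Range that Unicode reserves for utf16 surrogate code units.
enum dchar surrogateFirst = 0xD800;
enum dchar surrogateLast  = 0xDFFF;

/// Hypothetical helper: true if c lies in the surrogate range.
bool inSurrogateRange(dchar c)
{
    return c >= surrogateFirst && c <= surrogateLast;
}

/// Hypothetical helper: split a code point > 0xFFFF into the two
/// utf16 code units (high surrogate, low surrogate) that encode it.
wchar[2] toSurrogatePair(dchar c)
{
    assert(c > 0xFFFF && c <= 0x10FFFF);
    immutable uint v = cast(uint) c - 0x10000;   // 20 bits remain
    wchar[2] pair;
    pair[0] = cast(wchar)(0xD800 + (v >> 10));   // high surrogate
    pair[1] = cast(wchar)(0xDC00 + (v & 0x3FF)); // low surrogate
    return pair;
}

void main()
{
    // The proposed mapping sends the invalid byte 0xA0 to 0xDCA0,
    // which falls inside the surrogate range.
    immutable dchar mapped = cast(dchar)(0xDC00 + 0xA0);
    writefln("U+%04X in surrogate range: %s",
             cast(uint) mapped, inSurrogateRange(mapped));

    // U+1D11E is above 0xFFFF, so utf16 encodes it as 0xD834 0xDD1E.
    auto pair = toSurrogatePair('\U0001D11E');
    writefln("U+1D11E -> 0x%04X 0x%04X",
             cast(uint) pair[0], cast(uint) pair[1]);
}
```

So every value 0xDC80~0xDCFF produced by the mapping is a low surrogate, which
is exactly what a dchar is not supposed to hold.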

Denis
-- 
_________________
vita es estrany
spir.wikidot.com


