string need to be robust

spir denis.spir at gmail.com
Sun Mar 13 05:55:06 PDT 2011


On 03/13/2011 01:25 PM, ZY Zhou wrote:
>> but I think that it's completely unreasonable to expect
>> all of the string-based and/or range-based functions to be able to handle
>> invalid unicode.
> As I explained in my first mail, if the UTF-8 parser converts every invalid UTF-8 byte to a
> low surrogate code point (0x80~0xFF ->
> 0xDC80~0xDCFF), other string-related functions will still work fine, and you can
> also handle these errors if you want:
>
> string s = "\xa0";
> foreach(dchar d; s) {
>    if (isValidUnicode(d)) {
>      process(d);
>    } else {
>      handleError(d);
>    }
> }
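A minimal D sketch of the proposed decoding, assuming a current Phobos `std.utf` (`decode`, `isValidDchar`, `UTFException`); `decodeEscaping` is a hypothetical helper, not a library function. Each byte that does not start a well-formed sequence becomes code point 0xDC00 + byte, i.e. 0xDC80..0xDCFF, which is the same mapping Python's `surrogateescape` error handler uses (PEP 383). Since `isValidDchar` rejects surrogates, it can stand in for the `isValidUnicode` of the quoted loop.

```d
import std.stdio : writefln;
import std.utf : decode, isValidDchar, UTFException;

/// Hypothetical helper: decode UTF-8 like `foreach (dchar d; s)`, but map
/// each byte that does not start a valid sequence to U+DC80..U+DCFF
/// (0xDC00 + byte) instead of throwing.
dchar[] decodeEscaping(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        size_t j = i; // decode on a copy so `i` is untouched if it throws
        try
        {
            result ~= decode(s, j);
            i = j;
        }
        catch (UTFException)
        {
            // s[i] is >= 0x80 here, because ASCII bytes always decode.
            result ~= cast(dchar)(0xDC00 + cast(ubyte) s[i]);
            ++i;
        }
    }
    return result;
}

void main()
{
    string s = "a\xa0b";
    foreach (dchar d; decodeEscaping(s))
    {
        if (isValidDchar(d))
            writefln("ok:      U+%04X", cast(uint) d);
        else
            writefln("escaped: U+%04X", cast(uint) d); // lone surrogate
    }
}
```

The escaped values only make sense as `dchar`s: a lone surrogate has no well-formed UTF-8 encoding, so they cannot be written back into a valid `string`.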

PS: You are free to preprocess the source if you like, and convert invalid 
parts into whatever you like. But instead of surrogates, you'd better use one 
of the freely usable ranges of values (the Private Use Areas, e.g. U+E000..U+F8FF); 
or maybe 0 (so that output won't be disturbed); or, better, the code point intended 
for "unrepresentable" characters, U+FFFD REPLACEMENT CHARACTER, which most fonts 
render correctly (usually as an inverse video '?').
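The U+FFFD preprocessing option can be sketched in D as follows. This assumes a recent Phobos that provides `decode!(Yes.useReplacementDchar)`, and `sanitize` is a hypothetical name: the input is copied with every ill-formed sequence turned into U+FFFD, so that later string functions only ever see valid UTF-8.

```d
import std.stdio : writeln;
import std.typecons : Yes;
import std.utf : decode, validate;

/// Hypothetical preprocessing step: copy `s`, replacing every ill-formed
/// UTF-8 sequence with U+FFFD REPLACEMENT CHARACTER.
string sanitize(const(char)[] s)
{
    string result;
    size_t i = 0;
    while (i < s.length)
    {
        // With Yes.useReplacementDchar, decode returns U+FFFD for bad input
        // instead of throwing, and still advances `i`.
        dchar d = decode!(Yes.useReplacementDchar)(s, i);
        result ~= d; // appending a dchar to a string encodes it as UTF-8
    }
    return result;
}

void main()
{
    string raw = "caf\xe9 ok"; // 0xE9 (Latin-1 'é') is ill-formed UTF-8 here
    string clean = sanitize(raw);
    validate(clean); // would throw UTFException if anything invalid remained
    writeln(clean);
}
```

Unlike the surrogate mapping, the original bytes are lost, but the result is an ordinary valid `string` that any Phobos function accepts.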

Denis
-- 
_________________
vita es estrany
spir.wikidot.com


