string need to be robust

spir denis.spir at gmail.com
Sun Mar 13 13:21:52 PDT 2011


On 03/13/2011 04:43 PM, ZY Zhou wrote:
> If an invalid UTF-8 or UTF-16 code needs to be converted to UTF-32, then it
> should be converted to an invalid UTF-32 value. That's why D800~DFFF are
> marked as invalid points in the Unicode standard.

You are wrong on both points.
First, no standard defines how to convert invalid source data into another
format/encoding; invalid input should simply be treated as invalid, that's all.
A language or string-processing library should certainly *not* provide any way
to do such a conversion. Instead, it should signal invalidity by crashing or
throwing.
Second, the range you mention (D800~DFFF) is not intended for application use;
it is reserved for UTF-16 surrogate code units and, as such, is invalid
anywhere else.
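
To make both points concrete, here is a minimal sketch in Python (not D, and
not a claim about how any D library behaves). The helper names
decode_utf8_strict, is_surrogate and to_surrogate_pair are made up for this
illustration; only bytes.decode is a real library call. It shows a strict
decoder that throws on invalid input rather than producing an "invalid"
output, and how the D800~DFFF range is used only as UTF-16 code units for
code points above 0xFFFF.

```python
from typing import Tuple


def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, raising UnicodeDecodeError on any invalid sequence."""
    return data.decode("utf-8", errors="strict")


def is_surrogate(code_point: int) -> bool:
    """True for D800..DFFF, the range reserved for UTF-16 surrogates."""
    return 0xD800 <= code_point <= 0xDFFF


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above 0xFFFF into its two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    try:
        decode_utf8_strict(b"\xff\xfe")  # not valid UTF-8
    except UnicodeDecodeError as err:
        print("rejected:", err.reason)
    print([hex(u) for u in to_surrogate_pair(0x1F600)])
```

The design choice is the one argued above: invalid input is reported to the
caller, never silently mapped to some other value.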

Since the beginning of this thread, you have been demanding that D's standard
features (the *string types or *char[] arrays) cope with your particular needs
of the moment and do your job for you; at the price of all other use cases of
those features potentially becoming insecure or incorrect; crashing loads of
existing code which relies on correct behaviour; and breaking the standard.
Strange.

Denis

> == Quote from spir (denis.spir at gmail.com)'s article
>> This is not a good idea, imo. Surrogate values /are/ invalid code points. (For
>> those who wonder, they are a range of /code unit/ values used to encode, in
>> UTF-16, code points > 0xFFFF.) They should never appear in a string of dchar[];
>> and a string of char[] code units should never encode a non-code point in the
>
>

-- 
_________________
vita es estrany
spir.wikidot.com


