The Case Against Autodecode
Vladimir Panteleev via Digitalmars-d
digitalmars-d at puremagic.com
Fri Jun 3 03:10:18 PDT 2016
On Friday, 3 June 2016 at 10:05:11 UTC, Walter Bright wrote:
> On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
>> However, this
>> meant that some precomposed characters were "redundant": they
>> represented character + diacritic combinations that could
>> equally well
>> be expressed separately. Normalization was the inevitable
>> consequence.
>
> It is not inevitable. Simply disallow the 2 codepoint sequences
> - the single one has to be used instead.
>
> There is precedent. Some characters can be encoded with more
> than one UTF-8 sequence, and the longer sequences were declared
> invalid. Simple.
>
> I.e. have the normalization up front when the text is created
> rather than everywhere else.
I don't think it would work (or at least, the analogy doesn't
hold). It would mean that you can't add new precomposed
characters, because doing so would make previously valid
sequences invalid.
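
To make the difference concrete, here is a minimal sketch in Python (not D, and not anything from Phobos) using only the standard library. It assumes the built-in strict UTF-8 decoder and the `unicodedata` module; the helper `decodes_as_utf8` and the variable names are made up for illustration. It shows that an overlong UTF-8 sequence is simply rejected, whereas a base letter plus combining mark is a valid sequence whether or not a precomposed form exists for it.

```python
import unicodedata


def decodes_as_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if raw is accepted by a strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Overlong form: 0xC0 0xAF is a two-byte encoding of '/' (U+002F), which
# must be encoded as the single byte 0x2F. This set of invalid forms is fixed.
overlong_ok = decodes_as_utf8(b"\xc0\xaf")
shortest_ok = decodes_as_utf8(b"\x2f")

# Two spellings of the same user-perceived character.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

same_code_points = precomposed == decomposed
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# 'q' + COMBINING ACUTE ACCENT has no precomposed form, so NFC leaves it as
# two code points. Under a "precomposed only" rule this text is valid today
# and would become invalid if a precomposed q-acute were ever added.
no_precomposed = unicodedata.normalize("NFC", "q\u0301")

print(overlong_ok, shortest_ok, same_code_points, same_after_nfc,
      len(no_precomposed))
```

The overlong case works because the list of forbidden byte sequences never grows; the combining-sequence case is different because the set of precomposed characters could grow.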