The Case Against Autodecode
Walter Bright via Digitalmars-d
digitalmars-d at puremagic.com
Fri Jun 3 03:05:11 PDT 2016
On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
> However, this
> meant that some precomposed characters were "redundant": they
> represented character + diacritic combinations that could equally well
> be expressed separately. Normalization was the inevitable consequence.
It is not inevitable. Simply disallow the two-code-point sequences wherever a
single precomposed code point exists - the single one has to be used instead.
There is precedent. Some characters could be encoded with more than one UTF-8
byte sequence, and the longer ("overlong") sequences were declared invalid. Simple.
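
To make the UTF-8 precedent concrete, here is a minimal sketch in Python (not part of the original post; the helper name decodes_as_utf8 is hypothetical). It relies on Python's built-in strict UTF-8 decoder, which rejects overlong encodings, and uses U+002F ('/') as the example character:

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


shortest = b"\x2f"           # U+002F in its one-byte (shortest) form
overlong_2 = b"\xc0\xaf"     # same code point padded into a two-byte form
overlong_3 = b"\xe0\x80\xaf" # same code point padded into a three-byte form

for candidate in (shortest, overlong_2, overlong_3):
    print(candidate, decodes_as_utf8(candidate))
```

Only the shortest form is accepted, so each code point has exactly one valid byte sequence and no decoder has to "normalize" alternatives after the fact.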
I.e. do the normalization up front, when the text is created, rather than
everywhere else the text is later processed.
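
A rough sketch of what "normalize at creation" could look like, again in Python and not from the original post. It substitutes the existing NFC normalization form (via the standard unicodedata module) for the proposed rule, and the names create_text and is_canonical are hypothetical. Note the assumption: NFC only composes sequences for which a precomposed code point exists.

```python
import unicodedata


def create_text(raw: str) -> str:
    """Hypothetical creation-time gate: store only the composed (NFC) form."""
    return unicodedata.normalize("NFC", raw)


def is_canonical(s: str) -> bool:
    """True if s is already in the composed form, i.e. NFC leaves it unchanged."""
    return unicodedata.normalize("NFC", s) == s


decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Without the gate, two spellings of the same character compare unequal.
print(decomposed == precomposed)

# With the gate applied once at creation, plain comparison is enough later.
print(create_text(decomposed) == create_text(precomposed))
```

Under this scheme a validator could reject any input for which is_canonical returns False, in the same way a strict UTF-8 decoder rejects overlong sequences.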