The Case Against Autodecode

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Sat May 28 03:40:02 PDT 2016


On 28-May-2016 01:04, tsbockman wrote:
> On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote:
>> On 05/27/2016 03:39 PM, Dmitry Olshansky wrote:
>>> No, this is not the point of normalization.
>>
>> What is? -- Andrei
>
> 1) A grapheme may include several combining characters (such as
> diacritics) whose order is not supposed to be semantically significant.
> Normalization sorts them in a standardized way so that string
> comparisons return the expected result for graphemes which differ only
> by the internal order of their constituent combining code points.
>
> 2) Some graphemes (like accented latin letters) can be represented by a
> single code point OR a letter followed by a combining diacritic.
> Normalization either splits them all apart (NFD), or combines them
> whenever possible (NFC). Again, this is primarily intended to make
> things like string comparisons work as expected, and perhaps to simplify
> low-level tasks like graphical rendering of text.

Quite an accurate statement of the goals. Normalization is all about having 
a canonical order of combining code points.
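
For instance, a small sketch using std.uni.normalize (the specific 
characters are just illustrative, not taken from this thread):

import std.uni : normalize, NFC, NFD;

void main()
{
    // U+00E9 (precomposed 'e with acute') vs. 'e' + U+0301 (combining
    // acute): different code point sequences, same grapheme.
    string composed   = "\u00E9";
    string decomposed = "e\u0301";
    assert(composed != decomposed);                // raw comparison differs
    assert(normalize!NFC(decomposed) == composed); // NFC recombines
    assert(normalize!NFD(composed) == decomposed); // NFD splits apart

    // Combining marks with different combining classes are sorted into
    // canonical order, so both spellings normalize to the same sequence.
    string a = "q\u0307\u0323"; // q + dot above + dot below
    string b = "q\u0323\u0307"; // q + dot below + dot above
    assert(normalize!NFC(a) == normalize!NFC(b));
}

Without normalization a plain code-unit (or code-point) comparison treats 
the two spellings as different strings, which is exactly the problem the 
canonical ordering is meant to solve.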

>
> (Disclaimer: This is an oversimplification, because nothing about
> Unicode is ever simple.)
>


-- 
Dmitry Olshansky
