The Case Against Autodecode

Tue May 31 09:57:53 PDT 2016

On Friday, May 27, 2016 23:16:58 David Nadlinger via Digitalmars-d wrote:
> On Friday, 27 May 2016 at 22:12:57 UTC, Minas Mina wrote:
> > Those should be the same though, i.e. compare the same. In order
> > to do that, there is normalization. What it does is _expand_
> > the single codepoint Ä into A + ¨
>
> Unless I'm mistaken, this depends on the form used. For example,
> in NFKC you'd get the single codepoint Ä.

Yeah. For better or worse, there are different normalization schemes for
Unicode. A normalization scheme makes the encodings consistent, but that
doesn't mean that each of the different normalization schemes does the same
thing, just that if you apply the same normalization scheme to two strings,
then equivalent graphemes within those strings will be encoded identically.
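As a rough illustration, here is a short sketch in Python, whose standard `unicodedata.normalize` function implements the four Unicode normalization forms. It is not D code. The variable names and the `code_points` helper are made up for this example. It shows that "Ä" can be encoded two ways, that NFD/NFKD decompose it while NFC/NFKC compose it, and that strings only compare equal when both sides use the same scheme. The ligature "ﬁ" (U+FB01) is an extra example of a case where NFC and NFKC actually produce different output.

```python
import unicodedata

# Two encodings of the same grapheme "Ä":
precomposed = "\u00C4"   # LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "A\u0308"   # "A" + COMBINING DIAERESIS

def code_points(form, s):
    """Hypothetical helper: normalize s with the given form and list its code points."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, s)]

# Without normalization, the raw code point sequences differ.
print(precomposed == decomposed)  # False

# Applying the SAME scheme to both strings yields identical encodings.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, precomposed) == unicodedata.normalize(form, decomposed)
    print(form, code_points(form, precomposed), same)

# Mixing schemes does not give a consistent result.
print(unicodedata.normalize("NFC", decomposed) == unicodedata.normalize("NFD", precomposed))  # False

# The schemes also differ on compatibility characters: NFC keeps the "ﬁ"
# ligature as is, while NFKC replaces it with the two letters "f" and "i".
ligature = "\uFB01"
print(code_points("NFC", ligature), code_points("NFKC", ligature))
```

The loop should print `['U+00C4']` for NFC and NFKC and `['U+0041', 'U+0308']` for NFD and NFKD, with `True` each time. This matches the point above: each scheme is internally consistent, but the schemes are not interchangeable.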

- Jonathan M Davis