The Case Against Autodecode
Minas Mina via Digitalmars-d
digitalmars-d at puremagic.com
Fri May 27 15:12:57 PDT 2016
On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote:
> On 05/27/2016 03:39 PM, Dmitry Olshansky wrote:
>> On 27-May-2016 21:11, Andrei Alexandrescu wrote:
>>> On 5/27/16 10:15 AM, Chris wrote:
>>>> It has happened to me that characters like "é" return length
>>>> == 2
>>>
>>> Would normalization make length 1? -- Andrei
>>
>> No, this is not the point of normalization.
>
> What is? -- Andrei
Here is an example about normalization.
In Unicode, the grapheme Ä can be written as two code points: A
(the ASCII A) followed by the combining ¨ character.
However, one of the goals of Unicode was to be backwards
compatible with earlier encodings that extended ASCII (codepages).
In some codepages, Ä was an actual codepoint.
So in some cases you would have the unicode one which is two
codepoints and the one from some codepages which would be one.
Those should be the same though, i.e. compare equal. In order
to do that, there is normalization. What it does is to _expand_
the single codepoint Ä into A + ¨.
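
A small sketch of the above, in Python rather than D purely for
illustration, using the standard unicodedata module. The helper
name same_text is made up for this example. The expanding form
described above is called NFD; there is also a composing form,
NFC, which goes the other way and yields the single code point.

```python
import unicodedata

precomposed = "\u00C4"   # Ä as one code point (the codepage-style form)
decomposed = "A\u0308"   # ASCII A followed by COMBINING DIAERESIS

# Same grapheme on screen, but different code point sequences.
print(len(precomposed), len(decomposed))
print(precomposed == decomposed)

# NFD expands the single code point into A + combining mark.
nfd = unicodedata.normalize("NFD", precomposed)
print(nfd == decomposed)

# NFC composes the pair back into the single code point.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(same_text(precomposed, decomposed))
```

So whether normalization gives you a length of 1 depends on which
form you pick and on what you count: NFC gives one code point
here, NFD gives two, and the UTF-8 byte count differs again.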