Inconsistency

qznc qznc at web.de
Wed Oct 16 05:33:00 PDT 2013


On Wednesday, 16 October 2013 at 12:18:40 UTC, Jacob Carlborg wrote:
> On 2013-10-16 10:03, qznc wrote:
>
>> Most code might be buggy then.
>>
>> An issue that often comes up is file names. A file called "bär" will be
>> normalized differently depending on the operating system. In both cases
>> it is one grapheme. However, on Linux it is one code point, but on OS X
>> it is two code points.
>
> Why would it require two code points?

The "ä" can be encoded either as the single code point [U+00E4] or
as the two code points [a, U+0308]. U+0308 is the "combining
diaeresis" [0]. The two-code-point form is not required, but it is
possible. Such combining characters [1] allow a nearly unlimited
number of combinations, because several of them can be stacked on
one base character. You can go crazy with it:
http://stackoverflow.com/questions/6579844/how-does-zalgo-text-work
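
To make the two encodings concrete, here is a minimal sketch in Python (not D, chosen only because its standard library ships `unicodedata`). The helper name `same_name` is made up for this illustration, and comparing after NFC normalization is just one possible approach, not a claim about what any particular file system does:

```python
import unicodedata

# Two encodings of the same visible file name "bär".
composed = "b\u00e4r"      # ä as the single code point U+00E4
decomposed = "ba\u0308r"   # a followed by U+0308 (combining diaeresis)

def same_name(a, b):
    """Hypothetical helper: compare two names after NFC normalization."""
    nfc_a = unicodedata.normalize("NFC", a)
    nfc_b = unicodedata.normalize("NFC", b)
    return nfc_a == nfc_b

# Code point counts differ although both render as three graphemes.
print(len(composed), len(decomposed))    # 3 4
# A naive comparison treats them as different strings.
print(composed == decomposed)            # False
# After normalization they compare equal.
print(same_name(composed, decomposed))   # True
```

Code that compares or hashes file names without normalizing first would treat these two strings as different files, which is the kind of bug discussed above.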

[0] http://www.fileformat.info/info/unicode/char/0308/index.htm
[1] http://en.wikipedia.org/wiki/Combining_character