The Case Against Autodecode

Fri May 13 03:49:24 PDT 2016

On Thursday, 12 May 2016 at 23:16:23 UTC, H. S. Teoh wrote:
> Therefore, autodecoding actually only produces intuitively 
> correct results when your string has a 1-to-1 correspondence 
> between grapheme and code point. In general, this is only true 
> for a small subset of languages, mainly a few common European 
> languages and a handful of others.  It doesn't work for Korean, 
> and doesn't work for any language that uses combining 
> diacritics or other modifiers.  You need byGrapheme to have the 
> correct results.

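In fact, even most European languages are affected if NFD 
normalization is used, which is the default on Mac OS X. In NFD, 
an accented letter such as "é" is stored as a base letter plus a 
combining mark, i.e. two code points for one visible character.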
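
To make this concrete, here is a minimal sketch, assuming a 
recent Phobos with std.uni and std.range. The helper names 
codePoints and graphemes are made up for this illustration; 
normalize, byGrapheme and walkLength are the existing library 
functions.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni;

// Hypothetical helper: counts what autodecoding iterates over,
// i.e. code points, because a string is a range of dchar.
size_t codePoints(string s)
{
    return s.walkLength;
}

// Hypothetical helper: counts user-perceived characters
// (grapheme clusters) via std.uni.byGrapheme.
size_t graphemes(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // "café" with a precomposed é (NFC) and with e + U+0301 (NFD).
    string nfc = normalize!NFC("caf\u00E9");
    string nfd = normalize!NFD("caf\u00E9");

    writeln(codePoints(nfc), " ", graphemes(nfc)); // 4 4
    writeln(codePoints(nfd), " ", graphemes(nfd)); // 5 4
    writeln(nfc == nfd);                           // false
}
```

The two strings display identically, but the autodecoded view 
disagrees with the grapheme count as soon as the text is in NFD, 
and a plain == comparison treats them as different strings.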

And this is actually the main problem with it: it was introduced 
to make Unicode string handling correct. It does not achieve 
that, and therefore it has no justification.