Unicode Normalization (and graphemes and locales)

Jack Stouffer via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 18:34:13 PDT 2016


On Friday, 3 June 2016 at 00:14:13 UTC, Walter Bright wrote:
> 5. Normalization, graphemes, and locales should all be 
> explicitly opt-in with corresponding library code.

Add decoding to that list and we're right there with you.

> 7. At some point, as the threads on autodecode amply 
> illustrate, working with level 2 or level 3 Unicode requires a 
> certain level of understanding on the part of the programmer 
> writing the code, because there simply is no overarching 
> correct way to do things. The programmer is going to have to 
> understand what he is trying to accomplish with Unicode and 
> select the code/algorithms accordingly.

Working at any level of Unicode in a systems programming language 
requires knowledge of Unicode. Because D is a systems language, 
decoding to grapheme clusters can't be the default behavior, and 
once that is ruled out, everything has to be opt-in, because any 
other default is fundamentally wrong on some level. Once you step 
out of scripting-language land, you can't get around requiring 
Unicode knowledge. As I said in my blog,

> Unicode is hard. Trying to hide Unicode specifics helps
> no one because it's going to bite you in the ass eventually.
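
To make the levels concrete, here is a minimal sketch, written in Python 
rather than D purely for illustration. The helper names 
(utf8_code_units, code_points, nfc_equal) are hypothetical and not from 
any library. It shows that the same user-visible character has a 
different length depending on whether you count code units or code 
points, and that two spellings of it only compare equal after an 
explicit normalization step. Python's standard library has no grapheme 
segmentation, so that level is left out here.

```python
import unicodedata


def utf8_code_units(text: str) -> int:
    """Count UTF-8 code units (bytes): no decoding, the raw storage view."""
    return len(text.encode("utf-8"))


def code_points(text: str) -> int:
    """Count Unicode code points: the view you get after decoding."""
    return len(text)


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after explicitly normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Two spellings of the same user-perceived character "é".
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Each level gives a different answer to "how long is this string?".
assert utf8_code_units(decomposed) == 3   # 1 byte for 'e' + 2 for U+0301
assert code_points(decomposed) == 2
assert utf8_code_units(precomposed) == 2
assert code_points(precomposed) == 1

# Without normalization the two spellings are simply different strings.
assert decomposed != precomposed
# Equality only holds once the caller opts in to normalization.
assert nfc_equal(decomposed, precomposed)
```

Every step above is an explicit call. Nothing is decoded or normalized 
behind the caller's back, which is the opt-in arrangement argued for 
above.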

