Unicode Normalization (and graphemes and locales)
Jack Stouffer via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 2 18:34:13 PDT 2016
On Friday, 3 June 2016 at 00:14:13 UTC, Walter Bright wrote:
> 5. Normalization, graphemes, and locales should all be
> explicitly opt-in with corresponding library code.
Add decoding to that list and we're right there with you.
> 7. At some point, as the threads on autodecode amply
> illustrate, working with level 2 or level 3 Unicode requires a
> certain level of understanding on the part of the programmer
> writing the code, because there simply is no overarching
> correct way to do things. The programmer is going to have to
> understand what he is trying to accomplish with Unicode and
> select the code/algorithms accordingly.
Working at any level of Unicode in a systems programming language
requires knowledge of Unicode. Because D is a systems language, we
can't make decoding to grapheme clusters the default behavior, and
since every other default is fundamentally wrong on some level,
everything has to be opt-in. Once you step out of scripting language
land, you can't get around requiring Unicode knowledge. As I
said in my blog,
> Unicode is hard. Trying to hide Unicode specifics helps
> no one because it's going to bite you in the ass eventually.
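To make the "opt-in at every level" point concrete, here is a minimal sketch, assuming Phobos' std.utf (byCodeUnit, byDchar) and std.uni (byGrapheme, normalize). The helper names codeUnits, codePoints, and graphemes are hypothetical and exist only for this illustration. The same text yields a different "length" at each level, and none of the three answers is the correct one in general.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers, named for illustration only.
// Each one makes the chosen level of Unicode explicit at the call site.
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; } // UTF-8 code units
size_t codePoints(string s) { return s.byDchar.walkLength; }    // decoded code points
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // grapheme clusters

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // one user-perceived character, written in decomposed form.
    string decomposed = "e\u0301";

    assert(codeUnits(decomposed) == 3);  // 1 byte for 'e' + 2 bytes for U+0301
    assert(codePoints(decomposed) == 2); // two code points
    assert(graphemes(decomposed) == 1);  // one grapheme cluster

    // Normalization is a separate, explicit step as well.
    string composed = normalize!NFC(decomposed);
    assert(composed == "\u00E9");        // precomposed U+00E9
    assert(codeUnits(composed) == 2);
    assert(codePoints(composed) == 1);
    assert(graphemes(composed) == 1);

    // Without normalization, the two spellings compare unequal.
    assert(decomposed != composed);
}
```

Which of these counts or comparisons is appropriate depends on what the code is trying to do, which is why the choice has to be made by the programmer.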