Unicode Normalization (and graphemes and locales)

Walter Bright via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 17:14:13 PDT 2016


On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote:
 > How do you suggest that we handle the normalization issue? Should we just
 > assume NFC like std.uni.normalize does and provide an optional template
 > argument to indicate a different normalization (like normalize does)? After
 > all, without providing a way to deal with the normalization, we're not
 > actually making the code fully correct, just faster.

The short answer is, we don't.

1. D is a systems programming language. Baking normalization, graphemes and
Unicode locales in at a low level will have a disastrous effect on
performance and size.

2. Very little systems programming work requires level 2 or 3 Unicode support.

3. Are they needed? Pedantically, yes. Practically, not necessarily.

4. What we must do is, for each algorithm, document how it handles Unicode.

5. Normalization, graphemes, and locales should all be explicitly opt-in with 
corresponding library code.

Normalization: s.normalize.algorithm()
Graphemes: may require separate algorithms, maybe std.grapheme?
Locales: I have no idea, given that I have not studied that issue.

6. std.string has many analogues of std.algorithm functions that are specific to the
peculiarities of strings. I think this is a perfectly acceptable approach. For
example, there are many ways to sort Unicode strings, and many of them do not
fit std.algorithm.sort's model. Having dedicated std.string sort functions for them
would be the most practical solution.

7. At some point, as the threads on autodecode amply illustrate, working with 
level 2 or level 3 Unicode requires a certain level of understanding on the part 
of the programmer writing the code, because there simply is no overarching 
correct way to do things. The programmer is going to have to understand what he 
is trying to accomplish with Unicode and select the code/algorithms accordingly.
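
To illustrate the explicit opt-in described in point 5, here is a minimal sketch
in D. It assumes Phobos's std.uni (normalize, NFC, byGrapheme) and
std.range.walkLength. The helper names sameText and graphemeCount are
hypothetical, made up for this example, and are not proposed library API.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

/// Hypothetical helper: compares two strings only after the caller has
/// opted in to NFC normalization.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

/// Hypothetical helper: counts grapheme clusters (user-perceived
/// characters) instead of code units or code points.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string precomposed = "\u00E9"; // U+00E9 as a single code point
    string decomposed = "e\u0301"; // 'e' followed by U+0301 (combining acute)

    // A plain comparison sees different code units, so it reports "not equal".
    assert(precomposed != decomposed);

    // Opting in to normalization makes the two spellings compare equal.
    assert(sameText(precomposed, decomposed));

    // The same string has three different "lengths" depending on the level.
    assert(decomposed.length == 3);         // UTF-8 code units
    assert(decomposed.walkLength == 2);     // code points (autodecoded)
    assert(graphemeCount(decomposed) == 1); // grapheme clusters
}
```

The default comparison stays cheap and works on code units. The cost of
normalization or grapheme segmentation is paid only where the programmer asks
for it, which is the trade-off argued for in points 1 and 5.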

