Unicode Normalization (and graphemes and locales)
Walter Bright via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 2 17:14:13 PDT 2016
On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote:
> How do you suggest that we handle the normalization issue? Should we just
> assume NFC like std.uni.normalize does and provide an optional template
> argument to indicate a different normalization (like normalize does)? Since
> without providing a way to deal with the normalization, we're not actually
> making the code fully correct, just faster.
The short answer is, we don't.
1. D is a systems programming language. Baking normalization, graphemes and
Unicode locales in at a low level will have a disastrous negative effect on
performance and size.
2. Very little systems programming work requires level 2 or 3 Unicode support.
3. Are they needed? Pedantically, yes. Practically, not necessarily.
4. What we must do is, for each algorithm, document how it handles Unicode.
5. Normalization, graphemes, and locales should all be explicitly opt-in with
corresponding library code.
Normalization: s.normalize.algorithm()
Graphemes: may require separate algorithms, maybe std.grapheme?
Locales: I have no idea, given that I have not studied that issue
6. std.string has many analogues of std.algorithm functions that are specific to the
peculiarities of strings. I think this is a perfectly acceptable approach. For
example, there are many ways to sort Unicode strings, and many of them do not
fit in with std.algorithm.sort's ways. Having special std.string.sort's for them
would be the most practical solution.
7. At some point, as the threads on autodecode amply illustrate, working with
level 2 or level 3 Unicode requires a certain level of understanding on the part
of the programmer writing the code, because there simply is no overarching
correct way to do things. The programmer is going to have to understand what he
is trying to accomplish with Unicode and select the code/algorithms accordingly.
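
To make points 4 and 5 concrete, here is a minimal sketch of what explicit
opt-in could look like using what Phobos already has in std.uni (normalize and
byGrapheme) and std.utf (byCodeUnit). The helper names equalNFC and
graphemeCount are hypothetical, made up for illustration; they are not existing
or proposed library functions, and this is only one way the opt-in style might
be spelled.

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.stdio : writeln;
import std.uni;                 // normalize, NFC, byGrapheme
import std.utf : byCodeUnit;

/// Hypothetical helper: equality after explicitly normalizing both sides to NFC.
bool equalNFC(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

/// Hypothetical helper: number of grapheme clusters (user-perceived characters).
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string composed   = "\u00E9";   // 'é' as a single code point
    string decomposed = "e\u0301";  // 'e' followed by a combining acute accent

    // Default behavior: a plain comparison looks only at the encoded data,
    // so the two spellings of the same text are different strings.
    assert(composed != decomposed);

    // Opt-in normalization: the caller asks for it and pays for it.
    assert(equalNFC(composed, decomposed));

    // The s.normalize.algorithm() style: normalize, then run an ordinary algorithm.
    assert(decomposed.normalize!NFC.canFind(composed));

    // Opt-in graphemes versus the raw code-unit view of the same string.
    assert(graphemeCount(decomposed) == 1);
    assert(decomposed.byCodeUnit.walkLength == 3);

    writeln("ok");
}
```

Only the calls that ask for normalization or graphemes pay the cost of them;
everything else keeps working on the encoded data as-is.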