Unicode handling comparison

Wyatt wyatt.epp at gmail.com
Wed Nov 27 08:15:52 PST 2013


On Wednesday, 27 November 2013 at 14:45:32 UTC, David Nadlinger 
wrote:
>
> If you need to perform these kinds of operations on Unicode 
> strings in D, you can call normalize (std.uni) on the string 
> first to make sure it is in one of the Normalization Forms. For 
> example, just appending .normalize to your strings (which 
> defaults to NFC) would make the code produce the "expected" 
> results.
>
Seems like a pretty big "gotcha" from a usability standpoint; 
it's not exactly intuitive.  I understand WHY this decision was 
made, but it feels like a recipe for code smell and subtle string 
comparison bugs, since two strings that render identically can 
still compare unequal.
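
To make the gotcha concrete, here is a minimal sketch of the kind of comparison that goes wrong and how normalizing first fixes it. It relies only on normalize and NFC from std.uni; the helper name sameText is made up for illustration and is not part of Phobos.

```d
import std.uni : normalize, NFC;

// Hypothetical helper (not in Phobos): compare two strings after
// bringing both into Normalization Form C.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00E9";   // "é" as a single code point
    string decomposed  = "e\u0301";  // "e" followed by a combining acute accent

    // The raw comparison looks at code units, so these differ.
    assert(precomposed != decomposed);

    // After normalizing both sides to NFC they compare equal.
    assert(sameText(precomposed, decomposed));
}
```

Every comparison site has to remember to do this, which is the part that feels error-prone.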

> As far as I'm aware, this behavior is the result of a 
> deliberate decision, as normalizing strings on the fly isn't 
> really cheap.
>
I don't remember if it was brought up before, but this makes me 
wonder if something like an i18nString should exist for cases 
where it IS important.  Making i18n stuff as simple as it looks 
like it "should" be has merit, IMO.  (Maybe there's even room for 
a std.string.i18n submodule?)
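
As a rough sketch of what I have in mind (purely hypothetical; nothing called I18nString exists in Phobos, and the name and layout here are my own assumptions), a thin wrapper could pay the normalization cost once at construction so that later comparisons are plain array comparisons:

```d
import std.uni : normalize, NFC;

// Hypothetical wrapper type: stores its text in NFC so that equality
// behaves the way it "looks like it should" for canonically
// equivalent strings.
struct I18nString
{
    private string data;

    // Normalize once, up front, instead of on every comparison.
    this(string s)
    {
        data = normalize!NFC(s);
    }

    // Both sides are already in NFC, so a plain comparison is enough.
    bool opEquals(const I18nString other) const
    {
        return data == other.data;
    }

    string toString() const
    {
        return data;
    }
}

void main()
{
    auto a = I18nString("\u00E9");   // precomposed "é"
    auto b = I18nString("e\u0301");  // "e" + combining acute accent
    assert(a == b);
}
```

This only covers canonical equivalence; case folding, collation, and grapheme-aware slicing would all need more thought.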

-Wyatt

