Unicode's proper level of abstraction? [was: Re: VLERange:...]

Thu Jan 13 05:47:58 PST 2011

On 2011-01-13 06:48:46 -0500, spir <denis.spir at gmail.com> said:

> Note that D's stdlib currently provides no means to do this, not even 
> on the fly. You'd have to interface with e.g. ICU (a C/C++/Java Unicode 
> library) (good luck ;-). But even ICU, as well as supposedly 
> Unicode-aware types or libraries for any language, would not give you an 
> abstraction producing correct results for Michel's example. For 
> instance, Python3 code fails as miserably as any other. AFAIK, D is the 
> first and only language having such a tool (Text.d at 
> https://bitbucket.org/denispir/denispir-d/src/a005424f60f3).

D is not the first language dealing correctly with Unicode strings in 
this manner. Objective-C's NSString class search and compare methods 
deal with characters with combining marks correctly. If you want to 
compare code points, you can do so explicitly using the NSLiteralSearch 
option, but by default the comparison treats canonically equivalent 
sequences as equal (at the grapheme level).
<http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/SearchingStrings.html%23//apple_ref/doc/uid/20000149-CJBBGBAI>
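The difference between a literal (code point) comparison, which is what 
NSLiteralSearch asks for, and the default canonical comparison can be 
sketched outside Cocoa. The Python below is only an illustration of the 
idea, not how NSString is implemented: it uses the standard `unicodedata` 
module, and the function names `literal_equal`, `canonical_equal` and 
`canonical_find` are hypothetical. The grapheme check in `canonical_find` 
is an approximation that only looks at the end of each match.

```python
import unicodedata


def literal_equal(a: str, b: str) -> bool:
    # Code point by code point, roughly what NSLiteralSearch requests.
    return a == b


def canonical_equal(a: str, b: str) -> bool:
    # Normalize both sides to NFD so that a precomposed letter (U+00E9)
    # and base letter + combining mark (U+0065 U+0301) compare equal.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def canonical_find(haystack: str, needle: str) -> int:
    # Search the NFD forms, rejecting any match whose end falls between a
    # base letter and a combining mark that follows it. Approximation: only
    # marks with a nonzero canonical combining class are detected.
    # Returns an index into the NFD form of haystack, or -1.
    h = unicodedata.normalize("NFD", haystack)
    n = unicodedata.normalize("NFD", needle)
    start = h.find(n)
    while start != -1:
        end = start + len(n)
        if end == len(h) or unicodedata.combining(h[end]) == 0:
            return start
        start = h.find(n, start + 1)
    return -1


composed = "caf\u00e9"     # "café" with precomposed é
decomposed = "cafe\u0301"  # "café" as e + combining acute accent

print(literal_equal(composed, decomposed))    # False
print(canonical_equal(composed, decomposed))  # True
print(canonical_find(composed, "e"))          # -1: "e" is only part of "é"
```

NFD is used here because decomposed forms make the combining marks 
visible to the boundary check; NFC would work just as well for plain 
equality.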

In Cocoa, string sorting and case-insensitive comparison also depend 
on the user's locale settings, although you can specify your own 
locale if the user's locale is not what you want.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/