std.string will get the boot

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Jan 29 17:29:10 PST 2010


Ali Çehreli wrote:
> It is great that D supports three separate Unicode encodings in the
> language, but encodings are at a lower level of abstraction than
> "letters". I am not sure what data is used for toUniUpper and toUniLower
> in std.uni, but they can't work correctly without alphabet information.
> They favor the ASCII layout, probably for historical reasons.
> 
> I think the problems with using the ASCII table for sorting are well
> known. A more interesting example is with the Azeri alphabet: it uses 
> the ASCII xX characters, but sorts them after hH.

The upper/lowercase functions I have in mind would help you solve exactly
the issue you mention. A conversion trie passed as an optional parameter
would allow capitalizing Straße as STRASSE and ali as ALİ.

The trie will match the longest substring of the original string and 
will have translation strings in the nodes. The way capitalization is 
done will depend on the way you set up the table.
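
To make the idea concrete, here is a minimal sketch of a longest-match
conversion trie. It is written in Python purely for illustration and is
not the proposed std.string API: the names ConversionTrie, longest_match
and convert, the dict-based node layout, and the per-character fallback
are all assumptions made for this example.

```python
# Hypothetical sketch of a conversion trie; every name here is illustrative.

class ConversionTrie:
    """Maps source substrings to translation strings stored in the nodes."""

    def __init__(self, table):
        self.root = {}
        for source, target in table.items():
            node = self.root
            for ch in source:
                node = node.setdefault(ch, {})
            # The "" key can never collide with a real character, so it
            # marks the end of an entry and holds its translation string.
            node[""] = target

    def longest_match(self, text, start):
        """Return (length, translation) for the longest entry that matches
        text at position start, or (0, None) if no entry matches."""
        node = self.root
        best = (0, None)
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if "" in node:
                best = (i - start + 1, node[""])
        return best


def convert(text, trie, fallback=str.upper):
    """Translate text, preferring trie entries over the default fallback."""
    out = []
    i = 0
    while i < len(text):
        length, translation = trie.longest_match(text, i)
        if length:
            out.append(translation)
            i += length
        else:
            out.append(fallback(text[i]))
            i += 1
    return "".join(out)


# Assumed example tables; how capitalization behaves depends on the table.
german = ConversionTrie({"\u00df": "SS"})                 # sharp s -> SS
turkish = ConversionTrie({"i": "\u0130", "\u0131": "I"})  # dotted / dotless i

print(convert("Stra\u00dfe", german))  # STRASSE
print(convert("ali", turkish))         # AL followed by U+0130 (dotted capital I)
```

Characters without a table entry fall through to the default conversion,
so a table only needs to list the cases where a language differs from it.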


Andrei


