std.string will get the boot
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Jan 29 17:29:10 PST 2010
Ali Çehreli wrote:
> D is great that it supports three separate Unicode encodings in the
> language, but encodings are at a lower level of abstraction than
> "letters". I am not sure what data is used for toUniUpper and toUniLower
> in std.uni, but they can't work correctly without alphabet information.
> They favor the ASCII layout, probably for historical reasons.
>
> I think the problems with using the ASCII table for sorting are well
> known. A more interesting example is with the Azeri alphabet: it uses
> the ASCII xX characters, but sorts them after hH.
My idea for the upper/lowercase functions would help you solve exactly
the issue you mention. A conversion trie passed as an optional parameter
would allow capitalizing Straße as STRASSE and ali as ALİ.
The trie will match the longest substring of the original string and
will have translation strings in the nodes. The way capitalization is
done will depend on the way you set up the table.
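A minimal sketch of the conversion-trie idea above. It is written in Python for brevity rather than D, and everything in it is hypothetical: `ConversionTrie` and `to_upper` are not Phobos names, and the fallback to `str.upper` for characters the table does not cover is an assumption made only for this illustration. At each position the trie finds the longest table key that matches and emits that key's translation string.

```python
class ConversionTrie:
    """Hypothetical trie: keys are source substrings, values are the
    translation strings stored at the node where each key ends."""

    def __init__(self, table):
        self.root = {}
        for source, target in table.items():
            node = self.root
            for ch in source:
                node = node.setdefault(ch, {})
            # The None key marks the end of a source string and holds its translation.
            node[None] = target

    def longest_match(self, text, start):
        """Return (length, translation) for the longest key starting at
        text[start], or (0, None) if no key matches there."""
        node = self.root
        best = (0, None)
        i = start
        while i < len(text) and text[i] in node:
            node = node[text[i]]
            i += 1
            if None in node:
                best = (i - start, node[None])
        return best


def to_upper(text, trie=None):
    """Uppercase text. Where the trie matches, its translation wins;
    otherwise fall back to str.upper per character (an assumption)."""
    out = []
    i = 0
    while i < len(text):
        length, translation = (
            trie.longest_match(text, i) if trie is not None else (0, None)
        )
        if length:
            out.append(translation)
            i += length
        else:
            out.append(text[i].upper())
            i += 1
    return "".join(out)


german = ConversionTrie({"ß": "SS"})
azeri = ConversionTrie({"i": "İ"})  # dotted capital I, U+0130
print(to_upper("Straße", german))  # STRASSE
print(to_upper("ali", azeri))      # ALİ
```

Python's own `str.upper` already maps ß to SS, so the German table only demonstrates the mechanism. The Azeri table is where the table actually changes the result: without it, "ali" becomes "ALI".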
Andrei