Sorting with non-ASCII characters

Chris wendlec at tcd.ie
Tue Sep 24 04:00:23 PDT 2013


On Tuesday, 24 September 2013 at 10:35:53 UTC, Jos van Uden wrote:
> On 24-9-2013 11:26, Chris wrote:
>> On Thursday, 19 September 2013 at 18:44:54 UTC, Jos van Uden wrote:
>>> On 19-9-2013 17:18, Chris wrote:
>>>> Short question in case anyone knows the answer straight away:
>>>>
>>>> How do I sort text so that non-ASCII characters like "á" are
>>>> treated in the same way as "a"?
>>>>
>>>> Now I'm getting this:
>>>>
>>>> [wow, ara, ába, marca]
>>>>
>>>> ===> sort(listAbove);
>>>>
>>>> [ara, marca, wow, ába]
>>>>
>>>> I'd like to get:
>>>>
>>>> [ába, ara, marca, wow]
>>>
>>> If you only need to process extended ASCII, then you could perhaps
>>> make do with a transliterated sort, something like:
>>>
>>> import std.stdio, std.string, std.algorithm, std.uni;
>>>
>>> void main() {
>>>    auto sa = ["wow", "ara", "ába", "Marca"];
>>>    writeln(sa);
>>>    trSort(sa);
>>>    writeln(sa);
>>> }
>>>
>>> void trSort(C, alias less = "a < b")(C[] arr) {
>>>    static dstring c1 = "àáâãäåçèéêëìíîïñòóôõöøùúûüýÿ";
>>>    static dstring c2 = "aaaaaaceeeeiiiinoooooouuuuyy";
>>>    schwartzSort!(a => tr(toLower(a), c1, c2), less)(arr);
>>> }
>>
>> Thanks a million, Jos! This does the trick for me.
>
> Great.
>
> Be aware that the above code does a case-insensitive sort. If you
> need a case-sensitive sort, you can use something like:
>
>
> import std.stdio, std.string, std.algorithm, std.uni;
>
> void main() {
>     auto sa = ["wow", "ara", "ába", "Marca"];
>     writeln(sa);
>     trSort(sa, CaseSensitive.no);
>     writeln(sa);
> 
>     writeln;
> 
>     sa = ["wow", "ara", "ába", "Marca"];
>     writeln(sa);
>     trSort(sa, CaseSensitive.yes);
>     writeln(sa);
> }
>
> void trSort(C, alias less = "a < b")(C[] arr,
>                             CaseSensitive cs = CaseSensitive.yes) {
> 
>     static c1 = "àáâãäåçèéêëìíîïñòóôõöøùúûüýÿÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝŸ"d;
>     static c2 = "aaaaaaceeeeiiiinoooooouuuuyyAAAAAACEEEEIIIINOOOOOOUUUUYY"d;
> 
>     if (cs == CaseSensitive.no)
>         arr.schwartzSort!(a => a.toLower.tr(c1, c2), less);
>     else
>         arr.schwartzSort!(a => a.tr(c1, c2), less);
> }

Ah, yes, of course. I will keep that in mind. At the moment I only
need a case-insensitive sort, but you never know. Thanks again.
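
For anyone reading this later, here is a minimal sketch of the sort key that the case-insensitive trSort compares, pulled out on its own. The helper name sortKey is hypothetical (it is not in Jos's code), and the table is just the lower-case one from the first version:

```d
import std.algorithm : schwartzSort;
import std.stdio : writeln;
import std.string : tr;
import std.uni : toLower;

// Hypothetical helper, not from the thread: builds the key used for
// comparison. The word is lower-cased first, then every accented letter
// found in the table is replaced by its plain counterpart.
string sortKey(string word)
{
    static dstring accented = "àáâãäåçèéêëìíîïñòóôõöøùúûüýÿ";
    static dstring plain    = "aaaaaaceeeeiiiinoooooouuuuyy";
    return word.toLower.tr(accented, plain);
}

void main()
{
    auto words = ["wow", "ara", "ába", "Marca"];
    // schwartzSort computes each key once and orders the original
    // strings by those keys; the strings themselves are not modified.
    words.schwartzSort!sortKey;
    writeln(words);
}
```

With the sample list this should give the order ába, ara, Marca, wow, since the keys being compared are aba, ara, marca and wow. Letters that are missing from the table (ł or ő, for example) are left untouched and will still sort after z.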


More information about the Digitalmars-d-learn mailing list