Table of strings sorting problem

Hasan Aljudy hasan.aljudy at gmail.com
Fri Mar 10 18:46:32 PST 2006


S. Chancellor wrote:
> On 2006-03-10 17:20:35 -0800, Aarti <aarti at interia.pl> said:
> 
>> Hello all D-Fans!
>>
>> I encountered a problem with string sorting according to Polish 
>> language rules. Here is a simple test program:
>>
>> // ----------------------------------
>> import std.stdio;
>> void main() {
>>     char[][] table;
>>     table.length=15;
>>     
>>     table[0]="ą";
>>     table[1]="a";
>>     table[2]="ć";
>>     table[3]="c";
>>     table[4]="ę";
>>     table[5]="e";
>>     table[6]="ł";
>>     table[7]="l";
>>     table[8]="ó";
>>     table[9]="o";
>>     table[10]="ś";
>>     table[11]="s";
>>     table[12]="ź";
>>     table[13]="ż";
>>     table[14]="z";
>>
>>     table.sort;
>>
>>     foreach(char[] s; table) {
>>         writef(s);
>>     }
>>     writefln();
>> }
>> // ----------------------------------
>>
>> Output of this test is:
>> aceloszóąćęłśźż
>>
>> when it should be:
>> aącćeęlłoósśzźż
>>
>> It looks like sort doesn't order strings according to language rules.
>>
>> Is this a known issue? How can I sort strings in D according to 
>> language rules?
>>
>> PS. Being able to use Polish characters in class identifiers is really 
>> cool. Examples in C++ books always use Trojkat instead of Trójkąt 
>> (triangle), and it looks awful.
>>
>> Regards
>> Marcin Kuszczak
> 
> 
> Sort compares the binary values of the characters.  Sorting according 
> to Polish language rules is something you would need to implement 
> yourself: specify a map from each character to its sort order and sort 
> based on that.  I'm not sure whether the sort property takes a 
> delegate; that was proposed before.  It is more or less a coincidence 
> that the Latin characters fall in order numerically.  (Whoever chose 
> the ASCII character values probably did that on purpose, though.)
> 
> -S.
> 
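
The character-to-sort-order map described above could look roughly like this. It is only a sketch, written against current D (D2) and Phobos rather than the built-in .sort property: std.algorithm's sort accepts a comparison predicate. The names polishOrder, rank and polishLess are made up for illustration, and only lowercase letters are handled.

```d
import std.algorithm : cmp, map, sort;
import std.array : join;
import std.stdio : writeln;

// Lowercase Polish alphabet in dictionary order (q, v, x included for
// foreign words). Hypothetical table, for illustration only.
immutable dstring polishOrder = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"d;

// Sort rank of a character: its index in polishOrder. Characters that are
// not in the table sort after the alphabet, ordered by code point.
size_t rank(dchar c)
{
    foreach (i, p; polishOrder)
        if (p == c)
            return i;
    return polishOrder.length + c;
}

// True if a comes before b when compared character by character by rank.
bool polishLess(string a, string b)
{
    return cmp(a.map!rank, b.map!rank) < 0;
}

void main()
{
    string[] table = ["ą", "a", "ć", "c", "ę", "e", "ł", "l",
                      "ó", "o", "ś", "s", "ź", "ż", "z"];
    sort!polishLess(table);   // sorts in place using the custom order
    writeln(table.join);
}
```

With the fifteen strings from the test program this should print aącćeęlłoósśzźż, the order Marcin expected. A real solution would also need uppercase letters and probably a proper collation library; the linear lookup in rank is kept only for brevity.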

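Also note that the output
 >> aceloszóąćęłśźż
lists the "English" (ASCII) characters first: acelosz. Each accented 
letter is encoded in UTF-8 as two bytes with values above 0x7F, so it 
compares greater than any ASCII letter.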
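
To see why the output comes out in that order, one can dump the UTF-8 bytes of each string. The snippet below is a small sketch in current D (D2); hexBytes is a hypothetical helper name, not something from Phobos.

```d
import std.format : format;
import std.stdio : writefln;
import std.string : representation;

// Returns the UTF-8 code units of s as space-separated hex bytes.
string hexBytes(string s)
{
    return format("%(%02X %)", s.representation);
}

void main()
{
    // Some of the strings from the test program, in the order they were printed.
    foreach (s; ["z", "ó", "ą", "ć", "ę", "ł", "ś", "ź", "ż"])
        writefln("%s: %s", s, hexBytes(s));
}
```

Here z is the single byte 7A, while ó is C3 B3, ą is C4 85 and ż is C5 BC. Comparing those byte sequences gives exactly the order seen in the output.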


