.sort and .reverse break utf8 encoding

Sean Kelly sean at f4.ca
Wed Oct 4 10:02:28 PDT 2006


Walter Bright wrote:
> Derek Parnell wrote:
>> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>>
>>> d-bugmail at puremagic.com wrote:
>>>>     writefln("sorted");   validate(a.sort);  // fails
>>>>     writefln("reversed"); validate(a.reverse); // fails
>>> AIUI sort and reverse are defined to sort/reverse the individual 
>>> elements of the array, rather than the Unicode characters that make 
>>> up a string.  But hmm....
>>
>> Yes, I realize that but it makes Walter's statements that char[] is 
>> all we need and we do not need a 'string' a bit weaker.
> 
> .sort and .reverse should reverse the unicode characters. If you want to 
> reverse/sort the individual bytes, you should cast it to a ubyte[] first.

Changing the behavior of .reverse makes some sense, but I don't 
understand the reason for changing .sort, aside from consistency. 
Personally, I've never had a reason to sort a char array in the first 
place unless the chars were intended to represent something other than 
their lexical meaning.  And that aside, sorting the chars in a string 
without a comparison predicate orders them by each char's binary value, 
which has no lexical significance beyond the 26 letters of the English 
alphabet (as represented in ASCII).  I'm starting to feel like people 
are harping on Unicode issues for their own sake rather than because 
these are actual problems.  Can someone please explain what I'm 
missing?
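
For anyone following along, the difference between operating on code 
units and on characters can be sketched as below.  This is only an 
illustration under stated assumptions: it is written in Python rather 
than D so it does not depend on any particular compiler version, and 
the sample string and the helper is_valid_utf8 are made up for the 
example rather than taken from the bug report.

```python
# Hypothetical sample text containing two non-ASCII characters (e-acute, o-umlaut).
text = "h\u00e9llo w\u00f6rld"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


encoded = text.encode("utf-8")

# Operating on the individual code units, as described in the bug report:
# the lead and continuation bytes of each multi-byte sequence get separated.
reversed_units = encoded[::-1]
sorted_units = bytes(sorted(encoded))

# Operating on whole code points instead, then re-encoding.
reversed_points = text[::-1].encode("utf-8")
sorted_text = "".join(sorted(text))

print(is_valid_utf8(reversed_units))   # False
print(is_valid_utf8(sorted_units))     # False
print(is_valid_utf8(reversed_points))  # True
print(repr(sorted_text))               # accented letters land after 'w'
```

The last line bears on the .sort question: sorting by code point keeps 
the result valid UTF-8, but the order is by numeric value, so the 
accented letters end up after every unaccented one rather than anywhere 
a dictionary would put them.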


Sean


