.sort and .reverse break utf8 encoding

Wed Oct 4 22:45:58 PDT 2006

Sean Kelly wrote:
> Walter Bright wrote:
>> Derek Parnell wrote:
>>> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>>>
>>>> d-bugmail at puremagic.com wrote:
>>>>>     writefln("sorted");   validate(a.sort);  // fails
>>>>>     writefln("reversed"); validate(a.reverse); // fails
>>>> AIUI sort and reverse are defined to sort/reverse the individual 
>>>> elements of the array, rather than the Unicode characters that make 
>>>> up a string.  But hmm....
>>>
>>> Yes, I realize that but it makes Walter's statements that char[] is 
>>> all we need and we do not need a 'string' a bit weaker.
>>
>> .sort and .reverse should reverse the unicode characters. If you want 
>> to reverse/sort the individual bytes, you should cast it to a ubyte[] 
>> first.
> 
> Changing the behavior of .reverse kind of makes sense, but I don't 
> understand the reason for changing .sort aside from consistency. 
> Personally, I've never had a reason to sort a char array in the first 
> place unless the chars were intended to represent something other than 
> their lexical meaning.  And that aside, sorting chars in a string 
> without a comparison predicate will do so using the char's binary value, 
> which has no lexical significance beyond the 26 letters of the English 
> alphabet (as represented in ASCII). 

What if you want to use a quick binary search look-up to see if a text 
contains a given character? ;)
Not that I've ever needed that, but it makes sense to just fix .sort too.
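
A rough sketch of that look-up, sorting code points rather than bytes. 
This assumes a present-day D compiler with Phobos (std.algorithm, 
std.range, std.conv), not the built-in .sort property discussed above; 
sortedCodePoints and containsCodePoint are made-up helper names.

```d
import std.algorithm.sorting : sort;
import std.conv : to;
import std.range : assumeSorted;

// Hypothetical helper: decode the UTF-8 text into code points and sort
// them by numeric value (not by any lexical/collation order).
dchar[] sortedCodePoints(string text)
{
    dchar[] points = text.to!dstring.dup;
    sort(points);
    return points;
}

// Hypothetical helper: binary search over an already sorted array.
bool containsCodePoint(const(dchar)[] sorted, dchar c)
{
    return assumeSorted(sorted).contains(c);
}

void main()
{
    auto points = sortedCodePoints("größe");
    assert(containsCodePoint(points, 'ß'));
    assert(!containsCodePoint(points, 'z'));
}
```

Because whole code points are sorted, a multi-byte character such as 
'ß' stays intact and can be found; sorting the raw bytes would scatter 
its lead and continuation bytes.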

How often do you .reverse a string, for that matter?
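
For reference, a minimal sketch of the two behaviours being argued 
about, again assuming present-day D with Phobos rather than the 2006 
built-in .reverse property. reverseCodeUnits and reverseCodePoints are 
hypothetical names chosen for illustration.

```d
import std.algorithm.mutation : reverse;
import std.array : array;
import std.conv : to;
import std.exception : assertThrown;
import std.range : retro;
import std.string : representation;
import std.utf : UTFException, validate;

// Hypothetical helper: reverse the raw UTF-8 bytes, as an element-wise
// reverse of char[] would do.
char[] reverseCodeUnits(string text)
{
    ubyte[] units = text.representation.dup;
    reverse(units);
    return cast(char[]) units;
}

// Hypothetical helper: decode to code points, reverse those, re-encode.
string reverseCodePoints(string text)
{
    return text.retro.array.to!string;
}

void main()
{
    // 'é' is two bytes in UTF-8; reversing bytes puts the continuation
    // byte first, so validation fails.
    assertThrown!UTFException(validate(reverseCodeUnits("héllo")));

    // Reversing code points keeps each character's bytes together.
    auto ok = reverseCodePoints("héllo");
    validate(ok);
    assert(ok == "olléh");
}
```

Note that reversing code points still reorders combining marks relative 
to their base characters, so it is only a partial fix for text that 
uses them.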

L.