.sort and .reverse break utf8 encoding

Walter Bright newshound at digitalmars.com
Wed Oct 4 19:54:11 PDT 2006


Sean Kelly wrote:
> Changing the behavior of .reverse kind of makes sense, but I don't 
> understand the reason for changing .sort aside from consistency. 
> Personally, I've never had a reason to sort a char array in the first 
> place unless the chars were intended to represent something other than 
> their lexical meaning.  And that aside, sorting chars in a string 
> without a comparison predicate will do so using the char's binary value, 
> which has no lexical significance beyond the 26 letters of the English 
> alphabet (as represented in ASCII).  I'm starting to feel like people 
> are harping on Unicode issues just for the sake of doing so rather than 
> because these are actual problems.  Can someone please explain what I'm 
> missing?

Collecting character usage frequency statistics is one such use. Read a 
text file into a buffer, sort the buffer, and dump the result!
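
As a rough illustration of the sort-then-dump idea, here is a minimal sketch in Python rather than D. The function name char_frequencies and the sample string are made up for the example. It assumes the text has already been decoded, so the sort orders whole code points rather than individual UTF-8 bytes.

```python
from itertools import groupby

def char_frequencies(text):
    """Sort the characters, then count each run of equal characters."""
    ordered = sorted(text)  # orders by code point, not by UTF-8 byte
    return [(ch, len(list(run))) for ch, run in groupby(ordered)]

sample = "héllo, wörld"  # hypothetical sample text
freqs = char_frequencies(sample)
for ch, count in freqs:
    print(repr(ch), count)
```

After sorting, equal characters sit next to each other, so one pass over the sorted buffer is enough to count them. Sorting the raw UTF-8 bytes instead would separate the lead and continuation bytes of 'é' and 'ö', and the counts would no longer refer to characters.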

I don't mind the harping on it. Getting the details right is important, 
even if the details themselves aren't. Besides, it's an easy fix.
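
To make the underlying problem concrete, the following Python sketch contrasts reversing the raw UTF-8 bytes with reversing code points. It is not the fix that went into D, only an illustration; the helper names reverse_bytes, reverse_code_points, and is_valid_utf8 are hypothetical.

```python
def reverse_bytes(text):
    """Reverse the UTF-8 encoding byte by byte (the problematic behavior)."""
    return text.encode("utf-8")[::-1]

def reverse_code_points(text):
    """Reverse whole code points, so multi-byte sequences stay intact."""
    return text[::-1]

def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

word = "héllo"  # 'é' is two bytes in UTF-8: 0xC3 0xA9
print(is_valid_utf8(reverse_bytes(word)))                         # byte-wise
print(is_valid_utf8(reverse_code_points(word).encode("utf-8")))   # code-point-wise
```

The byte-wise reversal puts the continuation byte 0xA9 ahead of its lead byte 0xC3, which is not valid UTF-8. Pure ASCII input is unaffected either way, which is why the problem is easy to overlook.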


