.sort and .reverse break utf8 encoding
Sean Kelly
sean at f4.ca
Wed Oct 4 10:02:28 PDT 2006
Walter Bright wrote:
> Derek Parnell wrote:
>> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>>
>>> d-bugmail at puremagic.com wrote:
>>>> writefln("sorted"); validate(a.sort); // fails
>>>> writefln("reversed"); validate(a.reverse); // fails
>>> AIUI sort and reverse are defined to sort/reverse the individual
>>> elements of the array, rather than the Unicode characters that make
>>> up a string. But hmm....
>>
>> Yes, I realize that but it makes Walter's statements that char[] is all we
>> need and we do not need a 'string' a bit weaker.
>
> .sort and .reverse should reverse the unicode characters. If you want to
> reverse/sort the individual bytes, you should cast it to a ubyte[] first.

Changing the behavior of .reverse kind of makes sense, but I don't
understand the reason for changing .sort aside from consistency.
Personally, I've never had a reason to sort a char array in the first
place unless the chars were intended to represent something other than
their lexical meaning. That aside, sorting the chars in a string
without a comparison predicate orders them by each char's binary value,
which has no lexical significance beyond the 26 letters of the English
alphabet (as represented in ASCII). I'm starting to feel like people
are harping on Unicode issues just for the sake of doing so rather than
because these are actual problems. Can someone please explain what I'm
missing?
Sean
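
For readers following the thread, here is a minimal sketch of the two behaviors under discussion. It is written against the current Phobos library rather than the 2006 built-in .sort and .reverse array properties, so it is an illustration under that assumption and not the compiler's implementation. The helper names (isValidUtf8, reverseBytes, reverseCodePoints, sortCodePoints) are hypothetical. It shows that reversing the raw code units can yield invalid UTF-8, that reversing by code point does not, and that sorting by code point value is not a lexical ordering.

```d
import std.algorithm.mutation : reverse;
import std.algorithm.sorting : sort;
import std.conv : to;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Reverses the raw UTF-8 code units, ignoring character boundaries.
char[] reverseBytes(string s)
{
    auto bytes = cast(ubyte[]) s.dup;
    reverse(bytes);
    return cast(char[]) bytes;
}

/// Reverses by code point: decode to UTF-32, reverse, re-encode.
string reverseCodePoints(string s)
{
    auto cps = s.to!dstring.dup; // dchar[], one element per code point
    reverse(cps);
    return cps.to!string;
}

/// Sorts by numeric code point value, which is not a lexical order.
string sortCodePoints(string s)
{
    auto cps = s.to!dstring.dup;
    sort(cps);
    return cps.to!string;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two bytes in UTF-8

    // Byte-wise reversal leaves a continuation byte ahead of its lead byte.
    assert(!isValidUtf8(reverseBytes(s)));

    // Code-point reversal keeps each encoded character intact.
    assert(reverseCodePoints(s) == "oll\u00E9h");
    assert(isValidUtf8(reverseCodePoints(s)));

    // 'Z' (0x5A) sorts before 'a' (0x61), and U+00E9 sorts after both.
    assert(sortCodePoints("aZ\u00E9") == "Za\u00E9");
}
```

Note that reversing by code point still separates a base character from any combining marks that follow it, so this sketch does not handle full user-perceived characters. The sort result illustrates the point above: the output is valid UTF-8 but the order is purely numeric.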
More information about the Digitalmars-d-bugs mailing list