.sort and .reverse break UTF-8 encoding
Lionello Lunesu
lio at lunesu.remove.com
Wed Oct 4 22:45:58 PDT 2006
Sean Kelly wrote:
> Walter Bright wrote:
>> Derek Parnell wrote:
>>> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>>>
>>>> d-bugmail at puremagic.com wrote:
>>>>> writefln("sorted"); validate(a.sort); // fails
>>>>> writefln("reversed"); validate(a.reverse); // fails
>>>> AIUI sort and reverse are defined to sort/reverse the individual
>>>> elements of the array, rather than the Unicode characters that make
>>>> up a string. But hmm....
>>>
>>> Yes, I realize that, but it makes Walter's statements that char[] is
>>> all we need and that we do not need a 'string' a bit weaker.
>>
>> .sort and .reverse should sort/reverse the Unicode characters. If you
>> want to sort/reverse the individual bytes, you should cast it to a
>> ubyte[] first.
>
> Changing the behavior of .reverse kind of makes sense, but I don't
> understand the reason for changing .sort aside from consistency.
> Personally, I've never had a reason to sort a char array in the first
> place unless the chars were intended to represent something other than
> their lexical meaning. And that aside, sorting chars in a string
> without a comparison predicate will do so using the char's binary value,
> which has no lexical significance beyond the 26 letters of the English
> alphabet (as represented in ASCII).
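Walter's distinction (reverse by characters, or cast to ubyte[] to reverse by bytes) can be sketched as follows. This is a minimal sketch in present-day D/Phobos, not the D1-era built-in .reverse property this thread is about; the helper names reverseCodePoints and reverseBytes are hypothetical. Decoding to UTF-32 gives one dchar per code point, so reversing that array keeps every multi-byte character intact, while reversing the same bytes through a ubyte[] cast produces exactly the kind of invalid UTF-8 that validate rejects in the bug report.

```d
// Sketch in present-day D/Phobos; not the D1 built-in .reverse property.
import std.algorithm.mutation : reverse;
import std.utf : toUTF32, toUTF8;

/// Hypothetical helper: reverse a UTF-8 string by code points.
/// One dchar per code point, so no multi-byte sequence can be split.
string reverseCodePoints(const(char)[] s)
{
    dchar[] cps = toUTF32(s).dup; // toUTF32 returns an immutable dstring; copy it
    reverse(cps);
    return toUTF8(cps);           // re-encode as UTF-8
}

/// Hypothetical helper: reverse the raw bytes, as a ubyte[] cast would.
/// Multi-byte characters end up with their bytes in the wrong order.
char[] reverseBytes(const(char)[] s)
{
    char[] copy = s.dup;
    reverse(cast(ubyte[]) copy); // byte-wise reversal, UTF-8 structure ignored
    return copy;
}

unittest
{
    import std.exception : assertThrown;
    import std.utf : UTFException, validate;

    assert(reverseCodePoints("héllo") == "olléh");
    validate(reverseCodePoints("héllo")); // still valid UTF-8
    // 'é' is 0xC3 0xA9; reversed, a lone continuation byte 0xA9 comes first.
    assertThrown!UTFException(validate(reverseBytes("héllo")));
}
```

The unittest block can be run with, e.g., `dmd -unittest -main -run file.d`. Code-point reversal still separates a combining mark (such as 'e' followed by U+0301) from its base letter; Phobos's std.uni.byGrapheme would be the next step if that matters.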

What if you want to use a quick binary-search lookup to see whether a
string contains a given character? ;)
Not that I've ever needed it, but it makes sense to just fix it.
How often do you .reverse a string, for that matter?
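The binary-search use case can be sketched the same way: decode to code points, sort, then search. Again this is a minimal sketch in present-day D/Phobos with a hypothetical helper name, codePointIndex. Sorting the decoded dchar array is what keeps a multi-byte character such as 'é' findable; sorting the raw bytes would scatter its lead and continuation bytes. As Sean points out, the resulting order is by numeric code-point value only (every uppercase ASCII letter sorts before every lowercase one), which is fine for a membership test but not for display.

```d
// Sketch in present-day D/Phobos; not the D1 built-in .sort property.
import std.algorithm.sorting : sort;
import std.utf : toUTF32;

/// Hypothetical helper: the code points of `text`, sorted by numeric
/// value so they can be binary-searched.
auto codePointIndex(const(char)[] text)
{
    dchar[] cps = toUTF32(text).dup; // one dchar per code point, mutable copy
    return sort(cps);                // SortedRange: records that the data is ordered
}

unittest
{
    auto idx = codePointIndex("Zebra café");
    assert(idx.contains(dchar('é'))); // O(log n) binary search
    assert(!idx.contains(dchar('x')));
    // Code-point order: space (0x20) < 'Z' (0x5A) < 'a' (0x61)
    assert(idx.front == ' ');
}
```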
L.
More information about the Digitalmars-d-bugs mailing list