Why std.algorithm.sort can't be applied to char[]?

Steven Schveighoffer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu May 15 06:26:45 PDT 2014


On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via  
Digitalmars-d-learn <digitalmars-d-learn at puremagic.com> wrote:

> On Wed, 14 May 2014 08:27:45 +0000
> monarch_dodra via Digitalmars-d-learn
> <digitalmars-d-learn at puremagic.com> wrote:
>
>> On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via
>> Digitalmars-d-learn wrote:
>> > Sure, you can cast char[] to ubyte[] and sort that if you know
>> > that the array
>> > only holds pure ASCII. In fact, you can use
>> > std.string.representation to do it
>> > - e.g.
>> >
>> > auto ascii = str.representation;
>> >
>> > and if str were mutable, then you could sort it. But that will
>> > only work if
>> > the string only contains ASCII characters. Regardless, he
>> > wanted to know why
>> > he couldn't sort char[], and I explained why - all strings are
>> > treated as
>> > ranges of dchar, making it so that if their element type is
>> > char or wchar, so
>> > they're not random access and thus can't be sorted.
>>
>> Arguably, a smart enough implementation should know how to sort a
>> "char[]", while still preserving codepoint integrity.
>
> I don't think that that can be done at the same algorithmic complexity  
> though.
> So, I don't know if that would be acceptable or not from the standpoint  
> of
> std.algorithm.sort. But even if it's a good idea, someone would have to
> special case sort for char[], and no one has done that.

I think it can be done at O(nlgn) complexity, but you must allocate a  
block of scratch space to do the sorting. You can't do it in-place because  
swapping isn't available.

Might as well convert to dchar and back explicitly.

Merge sort may allow more promising speedups, but I still think you will  
need to allocate extra space.

>
>> As a matter of fact, the built in "sort" property does it.
>>
>> void main()
>> {
>>      char[] s = "éöeèûà".dup;
>>      s.sort;
>>      writeln(s);
>> }
>> //prints:
>> eàèéöû
>
> I'm surprised. I thought that one of Bearophile's favorite complaints  
> was that
> it didn't sort unicode properly (and hence one of the reasons that it  
> should
> be removed). Regardless, I do think that it should be removed.

I can't believe this worked. I want to say that it's a freak accident for  
that set of characters. Looking in druntime, I don't see where the special  
case is.

-Steve


More information about the Digitalmars-d-learn mailing list