Sorting char arrays

Tue Mar 13 02:13:23 PDT 2012

On 2012-03-12 17:24:56 +0000, Jonathan M Davis said:

> The problem is that sort requires a random access range, and narrow string
> (string and wstring) aren't, because they're variable length encoded. I'm not
> sure that strings _can_ be efficiently sorted, because of that, but maybe
> there's a sorting algorthm that could do it reasonably efficiently,

I'd certainly think that'd be posible. (Might, in fact, be a nice 
problem for the students in my algorithms class ;)

I guess I'm just tripped up by the fact that char[] is treated as an 
"almost-string", and hence is "infected" by the variable-length 
encoding of string (i.e., const char[]).

This all makes sense, for sure -- at least as a practical tradeoff or 
the like. (After all, UTF-8 is a super-elegant solution to a very 
difficult problem.) It's just so easy to assume that T[] is a 
random-access range. It seems so *obvious*. And it is true for any type 
T except the variable-length chars... :)

In my case, I was just using char literals (and strings) as an easy way 
of encoding test cases for a template class, and the sorting (+ uniq) 
was just a way of removing duplicates. (Could've used a hash, of 
course.) So I was really just treating it as a ubyte[]. Slightly 
Evil[tm], I guess.

> and we could
> special case sort for narrow strings to use that one, but it's a while since I
> messed with sorting algorithms, so I don't remember all of their
> characteristics off of the top of my head. Certainly, with how sort is currenly
> implemented, it can't handle any range which isn't a random access range.

No, I get that. I was just assuming that any T[] could be treated as a 
random access range if I really wanted to ;)

-- 
Magnus Lie Hetland
http://hetland.org