Go rant

Wed Jan 13 13:55:11 PST 2010

Don wrote:

> dsimcha wrote:
>> == Quote from Don (nospam at nospam.com)'s article
>>> "the bubble sort seems to have nothing to recommend it, except a 
catchy
>>> name " - Knuth.
>> 
>> Well, the bubble sort distance is a pretty good metric of how 
similarly
>> ordered
>> two lists are.  It's useful, for example, in statistics such as 
Kendall's
>> Tau. One of the easiest ways to calculate it is...with a bubble 
sort.
> 
> I don't think that's a coincidence. It seems to be defined in terms 
of
> bubble sort!
> 
> The whole bubble sort phenonemon seems to be self-perpetuating. 
There
> were loads of really crap algorithms in the 1950's. Why was this one 
not
> forgotten with the rest?

Because it's SIMPLE.  And because for the small sample sizes used in 
most classes (still?) scaling with n^2 isn't all that bad.  Someone 
said that the insertion sort is better even for small sample sizes, 
but if you don't have an array type that implements insertion, then 
that means writing an extra loop to move things down.  I might have 
suggested that the exchange sort was a good choice.  But with small 
sample sizes and limited memory (no longer a significant problem for 
cases where this is even vaguely reasonable) the bubble sort isn't 
that bad.  In fact it's difficult to measure the difference as n tends 
towards 3.  For n == 2 there's no other even vaguely reasonable 
choice.

The problem with bubble sorts is the factor of n^2.  My teacher might 
well have been wrong when he put n as high as 25 for where the bubble 
sort was optimal, but it's clearly optimal when n is less than 4.  
It's just that those cases don't matter in the real world.  (And 
actually, they'd probably be handled with a case statement, or a 
nested if.)  But if you have code that needs to handle sorting array 
sizes between 3 and 25 (varying in the calls, and tending towards the 
low end), then there's no reason to go for anything more complicated.  
Of course, a better choice yet is often to use a library function.  
Even if the low end calls have excess overhead, it won't matter much, 
and you don't need to worry about coding errors.

(Yeah, I believe that Knuth said there was no reason for the bubble 
sort existing.  And without knowing the exact context in which he said 
it, I'm not going to call him wrong.  The times to use it are not only 
extremely limited, there's also a tendency to use it at the wrong 
times, just because you have the code handy, and it's easy.)

P.S.:  If you have an Array class, and the array class has an insert 
method, then the insertion sort is clearly better for building a 
sorted list.  It's less clear if you're sorting a list that you 
already have, and you're trying to avoid allocating new memory.  
Especially if it's not stored a container class with an insert method.

N.B.:  If the bubble sort weren't taught, it would be continually re-
invented.  And as it's so rarely the proper choice, it needs to be 
warned against.  Preferably by having someone write something that, 
sorts things interactively, and then having them start adding things 
to be sorted.  But processors may be fast enough to defeat this 
experiential learning unless you add things in a way that doubles the 
collection size each time you make an addition.