Challenge: fair partition function

Xinok via Digitalmars-d digitalmars-d at puremagic.com
Mon Feb 8 18:27:29 PST 2016


On Monday, 8 February 2016 at 23:25:00 UTC, Andrei Alexandrescu 
wrote:
> Consider defining a function that partitions a range around a 
> given index like this:
>
> size_t pivotPartition(alias less = (a, b) => a < b, Range)
> (Range r, size_t pivot);
>
> Returns x, one of the the indexes that r[pivot] would have if r 
> were sorted. Also, r[0 .. x] contains stuff no greater than 
> r[x] and r[x + 1 .. $] contains stuff no less than r[x].
>
> The challenge is to implement such a function with fairness: if 
> several elements are equal to r[pivot], return the index 
> closest to r.length / 2.
>
> The function should be efficient, minimize calls to less and 
> swap, etc. A variant that is efficient but does not implement 
> fairness is below.
>
> ...

Ultimately, I think you're going to have to perform two 
comparisons on *some* elements to check for equality. I thought 
of a few tricks which may or may not help.



(1) Keep your code as is and add a final pass to count the number 
of elements. However, you only need to scan the larger partition 
since it contains the index (r.length / 2) and you're trying to 
move closer to that point. The elements in the smaller partition 
can simply be ignored.



(2) You may have code which looks something like this:

     if(less(e, pivot)){ /+ less than +/ }
     else if(less(pivot, e)){ /+ greater than +/ }
     else{ /+ equal to +/ }}

This is a big win if most of the elements are less than the 
pivot, but also a big loss if most of the elements are greater 
than the pivot. A simple trick is to repeat the code twice but 
swap the order of comparisons so you get:

     if(less(pivot, e)){ /+ greater than +/ }
     else if(less(e, pivot)){ /+ less than +/ }
     else{ /+ equal to +/ }}

This will place us closer to an average 1.5 comparisons per 
element which is better than (1).



(3) Similar to (2) except you only compare each element once so 
you're not checking for equality. You're simply alternating 
between checking if (e < pivot) or if (pivot < e). So the code 
may look something like this:

     if(less(pivot, e)){ less }
     else{ greater or equal }

     if(less(e, pivot)){ greater }
     else{ less or equal }

 From this, you can group the elements into four partitions:

[LE L G GE]

So you have elements which are definitely less than or greater 
than the pivot but also some elements which *may* be equal to the 
pivot. Combine this with the first trick (1) and you only have to 
scan LE or GE for equality but not both. This is putting us 
closer to an average 1.25 comparisons per element but a 4-way 
partition would also require much more work.


More information about the Digitalmars-d mailing list