sortUniq

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Thu Jan 22 15:33:14 PST 2015


On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via Digitalmars-d wrote:
> On Thu, Jan 22, 2015 at 02:29:58PM -0800, Andrei Alexandrescu via Digitalmars-d wrote:
> [...]
> > Thanks. I'm thinking of something based on general comparisons.
> > Something like a modified quicksort with fat pivot:
> > 
> > A. Partition into three subranges: R1<pivot, R2=pivot, R3>pivot
> > 
> > B. Recurse on R1 obtaining R11, a subrange of R1 containing only
> > unique elements.
> > 
> > C. Sort-move-unique R2 into the portion after R11, obtaining the
> > result.
> > 
> > A and B seem fine, I'm unclear about C. It would be an algorithm that
> > sorts and simultaneously moves data in a range that overlaps the
> > input.
> [...]
> 
> Wait, if R1 and R3, after the recursion, are sorted and contain only
> unique elements, doesn't that mean that we can just collapse R2 into a
> single element along with the endpoints of R1 and R3? So if R1[$-1] <
> pivot, then we're done, otherwise coalesce it with the pivot, and
> likewise with R3[0].
[...]

Working proof of concept:

	import std.algorithm;
	import std.array;
	import std.stdio;
	
	// Returns: final index of pivot
	size_t partition(R)(R r)
	{
	    auto pivot = r[0];
	    swap(r[0], r[$-1]); // move pivot to end
	
	    auto leftIdx = 0;
	    foreach (i; 0 .. r.length)
	    {
	        if (r[i] < pivot)
	            swap(r[i], r[leftIdx++]);
	    }
	    swap(r[leftIdx], r[$-1]);
	
	    return leftIdx;
	}
	
	R uniqueSort(R)(R r)
	{
	    if (r.empty) return r;
	
	    // Partition
	    auto pivotIdx = partition(r);
	    auto pivot = r[pivotIdx];
	
	    // Recurse
	    auto left = uniqueSort(r[0 .. pivotIdx]);
	    auto right = uniqueSort(r[pivotIdx + 1 .. $]);
	
	    // Coalesce left ~ pivot ~ right.
	    if (!left.empty && left[$-1] == pivot)
	        left = left[0 .. $-1];
	
	    if (!right.empty && right[0] == pivot)
	        right = right[1 .. $];
	
	    r[left.length] = pivot;
	    auto vacant = copy(right, r[left.length + 1 .. $]);
	
	    return r[0 .. $ - vacant.length];
	}
	
	unittest
	{
	    auto data = [4, 0, 9, 7, 2, 5, 1, 1, 3, 4, 2, 0, 9, 5, 3, 6, 1, 7, 4, 8];
	    auto sorted = data.uniqueSort();
	
	    assert(sorted == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
	}


I'm not sure about the performance of this algorithm, though, because
the call to copy() might become prohibitive once you add up all the
recursive calls to uniqueSort().

Also, I only got one test case working; no idea if there are subtle bugs
that are lurking in there somewhere. Please test with more test cases.
:-)


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured megaliths that address all questions by piling on ridiculous internal links in forms which are hideously over-complex." -- Simon St. Laurent on xml-dev


More information about the Digitalmars-d mailing list