sortUniq

Xinok via Digitalmars-d digitalmars-d at puremagic.com
Thu Jan 22 16:59:51 PST 2015


On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu 
wrote:
> There's this classic patter on Unix: |sort|uniq, i.e. sort some 
> data and only display the unique elements.
>
> What would be a better integrated version - one that does 
> sorting and uniq in one shot? I suspect the combination could 
> be quite a bit better than doing the two in sequence.
>
> A few google searches didn't yield much. Ideas?
>
>
> Thanks,
>
> Andrei

My thought is that uniq on a sorted list is only an O(n) 
operation, so it's not an expensive operation by any means. If 
there's to be any benefit to a sortUniq, it has to be capable of 
reducing work incurred during the sorting process; else you're 
just going to end up with something less efficient.

One solution may be RedBlackTree, which has an option to disallow 
duplicate elements. This container has three useful properties:

(1) The tree grows one element at a time. This is in opposition 
to other algorithms like quicksort or heapsort in which you must 
operate on the entire set of elements at once.

(2) Removing duplicates only requires a single comparison per 
element, thus retaining the worst-case of |sort|uniq.

(3) Duplicate elements are removed immediately. Inserting an 
element into a balanced tree with N elements is an O(lg n) 
operation, so the smaller n is, the less work that is required.

The shortcoming of RedBlackTree is that it's not very fast. 
However, I don't know any other O(n lg n) sorting algorithm which 
has these properties.


More information about the Digitalmars-d mailing list