sortUniq
Xinok via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jan 22 16:59:51 PST 2015
On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu
wrote:
> There's this classic patter on Unix: |sort|uniq, i.e. sort some
> data and only display the unique elements.
>
> What would be a better integrated version - one that does
> sorting and uniq in one shot? I suspect the combination could
> be quite a bit better than doing the two in sequence.
>
> A few google searches didn't yield much. Ideas?
>
>
> Thanks,
>
> Andrei
My thought is that uniq on a sorted list is only an O(n)
operation, so it's not an expensive operation by any means. If
there's to be any benefit to a sortUniq, it has to be capable of
reducing work incurred during the sorting process; else you're
just going to end up with something less efficient.
One solution may be RedBlackTree, which has an option to disallow
duplicate elements. This container has three useful properties:
(1) The tree grows one element at a time. This is in opposition
to other algorithms like quicksort or heapsort in which you must
operate on the entire set of elements at once.
(2) Removing duplicates only requires a single comparison per
element, thus retaining the worst-case of |sort|uniq.
(3) Duplicate elements are removed immediately. Inserting an
element into a balanced tree with N elements is an O(lg n)
operation, so the smaller n is, the less work that is required.
The shortcoming of RedBlackTree is that it's not very fast.
However, I don't know any other O(n lg n) sorting algorithm which
has these properties.
More information about the Digitalmars-d
mailing list