sortUniq

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Thu Jan 22 17:20:14 PST 2015


On Fri, Jan 23, 2015 at 12:59:51AM +0000, Xinok via Digitalmars-d wrote:
> On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu wrote:
> >There's this classic patter on Unix: |sort|uniq, i.e. sort some data
> >and only display the unique elements.
> >
> >What would be a better integrated version - one that does sorting and
> >uniq in one shot? I suspect the combination could be quite a bit
> >better than doing the two in sequence.
[...]
> My thought is that uniq on a sorted list is only an O(n) operation, so
> it's not an expensive operation by any means. If there's to be any
> benefit to a sortUniq, it has to be capable of reducing work incurred
> during the sorting process; else you're just going to end up with
> something less efficient.
> 
> One solution may be RedBlackTree, which has an option to disallow
> duplicate elements. This container has three useful properties:

The problem with RedBlackTree is that it's cache-unfriendly due to the
memory allocation and per-node pointers. One reason quicksort performs
so well is because it works in-place, and the partition operation
accesses elements bilinearly (i.e., two linear traversals over the same
stretch of memory interleaved with each other), so it's very likely that
most of the accesses will hit the cache (possibly even all, if hardware
prefetching is present).  Whereas with RedBlackTree, it's jumping all
over memory, so there will be a higher rate of cache misses and the
resulting slow RAM fetches. Plus, memory allocation is slooow, and is
best avoided where possible.


T

-- 
What's a "hot crossed bun"? An angry rabbit.


More information about the Digitalmars-d mailing list