[OT] n-way union

Sean Kelly sean at invisibleduck.org
Sat May 23 09:57:17 PDT 2009


Andrei Alexandrescu wrote:
> This is somewhat OT but I think it's an interesting problem. Consider 
> the following data:
> 
> double[][] a =
> [
>     [ 1, 4, 7, 8 ],
>     [ 1, 7 ],
>     [ 1, 7, 8],
>     [ 4 ],
>     [ 7 ],
> ];
> 
> We want to compute an n-way union, i.e., efficiently span all elements 
> in all arrays in a, in sorted order. You can assume that each individual 
> array in a is sorted. The output of n-way union should be:
> 
> auto witness = [
>     1, 1, 1, 4, 4, 7, 7, 7, 7, 8, 8
> ];
> assert(equal(nWayUnion(a), witness[]));
> 
> The STL and std.algorithm have set_union that does that for two sets: 
> for example, set_union(a[0], a[1]) outputs [ 1, 1, 4, 7, 7, 8 ]. But 
> n-way unions poses additional challenges. What would be a fast 
> algorithm? (Imagine a could be a very large range of ranges).

It seems like there are two basic options: either merge the arrays (as
per merge sort) and deal with multiple passes across the same data
elements or insert the elements from each array into a single
destination array and deal with a bunch of memmove operations.

The merge option is kind of interesting because it could benefit from a
parallel range of sorts.  Each front() op could actually return a range
which contained the front element of each non-empty range.  Pass this to 
a min() op accepting a range and drop the min element into the 
destination.  The tricky part would be working things in such a way that 
the originating range could have popFront() called once the insertion 
had occurred.

> Needless to say, nWayUnion is a range :o).
> 
> Finally, why would anyone care for something like this?

Other than mergeSort?  I'd think that being able to perform union of N 
sets in unison would be a nice way to eliminate arbitrary restrictions.



More information about the Digitalmars-d mailing list