# [OT] n-way union

Don nospam at nospam.com
Tue May 26 03:15:09 PDT 2009

```BCS wrote:
> Hello Andrei,
>
>> This is somewhat OT but I think it's an interesting problem. Consider
>> the following data:
>>
>> double[][] a =
>> [
>> [ 1, 4, 7, 8 ],
>> [ 1, 7 ],
>> [ 1, 7, 8],
>> [ 4 ],
>> [ 7 ],
>> ];
>> We want to compute an n-way union, i.e., efficiently span all elements
>> in all arrays in a, in sorted order. You can assume that each
>> individual array in a is sorted. The output of n-way union should be:
>>
>> auto witness = [
>> 1, 1, 1, 4, 4, 7, 7, 7, 7, 8, 8
>> ];
>> assert(equal(nWayUnion(a), witness[]));
>>
>> The STL and std.algorithm have set_union that does that for two sets:
>> for example, set_union(a[0], a[1]) outputs [ 1, 1, 4, 7, 7, 8 ]. But
>> n-way unions poses additional challenges. What would be a fast
>> algorithm? (Imagine a could be a very large range of ranges).
>>
>> Needless to say, nWayUnion is a range :o).
>>
>> Finally, why would anyone care for something like this?
>>
>> Andrei
>>
>
> in sudocode
>
> 1. build a min heap out of the elements based on the first element of
> each array
> 2. remove the first value from the root of the heap,
> 3. if empty array, drop it
> 4. if not empty re-heap
> 5. goto 2 if anything left
>
>
In practice, I bet it'll go faster with a couple more steps:
2. remove the first value (from array X) from the root of the heap,
2a. Determine new root of the heap (don't need to do a full re-heap yet).
2b. move all values from X until empty or >= new root.
4. reheap

There are interesting cases like:
[5,6,7,8,9,9,......9]
[30, 31,32,33,34...38]
[1,1,1,1,1,1,1,1,1,1,...1]
[10,11,12,13,14,15,16,16,......., 16]

where in theory, the total number of comparisons required is a function
only of the number of arrays, and is independent of the actual number of
elements.
If it's common to have runs where consecutive values come from a single
array, and the arrays are random access, you could do MUCH better than O(n).

```