[OT] n-way union
nospam at nospam.com
Tue May 26 03:15:09 PDT 2009
> Hello Andrei,
>> This is somewhat OT but I think it's an interesting problem. Consider
>> the following data:
>> double a =
>> [ 1, 4, 7, 8 ],
>> [ 1, 7 ],
>> [ 1, 7, 8],
>> [ 4 ],
>> [ 7 ],
>> We want to compute an n-way union, i.e., efficiently span all elements
>> in all arrays in a, in sorted order. You can assume that each
>> individual array in a is sorted. The output of n-way union should be:
>> auto witness = [
>> 1, 1, 1, 4, 4, 7, 7, 7, 7, 8, 8
>> assert(equal(nWayUnion(a), witness));
>> The STL and std.algorithm have set_union that does that for two sets:
>> for example, set_union(a, a) outputs [ 1, 1, 4, 7, 7, 8 ]. But
>> n-way unions poses additional challenges. What would be a fast
>> algorithm? (Imagine a could be a very large range of ranges).
>> Needless to say, nWayUnion is a range :o).
>> Finally, why would anyone care for something like this?
> in sudocode
> 1. build a min heap out of the elements based on the first element of
> each array
> 2. remove the first value from the root of the heap,
> 3. if empty array, drop it
> 4. if not empty re-heap
> 5. goto 2 if anything left
In practice, I bet it'll go faster with a couple more steps:
2. remove the first value (from array X) from the root of the heap,
2a. Determine new root of the heap (don't need to do a full re-heap yet).
2b. move all values from X until empty or >= new root.
There are interesting cases like:
where in theory, the total number of comparisons required is a function
only of the number of arrays, and is independent of the actual number of
If it's common to have runs where consecutive values come from a single
array, and the arrays are random access, you could do MUCH better than O(n).
More information about the Digitalmars-d