[OT] n-way union
Don
nospam at nospam.com
Tue May 26 06:09:59 PDT 2009
Andrei Alexandrescu wrote:
> Don wrote:
>> BCS wrote:
>>> in sudocode
>>>
>>> 1. build a min heap out of the elements based on the first element of
>>> each array
>>> 2. remove the first value from the root of the heap,
>>> 3. if empty array, drop it
>>> 4. if not empty re-heap
>>> 5. goto 2 if anything left
>>>
>>>
>> In practice, I bet it'll go faster with a couple more steps:
>> 2. remove the first value (from array X) from the root of the heap,
>> 2a. Determine new root of the heap (don't need to do a full re-heap yet).
>> 2b. move all values from X until empty or >= new root.
>> 4. reheap
>
> Correct. Unfortunately my heap primitives didn't quite allow that so
> currently I do it the "hard" way: pop from top, remove top from heap,
> reinsert (if not empty). But this is definitely an optimization worth
> doing. I think I'll implement it.
>
>> There are interesting cases like:
>> [5,6,7,8,9,9,......9]
>> [30, 31,32,33,34...38]
>> [1,1,1,1,1,1,1,1,1,1,...1]
>> [10,11,12,13,14,15,16,16,......., 16]
>>
>> where in theory, the total number of comparisons required is a
>> function only of the number of arrays, and is independent of the
>> actual number of elements.
>> If it's common to have runs where consecutive values come from a
>> single array, and the arrays are random access, you could do MUCH
>> better than O(n).
>
> I think you can't do better. Complexity is O(n) at a minimum because you
> need to go through each element. The heap-based algorithm has O(n log
> m). What you could improve on is the log m factor (m = the number of sets).
Depending on what "going through each element" means. Certainly, if you
need to move them, you can't beat O(n) moves, but you can definitely get
O(m*m) < < O(n) comparisons in the best case I described. And that would
also be the number of splicing operations.
You could approximate this without hurting the worst-case too much, by
(for example) once you get a run of more than K entries from the same
array, step forward 2 entries; if that's still less than the root, copy
both to the output, then step forward 4 entries, etc.
That should give a best-case O(m log n) comparisons/splices.
Which might not be of any practical use, of course.
More information about the Digitalmars-d
mailing list