[OT] shuffling lines in a stream
Sergey Gromov
snake.scaly at gmail.com
Fri Oct 10 14:19:51 PDT 2008
Fri, 10 Oct 2008 16:10:29 -0500,
Andrei Alexandrescu wrote:
> Sergey Gromov wrote:
> > Fri, 10 Oct 2008 15:54:18 -0500,
> > Andrei Alexandrescu wrote:
> >> Sean Kelly wrote:
> >>> It would be slower than the seeking option, but something like a
> >>> randomized mergesort would work as well. If the program can buffer k
> >>> lines in a file containing n lines, then read the first k lines into
> >>> memory, shuffle them, and write them out to a temporary file. Repeat
> >>> until the input is exhausted. Now randomly pick two of the temporary
> >>> files and randomly merge them. Repeat until two temporary files remain,
> >>> then output the result of the final random merge to the screen.
> >>>
> >>> For small files (ie. where n<k) the file would be read, shuffled in
> >>> memory, and printed to the screen, assuming the proper checks were in
> >>> place.
> >> I think I found a problem with your solution. Consider you break the
> >> file in three chunks: a, b, c. Then you pick at random b and c and
> >> randomly merge them. The problem is, you make early the decision that
> >> nothing in a will appear in the first two thirds of the result. So the
> >> quality of randomness suffers. How would you address that?
> >
> > After merging b and c you end up with a and bc. Then you randomly merge
> > these two files and lines from a have all the chances to appear anywhere
> > within result.
> >
> > When randomly merging, the probability of picking a line from a file
> > should be proportional to a number of lines left in that file.
>
> How do you "randomly merge"? Describe the process.
a[] and b[] are files to merge. ab[] is the result. a.length and
b.length are number of lines left in either file.
while (a.length || b.length)
{
if (uniform(gen, 0, a.length + b.length) < a.length)
{
ab.put(a.head);
a.next();
}
else
{
ab.put(b.head);
b.next();
}
}
More information about the Digitalmars-d
mailing list