[OT] shuffling lines in a stream
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Oct 10 14:10:29 PDT 2008
Sergey Gromov wrote:
> Fri, 10 Oct 2008 15:54:18 -0500,
> Andrei Alexandrescu wrote:
>> Sean Kelly wrote:
>>> It would be slower than the seeking option, but something like a
>>> randomized mergesort would work as well. If the program can buffer k
>>> lines in a file containing n lines, then read the first k lines into
>>> memory, shuffle them, and write them out to a temporary file. Repeat
>>> until the input is exhausted. Now randomly pick two of the temporary
>>> files and randomly merge them. Repeat until two temporary files remain,
>>> then output the result of the final random merge to the screen.
>>>
>>> For small files (ie. where n<k) the file would be read, shuffled in
>>> memory, and printed to the screen, assuming the proper checks were in
>>> place.
>> I think I found a problem with your solution. Consider you break the
>> file in three chunks: a, b, c. Then you pick at random b and c and
>> randomly merge them. The problem is, you make early the decision that
>> nothing in a will appear in the first two thirds of the result. So the
>> quality of randomness suffers. How would you address that?
>
> After merging b and c you end up with a and bc. Then you randomly merge
> these two files and lines from a have all the chances to appear anywhere
> within result.
>
> When randomly merging, the probability of picking a line from a file
> should be proportional to a number of lines left in that file.
How do you "randomly merge"? Describe the process.
Andrei
More information about the Digitalmars-d
mailing list