[OT] shuffling lines in a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Oct 10 14:27:03 PDT 2008


Sergey Gromov wrote:
> Fri, 10 Oct 2008 16:10:29 -0500,
> Andrei Alexandrescu wrote:
>> Sergey Gromov wrote:
>>> Fri, 10 Oct 2008 15:54:18 -0500,
>>> Andrei Alexandrescu wrote:
>>>> Sean Kelly wrote:
>>>>> It would be slower than the seeking option, but something like a 
>>>>> randomized mergesort would work as well.  If the program can buffer k 
>>>>> lines in a file containing n lines, then read the first k lines into 
>>>>> memory, shuffle them, and write them out to a temporary file.  Repeat 
>>>>> until the input is exhausted.  Now randomly pick two of the temporary 
>>>>> files and randomly merge them.  Repeat until two temporary files remain, 
>>>>> then output the result of the final random merge to the screen.
>>>>>
>>>>> For small files (ie. where n<k) the file would be read, shuffled in 
>>>>> memory, and printed to the screen, assuming the proper checks were in 
>>>>> place.
>>>> I think I found a problem with your solution. Consider you break the 
>>>> file in three chunks: a, b, c. Then you pick at random b and c and 
>>>> randomly merge them. The problem is, you make early the decision that 
>>>> nothing in a will appear in the first two thirds of the result. So the 
>>>> quality of randomness suffers. How would you address that?
>>> After merging b and c you end up with a and bc.  Then you randomly merge 
>>> these two files and lines from a have all the chances to appear anywhere 
>>> within result.
>>>
>>> When randomly merging, the probability of picking a line from a file 
>>> should be proportional to a number of lines left in that file.
>> How do you "randomly merge"? Describe the process.
> 
> a[] and b[] are files to merge.  ab[] is the result.  a.length and 
> b.length are number of lines left in either file.
> 
> while (a.length || b.length)
> {
>   if (uniform(gen, 0, a.length + b.length) < a.length)
>   {
>     ab.put(a.head);
>     a.next();
>   }
>   else
>   {
>     ab.put(b.head);
>     b.next();
>   }
> }

This should work. Use of ranges noted :o).

Andrei



More information about the Digitalmars-d mailing list