[OT] shuffling lines in a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Oct 10 15:32:36 PDT 2008


Sean Kelly wrote:
> Andrei Alexandrescu wrote:
>> Sean Kelly wrote:
>>> It would be slower than the seeking option, but something like a 
>>> randomized mergesort would work as well.  If the program can buffer k 
>>> lines in a file containing n lines, then read the first k lines into 
>>> memory, shuffle them, and write them out to a temporary file.  Repeat 
>>> until the input is exhausted.  Now randomly pick two of the temporary 
>>> files and randomly merge them.  Repeat until two temporary files 
>>> remain, then output the result of the final random merge to the screen.
>>>
>>> For small files (ie. where n<k) the file would be read, shuffled in 
>>> memory, and printed to the screen, assuming the proper checks were in 
>>> place.
>>
>> I think I found a problem with your solution. Consider you break the 
>> file in three chunks: a, b, c. Then you pick at random b and c and 
>> randomly merge them. The problem is, you make early the decision that 
>> nothing in a will appear in the first two thirds of the result. So the 
>> quality of randomness suffers. How would you address that?
> 
> When a is merged with (b&c) then its lines will be interleaved randomly 
> in the result.

I think that works with Sergey's appropriate crooking of the dice. For 
the record, my solution involved generating k shuffled sub-files, and 
then roll a fair k-sided dice and write one line in the chosen file to 
the output, until all files are exhausted. Sergey showed that my 
solution is wrong unless it so happens the last fragment is equal to all 
others. Thanks!

Well are we ready for the next puzzle, or bored?

Andrei



More information about the Digitalmars-d mailing list