shuffling lines in a stream

Christopher Wright dhasenan at gmail.com
Sat Oct 11 07:12:57 PDT 2008


Andrei Alexandrescu wrote:
> Sean Kelly wrote:
>> It would be slower than the seeking option, but something like a 
>> randomized mergesort would work as well.  If the program can buffer k 
>> lines in a file containing n lines, then read the first k lines into 
>> memory, shuffle them, and write them out to a temporary file.  Repeat 
>> until the input is exhausted.  Now randomly pick two of the temporary 
>> files and randomly merge them.  Repeat until two temporary files 
>> remain, then output the result of the final random merge to the screen.
>>
>> For small files (ie. where n<k) the file would be read, shuffled in 
>> memory, and printed to the screen, assuming the proper checks were in 
>> place.
> 
> Sweet! This is even better than my solution. (And it's much faster than 
> seeking.)
> 
> Andrei

You can reduce the number of passes by using more temporary files per 
iteration. For instance, with O(log n) files per iteration, you'll get a 
time complexity of O(n log* n) rather than O(n log n). However, using 
more files makes the disk cache unhappy.



More information about the Digitalmars-d mailing list