shuffling lines in a stream
Christopher Wright
dhasenan at gmail.com
Sat Oct 11 07:12:57 PDT 2008
Andrei Alexandrescu wrote:
> Sean Kelly wrote:
>> It would be slower than the seeking option, but something like a
>> randomized mergesort would work as well. If the program can buffer k
>> lines in a file containing n lines, then read the first k lines into
>> memory, shuffle them, and write them out to a temporary file. Repeat
>> until the input is exhausted. Now randomly pick two of the temporary
>> files and randomly merge them. Repeat until two temporary files
>> remain, then output the result of the final random merge to the screen.
>>
>> For small files (ie. where n<k) the file would be read, shuffled in
>> memory, and printed to the screen, assuming the proper checks were in
>> place.
>
> Sweet! This is even better than my solution. (And it's much faster than
> seeking.)
>
> Andrei
You can reduce the number of passes by using more temporary files per
iteration. For instance, with O(log n) files per iteration, you'll get a
time complexity of O(n log* n) rather than O(n log n). However, using
more files makes the disk cache unhappy.
More information about the Digitalmars-d
mailing list