shuffling lines in a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Oct 9 21:42:47 PDT 2008


The high traffic enjoyed by "random k-sample of a file" confirms that 
plenty of people around here have idle cycles ready to be spent on fun 
tasks.

In that vein, here's another problem. Say you want to randomly shuffle 
lines in a stream. You don't know the size beforehand, and one pass is 
preferable. The stream can be larger than the working memory, but not an 
unbounded amount of times larger. Also lines in the stream can be 
unbounded, but it is guaranteed that at least a nonzero fraction of the 
file fits in working memory.

You can use temporary files, but of course minimizing their total size 
is recommended. Speed is of course important. Draft your solution and 
explain why you think it's correct and advantageous.

In case you hate random stuff, just wait; the next problem won't have 
any random element in it :o).


Andrei



More information about the Digitalmars-d mailing list