shuffling lines in a stream
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Oct 9 21:42:47 PDT 2008
The high traffic enjoyed by "random k-sample of a file" confirms that
plenty of people around here have idle cycles ready to be spent on fun
tasks.
In that vein, here's another problem. Say you want to randomly shuffle
lines in a stream. You don't know the size beforehand, and one pass is
preferable. The stream can be larger than the working memory, but not an
unbounded amount of times larger. Also lines in the stream can be
unbounded, but it is guaranteed that at least a nonzero fraction of the
file fits in working memory.
You can use temporary files, but of course minimizing their total size
is recommended. Speed is of course important. Draft your solution and
explain why you think it's correct and advantageous.
In case you hate random stuff, just wait; the next problem won't have
any random element in it :o).
Andrei
More information about the Digitalmars-d
mailing list