shuffling lines in a stream
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Oct 9 23:11:18 PDT 2008
Andrei Alexandrescu wrote:
> The high traffic enjoyed by "random k-sample of a file" confirms that
> plenty of people around here have idle cycles ready to be spent on fun
> tasks.
>
> In that vein, here's another problem. Say you want to randomly shuffle
> lines in a stream. You don't know the size beforehand, and one pass is
> preferable. The stream can be larger than the working memory, but not an
> unbounded amount of times larger. Also lines in the stream can be
> unbounded, but it is guaranteed that at least a nonzero fraction of the
> file fits in working memory.
>
> You can use temporary files, but of course minimizing their total size
> is recommended. Speed is of course important. Draft your solution and
> explain why you think it's correct and advantageous.
>
> In case you hate random stuff, just wait; the next problem won't have
> any random element in it :o).
>
>
> Andrei
Forgot to mention a desirable property. It would be nice if the
algorithm would naturally subscale, e.g. if ran on a short file, it
would produce a shuffled file with the speed competitive with an
algorithm conceived for shorter files.
Andrei
More information about the Digitalmars-d
mailing list