[OT] shuffling lines in a stream
Sergey Gromov
snake.scaly at gmail.com
Fri Oct 10 07:15:32 PDT 2008
Fri, 10 Oct 2008 07:55:38 -0500,
Andrei Alexandrescu wrote:
> Sergey Gromov wrote:
> > Thu, 09 Oct 2008 23:42:47 -0500,
> > Andrei Alexandrescu wrote:
> >> The high traffic enjoyed by "random k-sample of a file" confirms that
> >> plenty of people around here have idle cycles ready to be spent on fun
> >> tasks.
> >>
> >> In that vein, here's another problem. Say you want to randomly shuffle
> >> lines in a stream. You don't know the size beforehand, and one pass is
> >> preferable. The stream can be larger than the working memory, but not an
> >> unbounded amount of times larger. Also lines in the stream can be
> >> unbounded, but it is guaranteed that at least a nonzero fraction of the
> >> file fits in working memory.
> >>
> >> You can use temporary files, but of course minimizing their total size
> >> is recommended. Speed is of course important. Draft your solution and
> >> explain why you think it's correct and advantageous.
> >
> > Well, my temporary file contains the whole stream. Because there is a
> > probability that the last line of the stream goes first.
>
> Yes, that probability exists, but that doesn't imply you must load the
> entire file in memory.
This implies though that I cannot start output before I read all the
lines from the stream. Therefore I must store those lines somewhere.
More information about the Digitalmars-d
mailing list