[OT] shuffling lines in a stream
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Oct 10 05:55:38 PDT 2008
Sergey Gromov wrote:
> Thu, 09 Oct 2008 23:42:47 -0500,
> Andrei Alexandrescu wrote:
>> The high traffic enjoyed by "random k-sample of a file" confirms that
>> plenty of people around here have idle cycles ready to be spent on fun
>> tasks.
>>
>> In that vein, here's another problem. Say you want to randomly shuffle
>> lines in a stream. You don't know the size beforehand, and one pass is
>> preferable. The stream can be larger than the working memory, but not an
>> unbounded amount of times larger. Also lines in the stream can be
>> unbounded, but it is guaranteed that at least a nonzero fraction of the
>> file fits in working memory.
>>
>> You can use temporary files, but of course minimizing their total size
>> is recommended. Speed is of course important. Draft your solution and
>> explain why you think it's correct and advantageous.
>
> Well, my temporary file contains the whole stream. Because there is a
> probability that the last line of the stream goes first.
Yes, that probability exists, but that doesn't imply you must load the
entire file in memory.
Andrei
More information about the Digitalmars-d
mailing list