[OT] shuffling lines in a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Oct 10 07:20:44 PDT 2008


Sergey Gromov wrote:
> Fri, 10 Oct 2008 07:55:38 -0500,
> Andrei Alexandrescu wrote:
>> Sergey Gromov wrote:
>>> Thu, 09 Oct 2008 23:42:47 -0500,
>>> Andrei Alexandrescu wrote:
>>>> The high traffic enjoyed by "random k-sample of a file" confirms that 
>>>> plenty of people around here have idle cycles ready to be spent on fun 
>>>> tasks.
>>>>
>>>> In that vein, here's another problem. Say you want to randomly shuffle 
>>>> lines in a stream. You don't know the size beforehand, and one pass is 
>>>> preferable. The stream can be larger than the working memory, but not an 
>>>> unbounded amount of times larger. Also lines in the stream can be 
>>>> unbounded, but it is guaranteed that at least a nonzero fraction of the 
>>>> file fits in working memory.
>>>>
>>>> You can use temporary files, but of course minimizing their total size 
>>>> is recommended. Speed is of course important. Draft your solution and 
>>>> explain why you think it's correct and advantageous.
>>> Well, my temporary file contains the whole stream.  Because there is a 
>>> probability that the last line of the stream goes first.
>> Yes, that probability exists, but that doesn't imply you must load the 
>> entire file in memory.
> 
> This implies though that I cannot start output before I read all the 
> lines from the stream.  Therefore I must store those lines somewhere.

Creating temporary files is fine as long as you never need to store the 
entire input in memory.

Andrei



More information about the Digitalmars-d mailing list