[OT] shuffling lines in a stream

KennyTM~ kennytm at gmail.com
Fri Oct 10 07:33:16 PDT 2008


Andrei Alexandrescu wrote:
> Sergey Gromov wrote:
>> Fri, 10 Oct 2008 07:55:38 -0500,
>> Andrei Alexandrescu wrote:
>>> Sergey Gromov wrote:
>>>> Thu, 09 Oct 2008 23:42:47 -0500,
>>>> Andrei Alexandrescu wrote:
>>>>> The high traffic enjoyed by "random k-sample of a file" confirms 
>>>>> that plenty of people around here have idle cycles ready to be 
>>>>> spent on fun tasks.
>>>>>
>>>>> In that vein, here's another problem. Say you want to randomly 
>>>>> shuffle lines in a stream. You don't know the size beforehand, and 
>>>>> one pass is preferable. The stream can be larger than the working 
>>>>> memory, but not an unbounded amount of times larger. Also lines in 
>>>>> the stream can be unbounded, but it is guaranteed that at least a 
>>>>> nonzero fraction of the file fits in working memory.
>>>>>
>>>>> You can use temporary files, but of course minimizing their total 
>>>>> size is recommended. Speed is of course important. Draft your 
>>>>> solution and explain why you think it's correct and advantageous.
>>>> Well, my temporary file contains the whole stream.  Because there is 
>>>> a probability that the last line of the stream goes first.
>>> Yes, that probability exists, but that doesn't imply you must load 
>>> the entire file in memory.
>>
>> This implies though that I cannot start output before I read all the 
>> lines from the stream.  Therefore I must store those lines somewhere.
> 
> Creating temporary files is fine as long as you never need to store the 
> entire input in memory.
> 
> Andrei

So HDD (or any file system) is not counted as memory? Then just store 
all temporary variables to HDD.



More information about the Digitalmars-d mailing list