[OT] shuffling lines in a stream

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Oct 10 07:37:05 PDT 2008


KennyTM~ wrote:
> Andrei Alexandrescu wrote:
>> Sergey Gromov wrote:
>>> Fri, 10 Oct 2008 07:55:38 -0500,
>>> Andrei Alexandrescu wrote:
>>>> Sergey Gromov wrote:
>>>>> Thu, 09 Oct 2008 23:42:47 -0500,
>>>>> Andrei Alexandrescu wrote:
>>>>>> The high traffic enjoyed by "random k-sample of a file" confirms 
>>>>>> that plenty of people around here have idle cycles ready to be 
>>>>>> spent on fun tasks.
>>>>>>
>>>>>> In that vein, here's another problem. Say you want to randomly 
>>>>>> shuffle lines in a stream. You don't know the size beforehand, and 
>>>>>> one pass is preferable. The stream can be larger than the working 
>>>>>> memory, but not an unbounded amount of times larger. Also lines in 
>>>>>> the stream can be unbounded, but it is guaranteed that at least a 
>>>>>> nonzero fraction of the file fits in working memory.
>>>>>>
>>>>>> You can use temporary files, but of course minimizing their total 
>>>>>> size is recommended. Speed is of course important. Draft your 
>>>>>> solution and explain why you think it's correct and advantageous.
>>>>> Well, my temporary file contains the whole stream.  Because there 
>>>>> is a probability that the last line of the stream goes first.
>>>> Yes, that probability exists, but that doesn't imply you must load 
>>>> the entire file in memory.
>>>
>>> This implies though that I cannot start output before I read all the 
>>> lines from the stream.  Therefore I must store those lines somewhere.
>>
>> Creating temporary files is fine as long as you never need to store 
>> the entire input in memory.
>>
>> Andrei
> 
> So HDD (or any file system) is not counted as memory? Then just store 
> all temporary variables to HDD.

This takes us back to one lseek per line. You can create temporary 
files. They behave as files usually do - they are better accessed in 
sequential patterns and not too many times.

Andrei



More information about the Digitalmars-d mailing list