[OT] shuffling lines in a stream
KennyTM~
kennytm at gmail.com
Fri Oct 10 07:33:16 PDT 2008
Andrei Alexandrescu wrote:
> Sergey Gromov wrote:
>> Fri, 10 Oct 2008 07:55:38 -0500,
>> Andrei Alexandrescu wrote:
>>> Sergey Gromov wrote:
>>>> Thu, 09 Oct 2008 23:42:47 -0500,
>>>> Andrei Alexandrescu wrote:
>>>>> The high traffic enjoyed by "random k-sample of a file" confirms
>>>>> that plenty of people around here have idle cycles ready to be
>>>>> spent on fun tasks.
>>>>>
>>>>> In that vein, here's another problem. Say you want to randomly
>>>>> shuffle lines in a stream. You don't know the size beforehand, and
>>>>> one pass is preferable. The stream can be larger than the working
>>>>> memory, but not an unbounded amount of times larger. Also lines in
>>>>> the stream can be unbounded, but it is guaranteed that at least a
>>>>> nonzero fraction of the file fits in working memory.
>>>>>
>>>>> You can use temporary files, but of course minimizing their total
>>>>> size is recommended. Speed is of course important. Draft your
>>>>> solution and explain why you think it's correct and advantageous.
>>>> Well, my temporary file contains the whole stream. Because there is
>>>> a probability that the last line of the stream goes first.
>>> Yes, that probability exists, but that doesn't imply you must load
>>> the entire file in memory.
>>
>> This implies though that I cannot start output before I read all the
>> lines from the stream. Therefore I must store those lines somewhere.
>
> Creating temporary files is fine as long as you never need to store the
> entire input in memory.
>
> Andrei
So HDD (or any file system) is not counted as memory? Then just store
all temporary variables to HDD.
More information about the Digitalmars-d
mailing list