shuffling lines in a stream

Sergey Gromov snake.scaly at gmail.com
Fri Oct 10 05:53:08 PDT 2008


Thu, 09 Oct 2008 23:42:47 -0500,
Andrei Alexandrescu wrote:
> The high traffic enjoyed by "random k-sample of a file" confirms that 
> plenty of people around here have idle cycles ready to be spent on fun 
> tasks.
> 
> In that vein, here's another problem. Say you want to randomly shuffle 
> lines in a stream. You don't know the size beforehand, and one pass is 
> preferable. The stream can be larger than the working memory, but not an 
> unbounded amount of times larger. Also lines in the stream can be 
> unbounded, but it is guaranteed that at least a nonzero fraction of the 
> file fits in working memory.
> 
> You can use temporary files, but of course minimizing their total size 
> is recommended. Speed is of course important. Draft your solution and 
> explain why you think it's correct and advantageous.

Well, my temporary file contains the whole stream.  Because there is a 
probability that the last line of the stream goes first.

module shuffle;

import std.cstream: din, dout;
import std.stream: BufferedFile, FileMode;
import std.file: remove, exists;
import std.random: Random, uniform, unpredictableSeed, randomShuffle;
import std.stdio: writeln;

void main()
{
    auto gen = Random(unpredictableSeed);

    enum tempName = "temp.$$$";
    if (exists(tempName))
        remove(tempName);
    auto temp = new BufferedFile(tempName,
        FileMode.In | FileMode.OutNew);

    ulong[] cache;

    foreach (char[] s; din)
    {
        cache ~= temp.position;
        temp.writeLine(s);
    }

    randomShuffle(cache, gen);

    foreach (pos; cache)
    {
        temp.position = pos;
        writeln(temp.readLine());
    }

    temp.close();
}



More information about the Digitalmars-d mailing list