shuffling lines in a stream
Sergey Gromov
snake.scaly at gmail.com
Fri Oct 10 05:53:08 PDT 2008
Thu, 09 Oct 2008 23:42:47 -0500,
Andrei Alexandrescu wrote:
> The high traffic enjoyed by "random k-sample of a file" confirms that
> plenty of people around here have idle cycles ready to be spent on fun
> tasks.
>
> In that vein, here's another problem. Say you want to randomly shuffle
> lines in a stream. You don't know the size beforehand, and one pass is
> preferable. The stream can be larger than the working memory, but not an
> unbounded amount of times larger. Also lines in the stream can be
> unbounded, but it is guaranteed that at least a nonzero fraction of the
> file fits in working memory.
>
> You can use temporary files, but of course minimizing their total size
> is recommended. Speed is of course important. Draft your solution and
> explain why you think it's correct and advantageous.
Well, my temporary file contains the whole stream. Because there is a
probability that the last line of the stream goes first.
module shuffle;
import std.cstream: din, dout;
import std.stream: BufferedFile, FileMode;
import std.file: remove, exists;
import std.random: Random, uniform, unpredictableSeed, randomShuffle;
import std.stdio: writeln;
void main()
{
auto gen = Random(unpredictableSeed);
enum tempName = "temp.$$$";
if (exists(tempName))
remove(tempName);
auto temp = new BufferedFile(tempName,
FileMode.In | FileMode.OutNew);
ulong[] cache;
foreach (char[] s; din)
{
cache ~= temp.position;
temp.writeLine(s);
}
randomShuffle(cache, gen);
foreach (pos; cache)
{
temp.position = pos;
writeln(temp.readLine());
}
temp.close();
}
More information about the Digitalmars-d
mailing list