H. S. Teoh hsteoh at quickfur.ath.cx
Fri Feb 17 18:16:56 PST 2012

```On Sat, Feb 18, 2012 at 02:52:15AM +0100, Alf P. Steinbach wrote:
> On 18.02.2012 02:39, H. S. Teoh wrote:
> >// Outputs a randomly selected line from standard input with equal
> >// likelihood.
> >import std.random;
> >import std.stdio;
> >
> >void main() {
> >	auto n = 0;
> >	string choice;
> >	foreach (line; stdin.byLine()) {
> >		n++;
> >		if (uniform(0,n) == 0)
> >			choice = line.idup;
> >	}
> >	writeln(choice);
> >}
> >
> >
> >P.S. Proof that the probability any line is selected is exactly 1/n
> >(where n is the total number of lines read) is left as an exercise
>
> Assuming that by "any" you mean "any particular", you would have to
> read all the lines first. Otherwise, if the code selects the first
> line with probability 1/K, then I can just input some other number of
> lines.

But therein lies the trick. The algorithm self-adapts to the number of
lines it reads. It does not know in advance how many lines there are,
but guarantees that by the end of the input, the probability that any
particular line is selected is exactly 1/n, where n is the number of
lines read. The key lies in the way uniform() is called. :)

(This has been tested by running the program repeatedly on the same
input and analysing the number of occurrences of each line. The proof of
the algorithm is left as an exercise for the reader. ;-) )

> >P.S.S. The .idup is a bit ugly, but necessary, since apparently byLine()
> >calls readln() with a static buffer, so choice will be silently
> >overwritten if the .idup is omitted.
>
> That sounds ominous. One should never have to be aware of low level
> details in order to do simple string assignment or initialization,
> when the source already is a string. Does one really have to do that
> in D?
[...]

Well, there are two issues here. One is that stdin.byLine() returns
char[], which cannot be assigned to string directly. Two is that the
first version of the code didn't have the .idup (I used char[] instead
of string for choice), and the result was garbage in the output due to
said overwriting of buffer.

Now I agree that one shouldn't need to know how byLine() is implemented
in order to be able to use it, but this is the way the current Phobos
implementation is, unfortunately.

However, the one upside to all this is that the type system, in a sense,
forces you to do the right thing: byLine() returns char[] which cannot
be assigned to string, so naturally you have to write .idup, which
automatically also avoids the buffer overwriting issue. Using string
instead of char[] is a logical choice in this case (pun intended ;-)),
because it's something whose value you want to retain until the next
time it's modified. So using string for 'choice' naturally leads to
needing an .idup in the loop. The system isn't perfect, but it's pretty
good.

T

--
Political correctness: socially-sanctioned hypocrisy.
```