[Issue 10322] std.random.RandomSample.index() returns wrong value if called before front()

d-bugmail at puremagic.com d-bugmail at puremagic.com
Mon Jun 10 12:58:18 PDT 2013


http://d.puremagic.com/issues/show_bug.cgi?id=10322



--- Comment #4 from Joseph Rushton Wakeling <joseph.wakeling at webdrake.net> 2013-06-10 12:58:16 PDT ---
Note: there's an interesting discrepancy in results if the samplepop.d test
code is tweaked to sample 5 from 10 instead of 5 from 100.  The key difference
here is that because the ratio of sample points to available points is smaller,
Algorithm A will be used instead of Algorithm D.

If we do this, we observe that the sample with popFront() called first does not
reflect an even selection of items from 1 to 99; rather, it looks like the case
where .front is called prior to .popFront().

The explanation for this is as follows.  First, let's cover the 5-from-10 case.
 If .popFront() is called before .front, then the value of _Vprime (used by
Algorithm D) has not been initialized, and is equal to nan.

When _Vprime is used in RandomSample.skip() (std.random(1835)), the cast means
that the value of s is set to 0, and _Vprime is reset (line 1853) to a value of
-nan.

This in turn registers false to the if(_Vprime > 1.0) test, which takes us down
to line 1899 to return s, which is 0.

So, it follows that the call to prime() in the initial popFront() does not
change anything -- the only change to internal values occurs in the body of
popFront() itself with its calls to _input.popFront() and its increments of
_toSelect, _available and _index.  So, at the end of popFront() we are sitting
on the second element of the original _input range, _toSelect and _available
are 4 and 99 respectively, and the internal variable _first is still true.

Now when we read front() it will check _first and, finding it to be true (line
1682) will call prime(), which in turn calls skip() and hence selects as if it
were selecting the first point of the sample (with _toSelect == 4 and
_available == 99).  Hence the even distribution of selection from points 1-99.

If instead we're selecting 5 from 10, the initial call to popFront() and thence
prime() will instead call skipA() (via skip(), lines 1820-1824), which (not
being dependent on any uninitialized variables) will generate a true, typically
non-zero skip value s.

Then, when we call front(), with _first still true, the second call to prime()
will generate _another_ real skip.

It follows that in this second case the algorithm is genuinely taking two
sample points, and hence the non-uniform distribution of selection across
points 1-99.

Sad to say this would all have been caught much easier if skip() had used
std.conv.to instead of cast() .... :-(

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------


More information about the Digitalmars-d-bugs mailing list