Random sampling in Phobos

Lars T. Kyllingstad public at kyllingen.net
Tue Apr 17 22:50:09 PDT 2012


On Wednesday, 18 April 2012 at 05:45:11 UTC, Lars T. Kyllingstad 
wrote:
> On Wednesday, 18 April 2012 at 01:17:18 UTC, Joseph Rushton 
> Wakeling wrote:
>> On 17/04/12 18:25, Joseph Rushton Wakeling wrote:
>>> On 17/04/12 17:31, Andrei Alexandrescu wrote:
>>>> Actually that's not correct. RandomSample works fine with an 
>>>> input range and
>>>> does not keep it in memory.
>>>
>>> Ahh, OK. I should have anticipated this as the output also 
>>> works as a range.
>>
>> A query on this point, as it looks like this can have some 
>> unfortunate side-effects.
>>
>> If I write e.g.
>>
>>    auto s = randomSample(iota(0,100), 5);
>>
>>    foreach(uint i; s)
>>        writeln(i);
>>
>>    writeln();
>>
>>    foreach(uint i; s)
>>        writeln(i);
>>
>> ... then it's noticeable that the _first_ element of the two 
>> sample lists is identical, while the others change.  If I put 
>> in place a specified source of randomness, e.g.
>>
>>    auto s = randomSample(iota(0,100), 5, rndGen);
>>
>> ... then the numbers stay the same for both lists.
>>
>> I'm presuming that the first element remains identical because 
>> it's defined in the constructor rather than elsewhere, but 
>> it's not clear why passing a defined RNG produces identical 
>> output on both occasions.
>>
>> The effect can be worse with Vitter's Algorithm D because 
>> often, having defined the first element, the second may be 
>> derived deterministically -- meaning the first 2 elements of 
>> the sample are identical.
>>
>> I'm sure that the above use of the RandomSample struct is not 
>> the advised use, but it's still disconcerting to see this.
>
> I've run into this trap more than once. :)  You have to pass 
> the random number generator by ref, otherwise you are just 
> generating two identical sequences of random numbers.  Just 
> change the randomSample signature to:
>
>     auto randomSampleVitter(R, Random)(R r, size_t n, ref 
> Random gen)
>
> -Lars

Hmmm... I see now that randomSample() is defined without ref in 
Phobos as well.  Would you care to check whether my suggestion 
fixes your problem?  If so, it is a bug in Phobos that should be 
fixed.

-Lars


More information about the Digitalmars-d mailing list