Is this a bug? RandomSample claims it needs input range but actually needs forward range

Joseph Rushton Wakeling joseph.wakeling at webdrake.net
Tue Jun 18 21:27:44 PDT 2013


On 06/19/2013 03:32 AM, Jonathan M Davis wrote:
> Any function which uses save requires at least a forward range, and if its 
> template constraint requires only an input range, then its template constraint 
> is wrong. _Any_ function whose template constraint does not actually guarantee 
> that anything that passes the constraint will compile with that function is 
> buggy.

It was a failure of testing.  Doing a bit of git archaeology, it looks like the
save function was introduced first, then the isInputRange template constraints,
but as the unittests all involved arrays as sample input, the clash with .save
was never spotted.  Since most of the practical use cases for randomSample
involve sampling either from an array or from a forward range like iota, it
obviously didn't get spotted "in the wild" either.

The annoying thing is that when I first started using randomSample over a year
ago, I _must_ have encountered this bug, because I wrote code trying to sample
from file.byLine -- the trouble is, at the time, I didn't understand the range
concept very well and so didn't realize the compilation error was the sign of a
bug, I just rewrote my code to read the file into an array of strings and sample
that.  (Kids, let this be a lesson -- always ask when your code won't compile
and you don't understand the error message.)

It was just coincidence that last month I was discussing randomSample on D.learn
and wanted to present an example of its flexibility, so revisited the idea of
sampling from file.byLine and (this time) realized it shouldn't fail to compile.

Incidentally, sampling from .byLine carries a certain cost.  The algorithm used
in RandomSample is designed to sample n items from N in O(n) time and generating
O(n) random variates (i.e. scaling with sample size).  However, that relies on
the fact that one can .popFrontN(s) or .popFrontExactly(s) the input range in
O(1) time.  With .byLine it's O(s) and so the overall complexity of the
algorithm becomes O(N), that is, scaling with input size.


More information about the Digitalmars-d mailing list