Shuffle

Sean Kelly sean at f4.ca
Mon Jan 28 14:29:26 PST 2008


Frits van Bommel wrote:
> Sean Kelly wrote:
>> Frits van Bommel wrote:
>>> Both versions would be better if the randomizer eliminated modulo bias,
>>> though :P.
>>
>> I just fixed that as well I think.  Let me know if I screwed it up:
>>
>> http://dsource.org/projects/tango/changeset/3139
> 
> Looks good, assuming libc's rand() has a uniform distribution over the
> lower 16 bits.

Cool, thanks.

> (This is probably just nitpicking:)
> AFAICT that assumption isn't guaranteed by the C standard though, it
> seems RAND_MAX is only required to be >= 32767 (so rand() may well
> return 15-bit random numbers :( [1]).
> Also, I can't find any mention of what the (lower-bitwise) distribution
> is supposed to be. For instance, if RAND_MAX happens to be (<some prime>
> - 1) instead of (2^N - 1) the lower bits won't be uniformly distributed
> even if the results of rand() are themselves distributed perfectly
> uniformly in [0 ... RAND_MAX].
> For these reasons, it may not be a very good idea to use libc rand().
> It's just underspecified AFAICT...

Yup.  I really just use C's rand as a default because it's available and
doing so avoids creating a dependency on tango.math.Random.  I half
considered just requiring the user to supply a randomizer, but I think
people would complain.  Still, I'd expect anyone who was really serious
about having a uniform distribution would supply one--I intend to add an
adapter to the Random module to simplify plugging it into the shuffle
routine to simplify this.

> (I don't have the actual C standard though, just some library reference
> sites from my bookmarks and what Google drags up)

I have it, here's what it says:

"The rand function computes a sequence of pseudo-random integers in the
range 0 to RAND_MAX."

Not much there, huh? :-)  It's also worth mentioning that C's rand
function isn't required to be threadsafe, so it's not a wonderful choice
for a number of reasons.

> [1] For glibc RAND_MAX is 2^32-1 though, so this shouldn't be a problem
> on typical *nix systems. I'm not sure about Windows systems though; IIRC
> mingw (and presumably gdcwin) uses the MS libc and DMD uses its own libc...

The current impl in Tango assumes only 16 bits of randomness, so rand is
called twice to produce a 32-bit value.  I thought about switching based
on RAND_MAX, but that would require it to be up to date for all OSes, so
being conservative just seemed easier.


Sean


More information about the Digitalmars-d-announce mailing list