Update on random sampling in Phobos
Joseph Rushton Wakeling
joseph.wakeling at webdrake.net
Sat Apr 21 14:12:50 PDT 2012
Hello all,
An update on the proposed replacement for Phobos' RandomSample struct.
A demo copy of the code is on GitHub at:
https://github.com/WebDrake/RandomSample
This is a little standalone D file that can be compiled and run and which goes
through several simple benchmark comparisons of the current Phobos RandomSample
compared to this new version based on J.S. Vitter's superior algorithm.
General conclusions:
* The speed difference is quite apparent whenever there is a significant
difference (2+ orders of magnitude) between the sample and data sizes.
It's apparent, but less marked, for smaller ratios.
* For many practical cases (e.g. if you want one sample from a not very
big dataset) the difference would not be noticed. However, if you are
doing a lot of sampling, or if you are sampling from a very large dataset,
the difference can matter.
* Where data records have to be read from an external source, e.g. a file,
the time needed to do this overwhelms any difference in sampling speed.
The new algorithm is still faster, but you don't really notice the
difference.
If there's a desire for more benchmarks or other demonstrations of the code's
effectiveness, please suggest what they might be and I'll go ahead and implement
them[*].
If on the basis of the above you'd like a patch for Phobos proper, let me know
and I'll go ahead and prepare one. Note that before I do that, I'd like a
decision on Jerro's tweak to the randomSample function:
https://github.com/D-Programming-Language/phobos/pull/542
... which is designed to fix bug 7936:
http://d.puremagic.com/issues/show_bug.cgi?id=7936
Thanks and best wishes,
-- Joe
-------
[*] Just checking -- no more emails on this topic have got lost in the ether,
have they?
More information about the Digitalmars-d
mailing list