dstats updated for new Phobos

Thu Apr 23 20:56:28 PDT 2009

My statistics library for D, dstats (http://dsource.org/projects/dstats), has
been updated to take advantage of the concepts introduced in the new Phobos.
It has also received several smaller, miscellaneous updates:

1.  Instead of being specific to arrays, all functions now accept the most
general range type feasible.

2.  StackHash and StackSet now "officially" exist in dstats.alloc and are
documented.  These are a hash table and a hash set that use TempAlloc (a
stack-based allocator), leading to some impressive speedups when a hash table
or set is used within a function as part of an algorithm.  Note that TempAlloc,
etc. are included with dstats because they are used heavily internally and
have been co-evolving with the needs of dstats.  For a while, I tried to keep
them as nominally separate libraries, but given how much TempAlloc evolves
according to the needs of dstats, this might not be the best idea.

3.  Mean, standard deviation, variance, skewness, and kurtosis can now be
calculated via an output range interface (in addition to the obvious input
range interface):

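import std.stdio;
import dstats.summary;

// By ref: OnlineSummary is a struct, so by-value would update a copy.
void outputFloats(O)(ref O someOutputRange) {
    foreach (i; 0 .. 100)  // Output a bunch of floats to an output range.
        someOutputRange.put(cast(float) i);
}

void main() {
    OnlineSummary s;
    outputFloats(s);
    writeln(s.kurtosis);
}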

4.  The information theory module has been reworked.  Rather than making the
functions directly variadic, joint distributions are handled via the Joint
struct.  This eliminates ambiguities and allows things like conditional
entropies involving joint distributions to be calculated.

5.  All modules that were written exclusively by me have been relicensed from
BSD-only to a dual Phobos/BSD license, to satisfy both the Tango and Phobos
people.  The only code I have borrowed comes from MathExtra/Don Clugston's
Tango modules; with his permission, I will probably be able to put the modules
that use it under the dual license as well.

6.  Miscellaneous bug fixes.
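
For readers new to D ranges, here is a minimal sketch of what item 1 above
("the most general range type feasible" instead of arrays only) looks like in
practice.  `rangeMean` is a hypothetical name used for illustration; it is not
a dstats function, and dstats' own implementations may differ.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isNumeric;

/// Hypothetical illustration, not dstats code: the mean of ANY input range
/// of numbers (arrays, lazy ranges from std.range/std.algorithm, etc.).
double rangeMean(R)(R data)
if (isInputRange!R && isNumeric!(ElementType!R))
{
    double sum = 0;
    size_t n = 0;
    foreach (x; data) {  // a single forward pass; no random access needed
        sum += x;
        ++n;
    }
    return n == 0 ? double.nan : sum / n;
}
```

Because the template constraint only asks for an input range, calls such as
`rangeMean(iota(1, 5))` or `rangeMean(someArray.map!(x => x * 2.0))` work
without first copying the data into an array.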
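
Item 2 above credits TempAlloc, a stack-based allocator, for the StackHash and
StackSet speedups.  The sketch below shows why that style of allocation is
cheap: allocating is a pointer bump, and freeing everything a function
allocated is a single reset to a saved mark.  `RegionStack`, `alloc`, `mark`
and `release` are hypothetical names; this is not TempAlloc's API, and it
omits things a production allocator would need, such as growing past the
initial block.

```d
/// Hypothetical sketch of a stack-based ("region") allocator in the spirit
/// of TempAlloc; names and API are illustrative, not dstats' own.
struct RegionStack {
    private ubyte[] buf;  // one big block, reserved up front
    private size_t top;   // offset of the first free byte

    this(size_t capacity) { buf = new ubyte[capacity]; }

    /// Allocate n elements of T by bumping `top`: no free lists, no search.
    /// The GC does not scan a ubyte[] block, so store only plain data here.
    T[] alloc(T)(size_t n) {
        // Round `top` up to T's alignment (the GC block itself is aligned).
        immutable size_t start = (top + T.alignof - 1) / T.alignof * T.alignof;
        immutable size_t end = start + n * T.sizeof;
        assert(end <= buf.length, "RegionStack: out of memory");
        top = end;
        return cast(T[]) buf[start .. end];
    }

    /// Save the current top; release(m) frees everything allocated since.
    size_t mark() const { return top; }
    void release(size_t m) { assert(m <= top); top = m; }
}
```

A hash table built on such an allocator can take its buckets from the region
and hand all of them back in O(1) when the enclosing algorithm returns, which
is the kind of usage item 2 describes.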
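
Item 4 above concerns joint distributions and conditional entropies.  As a
reminder of the quantities involved, here is a small self-contained sketch of
plug-in (empirical) entropies in bits, using the identity
H(X | Y) = H(X, Y) - H(Y).  The function names are hypothetical, and this does
not use dstats' `Joint` struct; pairing observations into tuples is just one
simple way to form a joint distribution.

```d
import std.math : log2;
import std.typecons : Tuple, tuple;

/// Plug-in (empirical) Shannon entropy, in bits, of a sequence of observations.
double entropy(T)(T[] obs) {
    size_t[T] counts;  // frequency of each distinct value
    foreach (x; obs)
        counts[x]++;
    double h = 0;
    foreach (c; counts) {
        immutable double p = cast(double) c / obs.length;
        h -= p * log2(p);
    }
    return h;
}

/// Joint entropy H(X, Y): the entropy of the paired observations (x[i], y[i]).
double jointEntropy(X, Y)(X[] x, Y[] y) {
    assert(x.length == y.length, "x and y must be paired observations");
    auto pairs = new Tuple!(X, Y)[x.length];
    foreach (i; 0 .. x.length)
        pairs[i] = tuple(x[i], y[i]);
    return entropy(pairs);
}

/// Conditional entropy H(X | Y) = H(X, Y) - H(Y).
double condEntropy(X, Y)(X[] x, Y[] y) {
    return jointEntropy(x, y) - entropy(y);
}
```

With a variadic function, it is unclear whether `f(x, y, z)` means H(x, y, z)
or something conditional; treating the pair as a single variable, as
`jointEntropy` does with tuples, removes that ambiguity.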

Now that D2 is getting relatively close to completion, I'm open to suggestions
about setting up some kind of user-friendly installation system for dstats.
The current system is "figure it out yourself".  Assume I know nothing about
this topic.

Also, since scientific computing seems to be a big potential killer app for D,
here are some areas where I could use help with dstats, if you want to
contribute:

1.  Make the random number generators faster.  They're currently "better than
nothing" grade, i.e. they are correct, but that's about it.  I included them
because, AFAIK, there is currently no other way in D2 to generate random
samples from these distributions at all, and I haven't yet gotten around to
reading up on how to do a better job.

2.  Many of the p-value calculations for non-parametric tests use asymptotic
approximations.  Implementing reasonably efficient exact calculations requires
very good knowledge of dynamic programming.  Currently, I need exact
calculations for Kendall's Tau, Spearman's Rho, the Kolmogorov-Smirnov test,
and the runs test.

3.  Bug reports would be appreciated.  Also, feedback on the API before it
hardens too much would be nice.

4.  Long term, once 1-d statistical stuff is stabilized, it would probably be
good to add support for higher-dimensional statistics.
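
The request above for exact p-values (item 2 of the help list) is easiest to
picture with a concrete recurrence.  Here is a sketch for Kendall's Tau only,
assuming no ties: under independence, the number of discordant pairs K is
distributed like the number of inversions in a uniformly random permutation,
and inserting the m-th element into one of m equally likely positions adds
between 0 and m - 1 inversions.  This is not dstats code, the function names
are hypothetical, and it handles neither ties nor Spearman's Rho, the
Kolmogorov-Smirnov test, or the runs test.

```d
import std.algorithm : min;

/// Exact null distribution of the number of discordant pairs K for n untied
/// observations: dist[k] = P(K = k). Hypothetical helper, not dstats code.
double[] kendallNullDist(size_t n) {
    double[] dist = [1.0];  // one element: zero inversions with probability 1
    foreach (m; 2 .. n + 1) {
        // Inserting element m into one of m equally likely positions adds
        // j = 0 .. m-1 inversions, each with probability 1/m.
        auto next = new double[dist.length + m - 1];
        next[] = 0.0;  // D initializes doubles to NaN, so zero explicitly
        foreach (k, p; dist)
            foreach (j; 0 .. m)
                next[k + j] += p / m;
        dist = next;
    }
    return dist;
}

/// One-sided exact p-value P(K <= kObs); few discordant pairs indicate
/// positive association.
double kendallLowerP(size_t n, size_t kObs) {
    auto dist = kendallNullDist(n);
    double total = 0;
    foreach (k; 0 .. min(kObs + 1, dist.length))
        total += dist[k];
    return total;
}
```

The inner `j` loop makes this O(n^4) overall; since each `next[k]` is 1/m
times a window of m consecutive entries of `dist`, a sliding-window sum brings
it down to O(n^3).  Working with probabilities rather than raw permutation
counts avoids overflowing n! for moderate n.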