Statistics library

dsimcha dsimcha at yahoo.com
Thu Oct 23 15:43:46 PDT 2008


Since there's really no good comprehensive statistics library for D (Tango has
a little, and the beginnings of a few are on dsource, but nothing much), I've
been rolling my own statistics functions as needed.  Almost by accident, I seem
to have built up the beginnings of a decent statistics library.  I'm debating
whether it would interest enough people to be worth releasing, and whether
enough community help would be available to bring it up to production quality
or to merge it with other people's efforts in this area.  The following
functionality is currently available:

Correlation (Pearson, Spearman rho, Kendall tau).  Note that the Kendall tau
correlation is a very efficient O(N log N) version.

Mean, standard deviation, variance, kurtosis, percent variance for arrays of
numeric values.

Shannon entropy, mutual information.

Kolmogorov-Smirnov tests.

CDFs for the binomial, hypergeometric, normal, Poisson, and Kolmogorov
distributions; PDFs for the hypergeometric, Poisson, and binomial
distributions.

Inverse normal distribution, and normally distributed random number generation.

A struct to generate all possible permutations of a sequence.
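
To illustrate the O(N log N) Kendall tau mentioned in the list above: the usual
way to get that complexity is to sort the pairs by their first coordinate and
then count inversions in the second coordinate with a merge sort.  The sketch
below is not the library's code (the library is written in D and its source is
not shown in this post).  It is a minimal Python illustration that assumes no
tied values in either input, and the names count_inversions and
kendall_tau_no_ties are made up for this example.

```python
def count_inversions(values):
    """Return (sorted copy, number of pairs i < j with values[i] > values[j])."""
    n = len(values)
    if n < 2:
        return list(values), 0
    mid = n // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])

    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left,
            # so it forms one inversion with each of them.
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau_no_ties(x, y):
    """Kendall tau for paired samples, ASSUMING no ties in x or in y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")

    # Order the y values by their paired x value: O(N log N).
    y_by_x = [yv for _, yv in sorted(zip(x, y))]

    # Each inversion in y_by_x is exactly one discordant pair: O(N log N).
    _, discordant = count_inversions(y_by_x)

    total_pairs = n * (n - 1) // 2
    concordant = total_pairs - discordant  # valid only when there are no ties
    return (concordant - discordant) / total_pairs
```

Handling ties needs extra bookkeeping (counting tied groups in each variable
and adjusting the denominator), which this sketch leaves out on purpose.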


On the other hand, I'm a scientist, not a full-time programmer, and although I
can write working code, I have no clue what it takes to get code up to the
gold standard of "production."  Also, this library is very D2-dependent, and I
have no interest in back-porting it.  Of course, if by some chance someone
else wanted to back-port it, they'd be more than welcome.

Most of the code is covered one way or another by unit tests, although I
cheated a lot by having some unit tests exercise several functions at once.

Is there any interest in this from others in the D community?  Do other people
think that D would benefit from having a decent statistics library?  Other
comments?


More information about the Digitalmars-d-announce mailing list