Calculating mean and standard deviation with std.algorithm.reduce
Joseph Rushton Wakeling
joseph.wakeling at webdrake.net
Wed Feb 13 05:44:04 PST 2013
Hello all,
A little challenge that's been bothering me today.
The docs for std.algorithm give an illustration of its use to calculate mean and
standard deviation in a single pass:
// Compute sum and sum of squares in one pass
r = reduce!("a + b", "a + b * b")(tuple(0.0, 0.0), a);
assert(r[0] == 35); // sum
assert(r[1] == 233); // sum of squares
// Compute average and standard deviation from the above
auto avg = r[0] / a.length;
auto stdev = sqrt(r[1] / a.length - avg * avg);
However, this formula for standard deviation is one that is well known for being
subject to potentially fatal rounding error. Consider the following:
float[] a = [10_000.0f, 10_001.0f, 10_002.0f];
auto r = reduce!("a + b", "a + b * b")(tuple(0.0f, 0.0f), a);
auto avg = r[0] / a.length;
auto sd = sqrt(r[1] / a.length - avg * avg);
writeln(avg, "\t", sd);
... which gives a value of 0 for standard deviation when compiled with LDC or
GDC, and -nan with DMD, when the correct answer is 0.816497.
An improved online formula calculates together two quantities:
{ x[1] if k == 1
M[k] = {
{ M[k-1] + (x[k] - M[k-1]) / k , if k > 1
{ 0 if k == 1
Q[k] = {
{ Q[k-1] + (k - 1) * ((x[k] - M[k-1]) ^^ 2) / k , if k > 1
... which one can readily prove give you respectively the mean of x[1], ...,
x[k] and the sum of squares of deviations from the mean; so that you can
calculate sample variance as Q[n] / (n - 1), or standard variance as Q[n] / n
with standard deviation being given by the square root of this.
In a reduce sense, you'd want to calculate the mean according to,
reduce!"a + (b - a)/k"(0.0, x);
... where k represents the index count 1, 2, 3, ... However, it's not evident
to me how you could get reduce() to know this counting value.
Where calculating Q[k] is concerned, you need to know both the index value _and_
the value of the corresponding M[k]. Again, it's not evident to me how you'd
indicate to reduce() what is needed.
Can anyone offer advice on how to achieve these calculations using reduce?
Thanks & best wishes,
-- Joe
More information about the Digitalmars-d-learn
mailing list