D Mir: standard deviation speed
tastyminerals
tastyminerals at gmail.com
Tue Jul 14 19:04:45 UTC 2020
I am trying to implement a standard deviation calculation in Mir
for benchmarking purposes.
I have two implementations. One is the straightforward std =
sqrt(mean(abs(x - x.mean())**2)) and the other follows Welford's
algorithm for computing variance (as described here:
https://www.johndcook.com/blog/standard_deviation/).
However, although the naive implementation makes two passes over the
data (one for the mean, one for the squared deviations) and should
therefore be the slower of the two, the benchmark results show a
startling difference in its favour. I'd like to understand whether I
am doing something wrong, and would appreciate an explanation.
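As a sanity check that the two approaches compute the same quantity, here is a minimal Python sketch (the function names `naive_std` and `welford_std` are mine for illustration, not Mir's API) comparing the two-pass formula against Welford's single-pass update:

```python
import math
import random

def naive_std(xs):
    # two-pass population std: sqrt(mean((x - mean)^2))
    n = len(xs)
    mu = sum(xs) / n
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / n)

def welford_std(xs):
    # single-pass Welford update: running mean and sum of
    # squared deviations (M2), as in the John D. Cook post
    mean = 0.0
    m2 = 0.0
    n = 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return math.sqrt(m2 / n)  # use (n - 1) for sample variance

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]
print(abs(naive_std(data) - welford_std(data)) < 1e-9)  # the two agree
```

Both return the population standard deviation, so any runtime gap between the D versions below comes from how the loops execute, not from computing different things.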
# Naive std
import std.math : abs;
import mir.ndslice;
import mir.math.common : sqrt, fastmath;
import mir.math.sum : sum;
import mir.math.stat : mean;

@fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
{
    pragma(inline, false);
    if (flatMatrix.empty)
        return 0.0;
    double n = cast(double) flatMatrix.length;
    double mu = flatMatrix.mean;
    return (flatMatrix.map!(a => (a - mu).abs ^^ 2).sum!"fast" / n).sqrt;
}
# std with Welford's variance
@fastmath double sdWelford(T)(Slice!(T*, 1) flatMatrix)
{
    pragma(inline, false);
    if (flatMatrix.empty)
        return 0.0;
    double m0 = 0.0;
    double m1 = 0.0;
    double s0 = 0.0;
    double s1 = 0.0;
    double n = 0.0;
    foreach (x; flatMatrix.field)
    {
        ++n;
        m1 = m0 + (x - m0) / n;
        s1 = s0 + (x - m0) * (x - m1);
        m0 = m1;
        s0 = s1;
    }
    // switch to n - 1 for sample variance
    return (s1 / n).sqrt;
}
Benchmarking (times in seconds per 1k loops):
Naive std:
    std of [60, 60] matrix     0.001727
    std of [300, 300] matrix   0.043452
    std of [600, 600] matrix   0.182177
    std of [800, 800] matrix   0.345367
std with Welford's variance:
    std of [60, 60] matrix     0.0225476
    std of [300, 300] matrix   0.534528
    std of [600, 600] matrix   2.0714
    std of [800, 800] matrix   3.60142