D Mir: standard deviation speed

jmh530 john.michael.hall at gmail.com
Tue Jul 14 19:36:21 UTC 2020


On Tuesday, 14 July 2020 at 19:04:45 UTC, tastyminerals wrote:
> I am trying to implement a standard deviation calculation in 
> Mir for benchmarking purposes.
> I have two implementations. One is the straightforward std = 
> sqrt(mean(abs(x - x.mean())**2)) and the other follows 
> Welford's algorithm for computing variance (as described here: 
> https://www.johndcook.com/blog/standard_deviation/).
>
> However, although the first implementation should be the less 
> efficient of the two, the benchmark results show a startling 
> difference in its favour. I'd like to understand whether I am 
> doing something wrong and would appreciate some explanation.
>
> # Naive std
> import std.math : abs;
> import mir.ndslice;
> import mir.math.common : pow, sqrt, fastmath;
> import mir.math.sum : sum;
> import mir.math.stat : mean;
>
> @fastmath private double sd0(T)(Slice!(T*, 1) flatMatrix)
> {
>     pragma(inline, false);
>     if (flatMatrix.empty)
>         return 0.0;
>     double n = cast(double) flatMatrix.length;
>     double mu = flatMatrix.mean;
>     return (flatMatrix.map!(a => (a - mu).abs ^^ 2).sum!"fast" / n).sqrt;
> }
>
>
> # std with Welford's variance
> @fastmath double sdWelford(T)(Slice!(T*, 1) flatMatrix)
> {
>     pragma(inline, false);
>     if (flatMatrix.empty)
>         return 0.0;
>
>     double m0 = 0.0;
>     double m1 = 0.0;
>     double s0 = 0.0;
>     double s1 = 0.0;
>     double n = 0.0;
>     foreach (x; flatMatrix.field)
>     {
>         ++n;
>         m1 = m0 + (x - m0) / n;
>         s1 = s0 + (x - m0) * (x - m1);
>         m0 = m1;
>         s0 = s1;
>     }
>     // switch to n - 1 for sample variance
>     return (s1 / n).sqrt;
> }
>
> Benchmarking:
>
> Naive std (1k loops):
>   std of [60, 60] matrix 0.001727
>   std of [300, 300] matrix 0.043452
>   std of [600, 600] matrix 0.182177
>   std of [800, 800] matrix 0.345367
>
> std with Welford's variance (1k loops):
>   std of [60, 60] matrix 0.0225476
>   std of [300, 300] matrix 0.534528
>   std of [600, 600] matrix 2.0714
>   std of [800, 800] matrix 3.60142

It would be helpful to provide a link to your benchmark code.

You should only need one accumulator each for the mean and the 
centered sum of squares. See the Python example of Welford's 
algorithm at
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
The duplicated state (m0/m1 and s0/s1) may have broken some 
optimization.
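Something like the following sketch (my own untested rewrite, 
keeping the structure of your version) avoids the duplicated 
state:

import mir.ndslice;
import mir.math.common : fastmath, sqrt;

@fastmath double sdWelfordSingle(T)(Slice!(T*, 1) flatMatrix)
{
    pragma(inline, false);
    if (flatMatrix.empty)
        return 0.0;

    double mean = 0.0;
    double m2 = 0.0; // running centered sum of squares
    double n = 0.0;
    foreach (x; flatMatrix.field)
    {
        ++n;
        immutable delta = x - mean;
        mean += delta / n;        // update the mean in place
        m2 += delta * (x - mean); // delta to old mean times delta to new mean
    }
    // divide by n - 1 instead for the sample variance
    return (m2 / n).sqrt;
}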

variance and standardDeviation were recently added to 
mir.math.stat. They have an option to switch between Welford's 
algorithm and the others. What you call the naive algorithm 
corresponds to VarianceAlgo.twoPass, and Welford's algorithm is 
selected with VarianceAlgo.online, which is the default. It would 
also be interesting if you re-did the analysis with the built-in 
mir functions.
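For instance, something along these lines (written from memory of 
the mir.math.stat API, so double-check the names against the 
documentation):

import mir.math.stat : standardDeviation, VarianceAlgo;
import mir.ndslice : sliced;

void main()
{
    auto x = [1.0, 2.0, 3.0, 4.0].sliced;

    // VarianceAlgo.online (Welford-style) is the default
    auto sdOnline = x.standardDeviation;

    // explicitly select the two-pass algorithm, like your naive version
    auto sdTwoPass = x.standardDeviation!(VarianceAlgo.twoPass);
}

Note that standardDeviation computes the sample statistic by 
default; I believe it also takes a boolean flag for the 
population statistic, but check the documentation.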

There are some other small differences between your 
implementation and the one in mir, beyond the issue discussed 
above. You take the absolute value before squaring and force the 
use of sum!"fast". Another difference is that VarianceAlgo.online 
in mir uses a precise calculation of the mean rather than the 
incremental update that Welford's algorithm uses. This may have a 
modest impact on performance, but it should give more accurate 
results.
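To illustrate the distinction (a sketch of the general idea only, 
not mir's actual implementation, which accumulates the sum using 
the selected Summation policy):

import mir.math.common : sqrt;

double sdPreciseMean(double[] xs)
{
    double sum = 0.0;  // exact running sum, so the mean is always sum / n
    double mean = 0.0;
    double m2 = 0.0;   // running centered sum of squares
    double n = 0.0;
    foreach (x; xs)
    {
        ++n;
        immutable oldMean = mean;
        sum += x;
        mean = sum / n;  // recompute the mean from the running sum
        m2 += (x - oldMean) * (x - mean);
    }
    return (m2 / n).sqrt;
}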

