D Mir: standard deviation speed

Wed Jul 15 11:23:00 UTC 2020

On Wednesday, 15 July 2020 at 05:57:56 UTC, tastyminerals wrote:
> [snip]
>
> Here is a (WIP) project as of now.
> Line 160 in 
> https://github.com/tastyminerals/mir_benchmarks_2/blob/master/source/basic_ops.d
>
> std of [60, 60] matrix 0.0389492 (> 0.001727)
> std of [300, 300] matrix 1.03592 (> 0.043452)
> std of [600, 600] matrix 4.2875 (> 0.182177)
> std of [800, 800] matrix 7.9415 (> 0.345367)

I changed the dflags-ldc to "-mcpu=native -O" and compiled with 
`dub run --compiler=ldc2`. In the initial run, my results for 
both were similar to yours.

I changed sd to

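// Imports this snippet relies on (assuming a recent mir-algorithm).
import mir.math.common : fmamath, sqrt;
import mir.math.stat : mean;
import mir.math.sum : sum;
import mir.ndslice.slice : Slice;
import mir.ndslice.topology : map;

@fmamath private double sd(T)(Slice!(T*, 1) flatMatrix)
{
     pragma(inline, false); // keep the call visible to the benchmark
     if (flatMatrix.empty)
         return 0.0;
     double n = cast(double) flatMatrix.length;
     double mu = flatMatrix.mean;
     // population SD: sqrt of the mean squared deviation from mu
     return (flatMatrix.map!(a => (a - mu) ^^ 2)
             .sum!"precise" / n).sqrt;
}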
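
For anyone who wants to check the arithmetic without Mir, here is 
a minimal Phobos-only sketch of the same two-pass calculation 
(mean first, then the mean squared deviation). The name sdTwoPass 
is made up for illustration, and it uses Phobos' default sum 
rather than any of the Mir summation modes, so it says nothing 
about the timings above.

```d
import std.algorithm.iteration : map, sum;
import std.math : sqrt;
import std.stdio : writeln;

/// Two-pass population standard deviation (hypothetical helper name).
double sdTwoPass(const(double)[] xs)
{
    if (xs.length == 0)
        return 0.0; // same empty-input convention as sd above
    immutable n = cast(double) xs.length;
    immutable mu = xs.sum / n; // pass 1: mean
    // pass 2: sum of squared deviations from the mean
    immutable ss = xs.map!(a => (a - mu) * (a - mu)).sum;
    return sqrt(ss / n);
}

void main()
{
    // mean is 5, squared deviations sum to 32, so the result is sqrt(32 / 8)
    writeln(sdTwoPass([2.0, 4, 4, 4, 5, 5, 7, 9]));
}
```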

and got

std of [10, 10] matrix 0.0016321
std of [20, 20] matrix 0.0069788
std of [300, 300] matrix 2.42063
std of [60, 60] matrix 0.0828711
std of [600, 600] matrix 9.72251
std of [800, 800] matrix 18.1356

By far the biggest change came from using sum!"precise" instead 
of sum!"fast".
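
To give a feel for why an accuracy-preserving sum costs more than 
a plain one, here is a rough Phobos-only sketch. It compares a 
naive loop with Kahan compensated summation. Kahan is a stand-in 
chosen for brevity; it is not what sum!"precise" does internally, 
and the function names are made up. The point is only that each 
element needs several dependent floating-point operations instead 
of one add.

```d
import std.stdio : writefln;

/// Plain left-to-right accumulation: one add per element.
double naiveSum(const(double)[] xs)
{
    double s = 0.0;
    foreach (x; xs)
        s += x;
    return s;
}

/// Kahan compensated summation: four dependent operations per element.
double kahanSum(const(double)[] xs)
{
    double s = 0.0; // running total
    double c = 0.0; // estimate of the low-order bits lost so far
    foreach (x; xs)
    {
        immutable y = x - c; // apply the pending correction
        immutable t = s + y; // low-order bits of y may be lost here
        c = (t - s) - y;     // recover what was lost
        s = t;
    }
    return s;
}

void main()
{
    // Each 1.0 is smaller than the spacing between doubles near 1e16,
    // so a plain running total can drop them.
    immutable double[] xs = [1e16, 1.0, 1.0, -1e16];
    writefln("naive: %s  kahan: %s", naiveSum(xs), kahanSum(xs));
}
```

Note that aggressive fast-math style optimizations may simplify 
the compensation term away, so a loop like kahanSum should not be 
compiled with them.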

When I ran your benchStd function with
ans = matrix.flattened.standardDeviation!(double, "online", 
"fast");
I got
std of [10, 10] matrix 1e-07
std of [20, 20] matrix 0
std of [300, 300] matrix 0
std of [60, 60] matrix 1e-07
std of [600, 600] matrix 0
std of [800, 800] matrix 0

I got the same result with Summation.naive. Those timings seem 
almost too low.

The default is Summation.appropriate, which resolves to 
Summation.pairwise in this case. It is faster than 
Summation.precise, but still slower than Summation.naive or 
Summation.fast. Your welfordSD should line up with 
Summation.naive.
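
As a rough illustration of the pairwise idea (not Mir's actual 
implementation), the sketch below sums small blocks naively and 
combines the halves recursively. The helper name and the block 
size of 8 are arbitrary choices for the example. The extra 
splitting is the kind of bookkeeping a plain loop does not have 
to do.

```d
import std.stdio : writeln;

/// Pairwise summation (hypothetical helper; block size chosen arbitrarily).
double pairwiseSum(const(double)[] xs)
{
    enum size_t blockSize = 8;
    if (xs.length <= blockSize)
    {
        // Small blocks are summed with a plain loop.
        double s = 0.0;
        foreach (x; xs)
            s += x;
        return s;
    }
    // Otherwise split in half and add the two partial sums.
    immutable mid = xs.length / 2;
    return pairwiseSum(xs[0 .. mid]) + pairwiseSum(xs[mid .. $]);
}

void main()
{
    double[] xs;
    foreach (i; 1 .. 101)
        xs ~= cast(double) i;
    writeln(pairwiseSum(xs)); // sum of 1 + 2 + ... + 100
}
```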

When I change that to
ans = matrix.flattened.standardDeviation!(double, "online", 
"precise");
I get
Running .\mir_benchmarks_2.exe
std of [10, 10] matrix 0.0031737
std of [20, 20] matrix 0.0153603
std of [300, 300] matrix 4.15738
std of [60, 60] matrix 0.171211
std of [600, 600] matrix 17.7443
std of [800, 800] matrix 34.2592

I also tried changing your welfordSD function based on the stuff 
I mentioned above, but it did not make a large difference.
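
For readers who do not have the repository open, a one-pass 
Welford update looks roughly like the sketch below. This is a 
generic Phobos-only version with a made-up name, not the 
welfordSD from the benchmark. It accumulates with plain adds and 
no compensation, which is why I would expect it to behave like 
the naive summation mode.

```d
import std.math : sqrt;
import std.stdio : writeln;

/// One-pass (Welford) population standard deviation; hypothetical helper.
double sdWelford(const(double)[] xs)
{
    if (xs.length == 0)
        return 0.0;
    double mean = 0.0; // running mean
    double m2 = 0.0;   // running sum of squared deviations from the mean
    size_t k = 0;      // number of elements seen so far
    foreach (x; xs)
    {
        ++k;
        immutable delta = x - mean;
        mean += delta / k;
        m2 += delta * (x - mean); // second factor uses the updated mean
    }
    return sqrt(m2 / k);
}

void main()
{
    writeln(sdWelford([2.0, 4, 4, 4, 5, 5, 7, 9]));
}
```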