Mir vs. Numpy: Reworked!

Sat Dec 5 08:08:21 UTC 2020

On Friday, 4 December 2020 at 20:26:17 UTC, data pulverizer wrote:
> On Friday, 4 December 2020 at 14:48:32 UTC, jmh530 wrote:
>>
>> It looks like all the `sweep_XXX` functions are only defined 
>> for contiguous slices, as that would be the default if you 
>> define a Slice!(T, N).
>>
>> How the functions access the data is a big difference. If you 
>> compare the `sweep_field` version with the `sweep_naive` 
>> version, the `sweep_field` function is able to access through 
>> one index, whereas the `sweep_naive` function has to use two 
>> in the 2d version and 3 in the 3d version.
>>
>> Also, the main difference in the NDSlice version is that it 
>> uses *built-in* MIR functionality, like how `sweep_ndslice` 
>> uses the `each` function from MIR, whereas `sweep_field` uses 
>> a for loop. I think this is partially to show that the 
>> built-in MIR functionality is as fast as if you tried to do it 
>> with a for loop yourself.
>
> I see. Looking at some of the code, the field case is literally 
> doing the indexing calculation right there. I guess ndslice is 
> doing the same thing, just with "Mir magic" in the background?
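
To make the one-index versus two-index distinction concrete, here is a minimal sketch in plain D. It does not use Mir and is not the benchmark's code; the names sumTwoIndices and sumOneIndex are hypothetical, and a row-major layout in a flat array is assumed.

```d
import std.stdio : writeln;

// Hypothetical helper: visits a row-major matrix with two indices,
// recomputing the flat offset i * cols + j for every element.
double sumTwoIndices(const(double)[] data, size_t rows, size_t cols)
{
    assert(data.length == rows * cols);
    double total = 0;
    foreach (i; 0 .. rows)
        foreach (j; 0 .. cols)
            total += data[i * cols + j];
    return total;
}

// Hypothetical helper: visits the same storage with a single running index.
double sumOneIndex(const(double)[] data)
{
    double total = 0;
    foreach (k; 0 .. data.length)
        total += data[k];
    return total;
}

void main()
{
    enum size_t rows = 3;
    enum size_t cols = 4;
    auto data = new double[rows * cols];
    foreach (k, ref x; data)
        x = k; // fill with 0, 1, 2, ...

    // Both traversals touch the same elements in the same order.
    assert(sumTwoIndices(data, rows, cols) == sumOneIndex(data));
    writeln(sumOneIndex(data));
}
```

Both functions read the same memory; the only difference is how the offset is produced, which is the kind of difference being discussed for the field and naive versions.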

sweep_ndslice uses (2*N - 1) arrays to index U; this allows LDC 
to unroll the loop.

More details here:
https://forum.dlang.org/post/qejwviqovawnuniuagtd@forum.dlang.org

> I'm still not sure why slice is so slow. Doesn't that 
> completely rely on the opSlice implementations? The choice of 
> indexing method and underlying data structure?

sweep_slice is slower because it iterates over the data in several 
loops rather than in a single one. For small matrices this makes 
the JMP/FLOP ratio higher; for large matrices that can't fit into 
the CPU cache, it is less memory efficient.
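
A rough illustration of the several-loops versus single-loop point, in plain D without Mir. This is a simplified 1D stencil, not the actual sweep from the benchmark, and the names stencilPasses and stencilFused are hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper: builds the result in three separate passes,
// so the data is traversed three times.
double[] stencilPasses(const(double)[] u, const(double)[] f)
{
    assert(u.length >= 2 && u.length == f.length);
    immutable n = u.length;
    auto r = new double[n];
    r[] = 0; // double.init is NaN in D, so zero the boundaries explicitly
    foreach (i; 1 .. n - 1) r[i] = u[i - 1];            // pass 1: left neighbour
    foreach (i; 1 .. n - 1) r[i] += u[i + 1];           // pass 2: right neighbour
    foreach (i; 1 .. n - 1) r[i] = 0.5 * (r[i] - f[i]); // pass 3: combine and scale
    return r;
}

// Hypothetical helper: same arithmetic, done in one traversal.
double[] stencilFused(const(double)[] u, const(double)[] f)
{
    assert(u.length >= 2 && u.length == f.length);
    immutable n = u.length;
    auto r = new double[n];
    r[] = 0;
    foreach (i; 1 .. n - 1)
        r[i] = 0.5 * (u[i - 1] + u[i + 1] - f[i]);
    return r;
}

void main()
{
    auto u = [0.0, 2, 4, 6, 8];
    auto f = [0.0, 2, 2, 2, 0];
    // Same result either way; only the number of traversals differs.
    assert(stencilPasses(u, f) == stencilFused(u, f));
    writeln(stencilFused(u, f));
}
```

The multi-pass version executes three loop headers and reads and writes r repeatedly, while the fused version touches each element once. That is the mechanism behind both the extra jumps on small inputs and the extra memory traffic on inputs larger than the cache.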