std.math API rework

Fri Oct 7 00:42:43 PDT 2016

On Friday, 7 October 2016 at 01:53:27 UTC, Andrei Alexandrescu 
wrote:
> On 10/6/16 12:53 PM, Ilya Yaroshenko wrote:
>> Effective work with std.experimental.ndslice and
>> mir.ndslice.array requires half of std.math to be exact
>> aliases of LLVM intrinsics (for LDC).
>
> Why?
>
>> To enable vectorization for mir.ndslice.algorithm I created an
>> internal math module [1] in Mir. But this is weird, because
>> third-party packages like DCV [2] need to use the module too.
>> Also, some optimisations for std.complex and the future
>> std.experimental.color would be very ugly without the proposed
>> change.
>
> I'd love to understand this point better. In particular, how do
> you reconcile it with kinke's assertion that some of these
> intrinsics simply forward to C routines?
>
> Our high-level view is that doing efficient work should not 
> require one to fork the standard library. On the other hand, 
> the traditional place for compiler-specific code is in the core 
> runtime, not the standard library. (There is a tiny bit of 
> stdlib code that depends on dmd to be fair.)
>
> So I'd like to be reasonably confident the right rocks are put
> in the right places. Have you considered (per Iain) migrating
> these symbols to core.math and then forwarding those in the
> stdlib to them?
>
>
> Thanks,
>
> Andrei

For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using
mir.ndslice.algorithm. The vandps instruction (a bitwise AND that
clears the sign bit) can be used for fabs, and the vsqrtps
instruction can be used for sqrt. LDC's @fastmath allows the
summation elements to be re-associated.

Depending on the data cache level, this can speed up the
iteration up to 8 times for single-precision floating-point
numbers with AVX (16 times with AVX-512?).

Furthermore, at least for x86, the @fastmath flag does not break
any math logic here: it only allows the elements to be
re-associated (I mean exactly this example on x86).
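To make the re-association concrete, here is a sketch (not Mir code) that does by hand what the vectorizer is allowed to do under @fastmath: keep 8 independent partial sums, assumed to correspond to the 8 single-precision lanes of a 256-bit AVX register, and add them together only at the end. The function name sumSqrtAbs8 is hypothetical. In exact arithmetic the result equals the sequential sum; in floating point the rounding can differ slightly, which is precisely the freedom re-association grants.

```d
// Hand-written equivalent of what @fastmath lets the vectorizer do for
// sum_i sqrt(fabs(a[i])): 8 independent partial sums (assumed: one per
// float lane of a 256-bit AVX register), combined only at the end.
import std.math : fabs, sqrt;

float sumSqrtAbs8(const(float)[] a)
{
    enum lanes = 8;
    float[lanes] acc = 0; // acc[k] collects a[i + k] from every full block of 8
    size_t i = 0;
    for (; i + lanes <= a.length; i += lanes)
        foreach (k; 0 .. lanes)
            acc[k] += sqrt(fabs(a[i + k]));

    float s = 0;
    for (; i < a.length; ++i) // scalar tail: fewer than 8 elements remain
        s += sqrt(fabs(a[i]));
    foreach (k; 0 .. lanes) // horizontal reduction of the partial sums
        s += acc[k];
    return s;
}

void main()
{
    import std.stdio : writeln;
    // sqrt(|x|) of these values is 1, 2, ..., 9
    writeln(sumSqrtAbs8([-1.0f, 4, -9, 16, -25, 36, -49, 64, 81])); // prints 45
}
```

A sequential loop has a single dependency chain through one accumulator; splitting it into 8 chains is what lets one vsqrtps/vaddps pair process 8 elements per step.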

The current std.math has the following problems:

1. Math functions are not templates -> Phobos must be linked.
    1.a I have firmly decided to move forward without DRuntime.
Phobos as a source-only library is partially OK, but there should
be no linking dependencies. BetterC mode is what Mir requires to
replace OpenBLAS and Eigen. New cpuid, threads and mutexes should
be provided too. The new cpuid [1] is already implemented (I just
need to replace its module constructor with an explicit
initialization function). My strong opinion is that a D library
usable only from D is the wrong direction. A numeric D library
should be a product for other languages too, as many C libraries
are. One of my clients is considering investing in nothrow @nogc
async I/O for production, so that may help move in the betterC
direction too.
    1.b In the context of 1.a, linking multiple binaries compiled
with different DRuntime/Phobos versions may cause significant
problems. DRuntime is not as stable as the C standard library. One
may say that I am doing something wrong if I need to link
libraries compiled with different DRuntimes. But this is what will
often happen with D in the real world if D starts to replace C
libraries (1.a). So betterC without DRuntime/Phobos linking
dependencies is the direction to move in. nothrow @nogc generic
Phobos code seems to be OK.

2. Math functions are not templates -> they are not inlined -> no
vectorization, plus function calls in the loop body. One day this
may be fixed, but see 1.a and 1.b.

3. Math functions are not aliases for LDC -> LDC's @fastmath does
not work for them. To enable @fastmath for these functions, they
would themselves have to be annotated with @fastmath, which is not
acceptable. If a function is an alias for an LLVM intrinsic, then
the @fastmath flag can be applied to the function that calls it.
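A minimal sketch of the alternative these three points argue for, in the spirit of the internal Mir math module mentioned earlier (its actual contents are not reproduced here, so the layout below is an assumption). Under LDC, fabs and sqrt are plain aliases of the llvm_fabs / llvm_sqrt templates from ldc.intrinsics, so they are instantiated in the calling module (nothing to link, point 1), emitted as intrinsic calls at each call site (nothing to inline across a library boundary, point 2), and, per point 3, covered by the caller's @fastmath. Other compilers fall back to std.math and a do-nothing placeholder attribute. The function name sumSqrtAbs and the placeholder struct are hypothetical.

```d
// Hypothetical alias-based math shim, kept in one file for brevity.
version (LDC)
{
    import ldc.attributes : fastmath;
    import ldc.intrinsics : llvm_fabs, llvm_sqrt;
    // Plain aliases of the intrinsic templates: each call is instantiated in
    // the caller's module and emitted as a direct llvm.fabs.* / llvm.sqrt.*
    // call, with no out-of-line library function behind it.
    alias fabs = llvm_fabs;
    alias sqrt = llvm_sqrt;
}
else
{
    // Fallback for other compilers: ordinary std.math functions and a
    // do-nothing placeholder for the LDC-only attribute.
    import std.math : fabs, sqrt;
    struct fastmath {}
}

// Only the caller is annotated; per point 3, LDC can then apply its
// fast-math flags to the intrinsic calls inside the loop.
@fastmath
float sumSqrtAbs(const(float)[] a) nothrow @nogc
{
    float s = 0;
    foreach (float x; a) // copy each element into a mutable float
        s += sqrt(fabs(x));
    return s;
}

void main()
{
    import std.stdio : writeln;
    writeln(sumSqrtAbs([-1.0f, 4, -9])); // 1 + 2 + 3, prints 6
}
```

Because fast-math may let the backend approximate sqrt, results under LDC are best compared with a tolerance rather than exactly.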

[1] https://github.com/libmir/cpuid

Best regards,
Ilya