std.math API rework
Ilya Yaroshenko via Digitalmars-d
digitalmars-d at puremagic.com
Fri Oct 7 00:42:43 PDT 2016
On Friday, 7 October 2016 at 01:53:27 UTC, Andrei Alexandrescu
wrote:
> On 10/6/16 12:53 PM, Ilya Yaroshenko wrote:
>> Effective work with std.experimental.ndslice and
>> mir.ndslice.array requires half of std.math to be exact
>> aliases for LLVM intrinsics (for LDC).
>
> Why?
>
>> To enable vectorization for mir.ndslice.algorithm I created an
>> internal math module [1] in Mir. But this is weird, because
>> third-party packages like DCV [2] are required to use the
>> module too. Also, some optimisations for std.complex and the
>> future std.experimental.color would be very ugly without the
>> proposed change.
>
> I'd love to understand this point better. In particular, how do
> you reconcile it with kinke's assertion that some of these
> intrinsics simply forward to C routines?
>
> Our high-level view is that doing efficient work should not
> require one to fork the standard library. On the other hand,
> the traditional place for compiler-specific code is in the core
> runtime, not the standard library. (There is a tiny bit of
> stdlib code that depends on dmd to be fair.)
>
> So I'd like to be reasonably confident the right rocks are put
> in the right places. Have you considered (per Iain) migrating
> these symbols to core.math and then forward those in stdlib to
> them?
>
>
> Thanks,
>
> Andrei
For example, the sum over i of sqrt(fabs(a[i])) can be vectorised
using mir.ndslice.algorithm.
A single bitwise instruction that clears the sign bit (vandps) can
be used for fabs.
vsqrtps instruction can be used for sqrt.
LDC's @fastmath allows the compiler to re-associate the terms of
the summation. Depending on which cache level the data sits in,
this can speed up the iteration by up to 8 times for single
precision floating point numbers with AVX (16 times with AVX512?).
Furthermore, at least for x86, the @fastmath flag does not break
any math logic here. It only allows the elements to be
re-associated (I mean exactly this example on x86).
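
A minimal sketch of what re-association means for this sum,
written by hand rather than left to the compiler. The function
names are hypothetical, it uses plain std.math, and the lane count
of 8 is only an assumption matching 8 floats per AVX register.
Whether a given compiler actually emits vector instructions for it
is not guaranteed.

```d
import std.math : fabs, sqrt;

/// Strict left-to-right sum: every addition depends on the previous
/// one, so without permission to re-associate the adds stay serial.
float sumSqrtAbsSerial(const(float)[] a) @safe pure nothrow @nogc
{
    float s = 0;
    foreach (x; a)
        s += sqrt(fabs(x));
    return s;
}

/// The same sum with 8 independent partial sums. This is the
/// re-ordering of additions that a fast-math flag permits.
float sumSqrtAbsLanes(const(float)[] a) @safe pure nothrow @nogc
{
    enum lanes = 8; // assumed: 8 floats per 256-bit AVX register
    float[lanes] acc = 0;
    size_t i = 0;
    // Main loop: each lane accumulates its own partial sum.
    for (; i + lanes <= a.length; i += lanes)
        foreach (k; 0 .. lanes)
            acc[k] += sqrt(fabs(a[i + k]));
    // Combine the partial sums, then handle the leftover tail.
    float s = 0;
    foreach (v; acc)
        s += v;
    for (; i < a.length; ++i)
        s += sqrt(fabs(a[i]));
    return s;
}
```

The two functions add the same terms in a different order, so for
general input they may differ in the last bits. That difference is
all that re-association changes in this example.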
The current std.math has the following problems:
1. Math functions are not templates -> Phobos has to be linked.
1.a I have firmly decided to move forward without DRuntime.
Phobos as a source library is partially OK, but there should be no
link-time dependencies. BetterC mode is what is required for Mir
to replace OpenBLAS and Eigen. New cpuid, threads and mutexes
should be provided too. The new cpuid [1] is already implemented
(I just need to replace the module constructor with an explicit
initialization function). My strong opinion is that a D library
only for D is the wrong direction. A numeric D library should be a
product for other languages too, as many C libraries are. One of
my clients is thinking of investing in nothrow @nogc async I/O for
production, so that may help the move in the betterC direction
too.
1.b In the context of 1.a, linking multiple binaries compiled with
different DRuntime/Phobos versions may cause significant problems.
DRuntime is not as stable as the standard C library. One may say
that I am doing something wrong if I need to link libraries
compiled with different DRuntimes. But this is what will often
happen with D in the real world if D starts to replace C libraries
(1.a). So betterC without DRuntime / Phobos link-time dependencies
is the direction to move in. nothrow @nogc generic Phobos code
seems to be OK.
2. Math functions are not templates -> they are not inlined -> no
vectorization, plus function calls in the loop body. One day this
may be fixed, but see (1.a, 1.b).
3. Math functions are not aliases to LLVM intrinsics under LDC ->
LDC's @fastmath does not work for them. To enable @fastmath for
these functions they would have to be annotated with @fastmath
themselves, which is not acceptable. If a function is an alias for
an LLVM intrinsic, then the @fastmath flag can be applied to the
function that calls it.
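
Point 3 can be sketched roughly as follows. This is an
illustration under stated assumptions, not the layout of the Mir
module or a proposed std.math patch: under LDC it aliases
llvm_sqrt and llvm_fabs from ldc.intrinsics and takes @fastmath
from ldc.attributes; for other compilers the fallback branch
(std.math plus a placeholder `enum fastmath;`) is only there so
the sketch compiles, and the function name sumSqrtAbsFast is
hypothetical.

```d
version (LDC)
{
    import ldc.attributes : fastmath;
    import ldc.intrinsics : llvm_fabs, llvm_sqrt;

    // Plain aliases: a call to sqrt/fabs is a call to the intrinsic,
    // so fast-math flags set on the caller apply to it directly.
    alias sqrt = llvm_sqrt;
    alias fabs = llvm_fabs;
}
else
{
    import std.math : fabs, sqrt;

    // Placeholder so that @fastmath still parses; it has no effect.
    enum fastmath;
}

/// The flag is put on the calling function, not on sqrt or fabs.
@fastmath float sumSqrtAbsFast(const(float)[] a)
{
    float s = 0;
    foreach (x; a)
        s += sqrt(fabs(x));
    return s;
}
```

With this arrangement the math functions themselves carry no
attribute; each caller decides whether re-association is
acceptable for its own loop.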
[1] https://github.com/libmir/cpuid
Best regards,
Ilya