std.math performance (SSE vs. real)

Thu Jun 26 23:48:34 PDT 2014

On 27 June 2014 07:14, Iain Buclaw <ibuclaw at gdcproject.org> wrote:
> On 27 June 2014 02:31, David Nadlinger via Digitalmars-d
> <digitalmars-d at puremagic.com> wrote:
>> Hi all,
>>
>> right now, the use of std.math over core.stdc.math can cause a huge
>> performance problem in typical floating point graphics code. An instance of
>> this has recently been discussed here in the "Perlin noise benchmark speed"
>> thread [1], where even LDC, which already beat DMD by a factor of two,
>> generated code more than twice as slow as that by Clang and GCC. Here, the
>> use of floor() causes trouble. [2]
>>
>> Besides the somewhat slow pure D implementations in std.math, the biggest
>> problem is the fact that std.math almost exclusively uses reals in its API.
>> When working with single- or double-precision floating point numbers, this
>> is not only more data to shuffle around than necessary, but on x86_64
>> requires the caller to transfer the arguments from the SSE registers onto
>> the x87 stack and then convert the result back again. Needless to say, this
>> is a serious performance hazard. In fact, this accounts for a 1.9x slowdown
>> in the above benchmark with LDC.
>>
>> Because of this, I propose to add float and double overloads (at the very
>> least the double ones) for all of the commonly used functions in std.math.
>> This is unlikely to break much code, but:
>>  a) Somebody could rely on the fact that the calls effectively widen the
>> calculation to 80 bits on x86 when using type deduction.
>>  b) Additional overloads make e.g. "&floor" ambiguous without context, of
>> course.
>>
>> What do you think?
>>
>> Cheers,
>> David
>>
>
> This is the reason why floor is slow: it has an array copy operation.
>
> ---
>   auto vu = *cast(ushort[real.sizeof/2]*)(&x);
> ---
>
> I didn't like it at the time I wrote it, but at least it prevented the
> compiler (gdc) from removing all bit operations that followed.
>
> If there is an alternative to the above, then I'd imagine that would
> speed up floor tenfold.
>

Can you test with this?

https://github.com/D-Programming-Language/phobos/pull/2274

Float and double implementations of floor/ceil are trivial, and I can add them later.
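
For illustration only, here is a minimal sketch of what float and double
overloads could look like from the caller's side. The name floorSketch is
hypothetical, and forwarding to the C library's floorf/floor is an assumption
made to keep the example short; it is not what the pull request above does.
The point is only that each argument type gets a result of the same type, so
float and double callers no longer have to go through real.

```d
// Sketch under stated assumptions: floorSketch is a hypothetical name,
// and the float/double versions simply forward to the C library.
import core.stdc.math : cfloor = floor, cfloorf = floorf;
static import std.math;

float  floorSketch(float x)  { return cfloorf(x); }        // single precision
double floorSketch(double x) { return cfloor(x); }         // double precision
real   floorSketch(real x)   { return std.math.floor(x); } // existing real path

void main()
{
    float  f = 2.7f;
    double d = -2.7;
    real   r = 2.7L;

    // The result type follows the argument type instead of widening to real.
    static assert(is(typeof(floorSketch(f)) == float));
    static assert(is(typeof(floorSketch(d)) == double));
    static assert(is(typeof(floorSketch(r)) == real));

    assert(floorSketch(f) == 2.0f);
    assert(floorSketch(d) == -3.0);
    assert(floorSketch(r) == 2.0L);

    // Caveat (b) from the proposal: with several overloads, taking the
    // address needs a target type to say which one is meant.
    double function(double) fp = &floorSketch;
    assert(fp(1.5) == 1.0);
}
```

Caveat (a) from the proposal also shows up here: code that relied on type
deduction, e.g. "auto y = floor(x);" with a double x, would get a double
rather than a real once such overloads exist.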