std.math performance (SSE vs. real)

Fri Jun 27 05:20:54 PDT 2014

On 27 June 2014 11:47, David Nadlinger via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
> On Friday, 27 June 2014 at 09:37:54 UTC, hane wrote:
>>
>> On Friday, 27 June 2014 at 06:48:44 UTC, Iain Buclaw via Digitalmars-d
>> wrote:
>>>
>>> Can you test with this?
>>>
>>> https://github.com/D-Programming-Language/phobos/pull/2274
>>>
>>> Float and Double implementations of floor/ceil are trivial and I can add
>>> later.
>>
>>
>> Nice! I tested with the Perlin noise benchmark, and it got faster (in my
>> environment, 1.030s -> 0.848s).
>> But floor still consumes almost half of the execution time.
>
>
> Wait, so DMD and GDC did actually emit a memcpy/… here? LDC doesn't, and the
> change didn't have much of an impact on performance.
>

Yes, IIRC _d_arraycopy to be exact (so we lose doubly!)

> What _does_ have a significant impact, however, is that the whole of floor()
> for doubles can be optimized down to
>     roundsd <…>,<…>,0x1
> when targeting SSE 4.1, or
>     vroundsd <…>,<…>,<…>,0x1
> when targeting AVX.
>
> This is why std.math will need to build on top of compiler-recognizable
> primitives. Iain, Don, how do you think we should handle this?

My opinion is that we should never have pushed a variable-sized type
as the baseline for all floating-point computations in the first
place.

But as we can't backtrack now, overloads will just have to do.  I
would welcome a DIP to add new core.math intrinsics that can be
shown to be useful, for the sake of maintainability (and portability).
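
As a rough illustration of what per-type overloads could look like,
here is a minimal sketch.  The name floorOf is made up for this
example and is not what the pull request or std.math uses; it simply
forwards to the C library's floorf/floor/floorl from core.stdc.math.
Whether a given compiler lowers the float/double paths to a single
rounding instruction depends on the compiler and the target.

```d
// Hypothetical sketch: floorOf is an illustrative name, not Phobos API.
// Each overload stays in its own precision instead of widening to real.
import core.stdc.math : floorf, floorl;
static import core.stdc.math;

float floorOf(float x)
{
    return floorf(x);                 // single precision
}

double floorOf(double x)
{
    return core.stdc.math.floor(x);   // double precision
}

real floorOf(real x)
{
    return floorl(x);                 // platform-dependent "real"
}

void main()
{
    // Overload resolution picks the exact-match type; no widening to real.
    static assert(is(typeof(floorOf(1.0f)) == float));
    static assert(is(typeof(floorOf(1.0)) == double));
    static assert(is(typeof(floorOf(1.0L)) == real));

    assert(floorOf(-1.5f) == -2.0f);
    assert(floorOf(2.7) == 2.0);
    assert(floorOf(-0.25L) == -1.0L);
}
```

The point of the design is only that the argument type selects the
implementation, so float and double callers never pay for a real
computation they did not ask for.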

Regards
Iain