std.math performance (SSE vs. real)

Thu Jun 26 19:09:59 PDT 2014

On Friday, 27 June 2014 at 01:31:17 UTC, David Nadlinger wrote:
> Hi all,
>
> right now, the use of std.math over core.stdc.math can cause a 
> huge performance problem in typical floating point graphics 
> code. An instance of this has recently been discussed here in 
> the "Perlin noise benchmark speed" thread [1], where even LDC, 
> which already beat DMD by a factor of two, generated code more 
> than twice as slow as that by Clang and GCC. Here, the use of 
> floor() causes trouble. [2]
>
> Besides the somewhat slow pure D implementations in std.math, 
> the biggest problem is the fact that std.math almost 
> exclusively uses reals in its API. When working with single- or 
> double-precision floating point numbers, this is not only more 
> data to shuffle around than necessary, but on x86_64 requires 
> the caller to transfer the arguments from the SSE registers 
> onto the x87 stack and then convert the result back again. 
> Needless to say, this is a serious performance hazard. In fact, 
> this accounts for a 1.9x slowdown in the above benchmark with 
> LDC.
>
> Because of this, I propose to add float and double overloads 
> (at the very least the double ones) for all of the commonly 
> used functions in std.math. This is unlikely to break much 
> code, but:
>  a) Somebody could rely on the fact that the calls effectively 
> widen the calculation to 80 bits on x86 when using type 
> deduction.
>  b) Additional overloads make e.g. "&floor" ambiguous without 
> context, of course.
>
> What do you think?
>
> Cheers,
> David
>
>
> [1] http://forum.dlang.org/thread/lo19l7$n2a$1@digitalmars.com
> [2] Fun fact: As the program happens to deal only with positive 
> numbers, the author could have just inserted an int-to-float 
> cast, sidestepping the issue altogether. All the other language 
> implementations have the floor() call too, though, so it 
> doesn't matter for this discussion.

I honestly always thought it was a little odd that std.math forced 
conversion to real. Personally, I support this. It would also make 
generic code that calls math functions simpler, as it wouldn't 
require casting the results back to the original type.
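
To illustrate the generic-code point, here is a minimal sketch. The helper name flooredLike is made up for this example and is not part of Phobos. With a real-only floor(), the call returns real, so the template has to cast the result back to T; with float and double overloads the cast would presumably become a no-op and could be dropped.

```d
import std.math : floor;

// Hypothetical helper (not in Phobos): floors a value and returns it
// in the same floating point type it was given.
T flooredLike(T)(T x)
{
    // If floor() only accepts and returns real, x is widened to real here
    // and the result must be narrowed back to T explicitly.
    // With per-type overloads, floor(x) would already be of type T.
    return cast(T) floor(x);
}

void main()
{
    float f = flooredLike(2.75f);   // stays a float for the caller
    double d = flooredLike(-2.75);  // stays a double for the caller
    assert(f == 2.0f);
    assert(d == -3.0);
}
```

The cast is harmless either way, but without overloads every generic call site that wants to stay in float or double ends up needing one.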