std.math performance (SSE vs. real)

David Nadlinger via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 26 18:31:14 PDT 2014


Hi all,

Right now, the use of std.math over core.stdc.math can cause a 
huge performance problem in typical floating point graphics code. 
An instance of this was recently discussed here in the 
"Perlin noise benchmark speed" thread [1], where even LDC, which 
already beat DMD by a factor of two, generated code more than 
twice as slow as that produced by Clang and GCC. Here, the use of 
floor() causes the trouble. [2]

Besides the somewhat slow pure D implementations in std.math, the 
biggest problem is the fact that std.math almost exclusively uses 
reals in its API. When working with single- or double-precision 
floating point numbers, this not only means shuffling around more 
data than necessary, but on x86_64 it also requires the caller to 
move the arguments from the SSE registers onto the x87 stack and 
then convert the result back again. Needless to say, this is a 
serious performance hazard. In fact, it accounts for a 1.9x 
slowdown in the above benchmark with LDC.
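
To illustrate the call boundary (a minimal sketch; the function 
names are mine, the signatures are the current std.math and 
core.stdc.math ones, and the comments describe the x86_64 calling 
convention as I understand it):

    import std.math : floorReal = floor;    // real floor(real x)
    import core.stdc.math : floorC = floor; // double floor(double x)

    double viaStdMath(double x)
    {
        // The double argument is implicitly widened to real: the
        // value has to leave the SSE register for the x87 stack,
        // and the 80-bit result is converted back to double
        // afterwards.
        return floorReal(x);
    }

    double viaCMath(double x)
    {
        // Stays in double precision; no conversion at the call
        // boundary.
        return floorC(x);
    }

    void main()
    {
        import std.stdio : writeln;
        writeln(viaStdMath(2.7), " ", viaCMath(2.7)); // 2 2
    }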

Because of this, I propose to add float and double overloads (at 
the very least the double ones) for all of the commonly used 
functions in std.math. This is unlikely to break much code, but:
  a) Somebody could rely on the fact that the calls effectively 
widen the calculation to 80 bits on x86 when using type deduction.
  b) Additional overloads make e.g. "&floor" ambiguous without 
context, of course (see the sketch below).
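
To make the proposal concrete, here is a minimal, self-contained 
sketch of the overload set and of how b) shows up on the caller 
side. The bodies just forward to the C library and are stand-ins, 
not the actual std.math implementations:

    import core.stdc.math : cFloorf = floorf, cFloor = floor,
        cFloorl = floorl;

    real   floor(real x)   { return cFloorl(x); } // existing signature
    double floor(double x) { return cFloor(x);  } // proposed
    float  floor(float x)  { return cFloorf(x); } // proposed

    void main()
    {
        double d = 2.7;
        auto r = floor(d);                      // picks the double overload
        static assert(is(typeof(r) == double)); // no detour through real

        // auto fp = &floor;               // b): ambiguous without context
        auto fp = (double x) => floor(x);  // one way to pin down an overload
    }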

What do you think?

Cheers,
David


[1] http://forum.dlang.org/thread/lo19l7$n2a$1@digitalmars.com
[2] Fun fact: As the program happens to deal only with positive 
numbers, the author could have just inserted an int-to-float 
cast, sidestepping the issue altogether. All the other language 
implementations have the floor() call too, though, so it doesn't 
matter for this discussion.
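
For the record, that shortcut would look something like this 
(valid only for non-negative values that fit into an int):

    import std.math : floor;

    double floorViaInt(double x)
    {
        // Truncation toward zero equals floor() for 0 <= x < int.max.
        return cast(double) cast(int) x;
    }

    unittest
    {
        assert(floorViaInt(2.7) == floor(2.7));
    }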

