std.math performance (SSE vs. real)

Manu via Digitalmars-d digitalmars-d at puremagic.com
Fri Jun 27 03:50:52 PDT 2014


On 27 June 2014 11:31, David Nadlinger via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
> Hi all,
>
> right now, the use of std.math over core.stdc.math can cause a huge
> performance problem in typical floating point graphics code. An instance of
> this has recently been discussed here in the "Perlin noise benchmark speed"
> thread [1], where even LDC, which already beat DMD by a factor of two,
> generated code more than twice as slow as that by Clang and GCC. Here, the
> use of floor() causes trouble. [2]
>
> Besides the somewhat slow pure D implementations in std.math, the biggest
> problem is the fact that std.math almost exclusively uses reals in its API.
> When working with single- or double-precision floating point numbers, this
> is not only more data to shuffle around than necessary, but on x86_64
> requires the caller to transfer the arguments from the SSE registers onto
> the x87 stack and then convert the result back again. Needless to say, this
> is a serious performance hazard. In fact, this accounts for a 1.9x slowdown
> in the above benchmark with LDC.
>
> Because of this, I propose to add float and double overloads (at the very
> least the double ones) for all of the commonly used functions in std.math.
> This is unlikely to break much code, but:
>  a) Somebody could rely on the fact that the calls effectively widen the
> calculation to 80 bits on x86 when using type deduction.
>  b) Additional overloads make e.g. "&floor" ambiguous without context, of
> course.
>
> What do you think?
>
> Cheers,
> David
>
>
> [1] http://forum.dlang.org/thread/lo19l7$n2a$1@digitalmars.com
> [2] Fun fact: As the program happens to deal only with positive numbers, the
> author could have just inserted an int-to-float cast, sidestepping the issue
> altogether. All the other language implementations have the floor() call
> too, though, so it doesn't matter for this discussion.

Totally agree.
Maintaining a commitment to deprecated hardware that could be removed
from the silicon at any time is a problem going forward.
Regardless of whether the overloads are added, at the very least I'd
suggest that x64 should define real as double, since the x87 is
deprecated and the x64 ABI uses the SSE unit. It makes no sense at
all to use real under any general circumstances in x64 builds.

And aside from that, if you *think* you need real for precision, the
truth is, you probably have bigger problems.
Double already has massive precision. I find it's extremely rare to
have precision problems even with float under most normal usage
circumstances, assuming you are conscious of the relative magnitudes
of your terms.

