std.math performance (SSE vs. real)

Fri Jun 27 04:29:36 PDT 2014

On Friday, 27 June 2014 at 11:10:57 UTC, John Colvin wrote:
> On Friday, 27 June 2014 at 10:51:05 UTC, Manu via Digitalmars-d wrote:
>> On 27 June 2014 11:31, David Nadlinger via Digitalmars-d
>> <digitalmars-d at puremagic.com> wrote:
>>> Hi all,
>>>
>>> right now, the use of std.math over core.stdc.math can cause a huge
>>> performance problem in typical floating point graphics code. An
>>> instance of this has recently been discussed here in the "Perlin noise
>>> benchmark speed" thread [1], where even LDC, which already beat DMD by
>>> a factor of two, generated code more than twice as slow as that by
>>> Clang and GCC. Here, the use of floor() causes trouble. [2]
>>>
>>> Besides the somewhat slow pure D implementations in std.math, the
>>> biggest problem is the fact that std.math almost exclusively uses
>>> reals in its API. When working with single- or double-precision
>>> floating point numbers, this is not only more data to shuffle around
>>> than necessary, but on x86_64 it also requires the caller to transfer
>>> the arguments from the SSE registers onto the x87 stack and then
>>> convert the result back again. Needless to say, this is a serious
>>> performance hazard. In fact, this accounts for a 1.9x slowdown in the
>>> above benchmark with LDC.
>>>
>>> Because of this, I propose to add float and double overloads (at the
>>> very least the double ones) for all of the commonly used functions in
>>> std.math. This is unlikely to break much code, but:
>>> a) Somebody could rely on the fact that the calls effectively widen
>>> the calculation to 80 bits on x86 when using type deduction.
>>> b) Additional overloads make e.g. "&floor" ambiguous without context,
>>> of course.
>>>
>>> What do you think?
>>>
>>> Cheers,
>>> David
>>>
>>>
>>> [1] http://forum.dlang.org/thread/lo19l7$n2a$1@digitalmars.com
>>> [2] Fun fact: As the program happens to only deal with positive
>>> numbers, the author could have just inserted a float-to-int cast,
>>> sidestepping the issue altogether. All the other language
>>> implementations have the floor() call too, though, so it doesn't
>>> matter for this discussion.
>>
>> Totally agree.
>> Maintaining commitment to deprecated hardware which could be removed
>> from the silicon at any time is a bit of a problem looking forward.
>> Regardless of the decision about whether overloads are created, at the
>> very least, I'd suggest x64 should define real as double, since the
>> x87 is deprecated, and the x64 ABI uses the SSE unit. It makes no sense
>> at all to use real under any general circumstances in x64 builds.
>>
>> And aside from that, if you *think* you need real for precision, the
>> truth is, you probably have bigger problems.
>> Double already has massive precision. I find it's extremely rare to
>> have precision problems even with float under most normal usage
>> circumstances, assuming you are conscious of the relative magnitudes
>> of your terms.
>
> I think real should stay as it is, as the largest hardware-supported
> floating point type on a system. What needs to change is DMD and
> Phobos' default usage of real. Double should be the standard. People
> should be able to reach for real if they really need it, but normal D
> code should target the sweet spot that is double*.
>
> I understand why the current situation exists. In 2000, x87 was the
> standard and 80-bit precision came for free.
>
> *The number of algorithms that are both numerically stable/correct
> and benefit significantly from precision beyond 64-bit doubles is very
> small. The same can't be said for 32-bit floats.

Totally agree!
Please add float and double overloads and make double the default.
Sometimes float is enough, but most of the time double should be used.
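A minimal D sketch of what such an overload set could look like, assuming each overload simply forwards to the matching C routine in core.stdc.math (floorf, floor, floorl). The name `fastFloor` is hypothetical and not part of Phobos; the point is only that overload resolution keeps a float argument a float and a double a double, and that a typed function pointer resolves the "&floor" ambiguity mentioned in point b) of the original post.

```d
// Sketch only: `fastFloor` is a hypothetical name, not a Phobos function.
// Each overload forwards to the C routine of the same precision, so the
// argument is never widened to real by the call itself.
import core.stdc.math : floorf, floorl;
import cmath = core.stdc.math; // renamed import, used for cmath.floor (double)

float  fastFloor(float x)  { return floorf(x); }
double fastFloor(double x) { return cmath.floor(x); }
real   fastFloor(real x)   { return floorl(x); }

void main()
{
    // Overload resolution picks the variant matching the argument type,
    // so the result type matches too (no implicit widening to real).
    static assert(is(typeof(fastFloor(1.5f)) == float));
    static assert(is(typeof(fastFloor(1.5))  == double));
    static assert(is(typeof(fastFloor(1.5L)) == real));

    assert(fastFloor(2.7f) == 2.0f);
    assert(fastFloor(-2.5) == -3.0);

    // With several overloads, taking the address needs context:
    // a typed function pointer selects the double version.
    double function(double) fp = &fastFloor;
    assert(fp(3.9) == 3.0);
}
```

Whether the float and double paths actually stay in SSE registers depends on the compiler and on how the C library implements these functions, so this is worth measuring rather than assuming.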

If someone needs more precision than double can provide, then 80 bits
will probably not be enough anyway.
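For a sense of scale, D exposes each floating point type's significand width through the built-in `.mant_dig` property. A small sketch: the value of `real.mant_dig` is target-dependent (64 for the x87 80-bit format, while on some targets real is the same as double), so the code only asserts that it is at least as large as double's.

```d
import std.stdio : writefln;

void main()
{
    // IEEE-754 single and double precision: 24 and 53 significand bits.
    static assert(float.mant_dig == 24);
    static assert(double.mant_dig == 53);

    // real is the largest hardware-supported type. With the x87 80-bit
    // format it has 64 significand bits, only 11 more than double; on
    // some targets it is simply double.
    static assert(real.mant_dig >= double.mant_dig);

    writefln("float:  %2d bits, epsilon = %s", float.mant_dig, float.epsilon);
    writefln("double: %2d bits, epsilon = %s", double.mant_dig, double.epsilon);
    writefln("real:   %2d bits, epsilon = %s", real.mant_dig, real.epsilon);
}
```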

IMHO, intrinsics should be used by default where possible.
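Footnote [2] of the original post notes that, for the positive inputs in that benchmark, a plain cast could replace floor(). A small D sketch of why that works and where it stops working; `floorNonNegative` is a hypothetical helper name used only for illustration.

```d
import std.math : floor;

/// Hypothetical helper: for x >= 0, a float-to-int cast truncates toward
/// zero, which matches floor(x) as long as the value fits in an int.
int floorNonNegative(double x)
{
    assert(x >= 0, "only valid for non-negative input");
    return cast(int) x;
}

void main()
{
    foreach (x; [0.0, 0.5, 1.0, 2.7, 123.999])
        assert(floorNonNegative(x) == cast(int) floor(x));

    // For negative input the two differ: truncation rounds toward zero,
    // while floor rounds toward negative infinity.
    assert(cast(int)(-2.5) == -2);
    assert(floor(-2.5) == -3.0);
}
```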