std.math performance (SSE vs. real)

Sun Jun 29 20:22:15 PDT 2014

On 28 June 2014 16:16, Walter Bright via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
> On 6/27/2014 10:18 PM, Walter Bright wrote:
>>
>> On 6/27/2014 4:10 AM, John Colvin wrote:
>>>
>>> *The number of algorithms that are both numerically stable/correct and
>>> benefit significantly from > 64bit doubles is very small.
>>
>>
>> To be blunt, baloney. I ran into these problems ALL THE TIME when doing
>> professional numerical work.
>>
>
> Sorry for being so abrupt. FP is important to me - it's not just about
> performance, it's also about accuracy.

Well, here's the thing then. Consider that 'real' is actually
supported on only a single (long deprecated!) architecture.

I think it's reasonable to see that 'real' is not actually an fp type.
It's more like an auxiliary type, which just happens to be supported
via a completely different (legacy) set of registers on x64 (most
archs don't support it at all).
In x64's case, it has been deprecated for over a decade now, and may be
removed from the hardware at some unknown time. The moment that x64
processors decide to stop supporting 32bit code, the x87 will go away,
and those opcodes will likely be emulated or microcoded.
Interacting real<->float/double means register swapping through
memory. It should be treated the same as float<->simd; they are
distinct (on most archs).

For my money, x87 can only be considered, at best, a coprocessor (a
slow one!), which may or may not exist. Software written today (10+
years after the hardware was deprecated) should probably even consider
introducing runtime checks to see if the hardware is even present
before making use of it.
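
To make that kind of check concrete, here is a rough sketch, written in
C rather than D so it compiles with a stock GCC or Clang. The function
names are made up for illustration, and it assumes the compiler ships
<cpuid.h>; anywhere else it simply reports "no x87". It asks CPUID
whether an on-chip x87 FPU is advertised, and separately whether
long double is any wider than double on this target.

```c
#include <float.h>

/* CPUID is only meaningful on x86; everywhere else we report "absent". */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Mantissa bits of long double on this target (radix 2 assumed):
   64 for the x87 extended format, equal to DBL_MANT_DIG where
   long double is just double, other values on other ABIs. */
static int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Hypothetical helper: 1 if CPUID leaf 1 reports an on-chip x87 FPU
   (EDX bit 0), 0 otherwise or when CPUID is unavailable. */
static int x87_fpu_present(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)(edx & 1u);
#else
    return 0;
#endif
}

/* Hypothetical policy: take the extended-precision path only when the
   hardware is there and long double really is wider than double. */
static int use_extended_path(void)
{
    return x87_fpu_present() && long_double_mantissa_bits() > DBL_MANT_DIG;
}
```

A library could call use_extended_path() once at startup and fall back
to plain double routines when it returns 0. Whether that is worth the
trouble is a separate question; this only shows the check is cheap.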

It's fine to offer a great, precise extended-precision library, but I
don't think it can be _the_ standard math library which is used by
everyone in virtually all applications. It's not a defined part of the
architecture, it's slow, and it will probably go away in the future.

It's the same situation with SIMD; on x64, the SIMD unit and the FPU
are the same unit, but I don't think it's reasonable to design all the
APIs around that assumption. Most processors separate the SIMD unit
from the FPU, and the language decisions reflect that. We can't make
the language treat SIMD just like an FPU extension on account of just
one single architecture... although in that case, the argument would
be even more compelling since x64 is actually current and active.