std.math performance (SSE vs. real)

Don via Digitalmars-d digitalmars-d at puremagic.com
Mon Jun 30 00:20:59 PDT 2014


On Monday, 30 June 2014 at 04:15:46 UTC, Walter Bright wrote:
> On 6/29/2014 8:22 PM, Manu via Digitalmars-d wrote:
>> Well, here's the thing then. Consider that 'real' is actually
>> supported on only a single (long deprecated!) architecture.
>>
>> In x64's case, it has been deprecated for over a decade now,
>> and may be removed from the hardware at some unknown time. The
>> moment that x64 processors decide to stop supporting 32-bit
>> code, the x87 will go away, and those opcodes will likely be
>> emulated or microcoded. Interacting real<->float/double means
>> register swapping through memory. It should be treated the
>> same as float<->simd; they are distinct (on most archs).
>
> Since they are part of the 64 bit C ABI, that would seem to be 
> in the category of "nevah hoppen".

What I think is highly likely is that the x87 will get only 
legacy support, with such awful performance that it never makes 
sense to use it. For example, the speed of 80-bit and 64-bit 
calculations in x87 used to be identical. But on recent Intel 
CPUs, the 80-bit operations run at half the speed of the 64-bit 
operations. They are already partially microcoded.
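
If you want to check that on your own machine, a minimal sketch 
along these lines would do. The loop bodies, iteration count, and 
the use of std.datetime.stopwatch are my own choices for 
illustration, not anything measured in this thread:

import std.stdio;
import std.datetime.stopwatch : StopWatch, AutoStart;

// Same dependent loop at both precisions; a compiler that
// targets x87 for real and SSE for double will show the gap.
double sumDouble(size_t n)
{
    double s = 0.0;
    foreach (i; 1 .. n + 1)
        s += 1.0 / i;
    return s;
}

real sumReal(size_t n)
{
    real s = 0.0L;
    foreach (i; 1 .. n + 1)
        s += 1.0L / i;
    return s;
}

void main()
{
    enum n = 20_000_000;
    auto sw = StopWatch(AutoStart.yes);
    auto d = sumDouble(n);
    auto tDouble = sw.peek;
    sw.reset();
    auto r = sumReal(n);
    auto tReal = sw.peek;
    writefln("double: %s (%s)", tDouble, d);
    writefln("real:   %s (%s)", tReal, r);
}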

For me, a stronger argument is that in many cases you can get 
*higher* precision using doubles. The reason is that FMA keeps 
the intermediate product at full width (roughly twice the 
precision of a double) and rounds only once; it's available in 
SIMD but not on x87.
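
To make that concrete, here is a small sketch of the standard 
error-recovery idiom that this enables, calling the C runtime's 
fma through core.stdc.math (the variable names and values are 
mine):

import std.stdio;
import core.stdc.math : fma;  // C99 fma: x*y + z, single rounding

void main()
{
    double a = 1.0 + 0x1p-27;  // a*a needs more than 53 bits
    double w = a * a;          // nearest double to the product
    double e = fma(a, a, -w);  // exact rounding error of a*a
    writefln("w = %.17g", w);
    writefln("e = %.17g", e);  // 2^-54, recovered in pure doubles
}

The same idiom is the building block of double-double arithmetic, 
which is one way plain SSE doubles can end up more accurate than 
a single 80-bit x87 result.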

So, if we want to use the highest precision supported by the 
hardware, that does *not* mean we should always use 80 bits.

I've experienced this in CTFE, where the calculations are 
currently done in 80 bits: I've seen cases where the 64-bit 
runtime results were more accurate, because of those full-width 
FMA temporaries. 80 bits are not enough!!
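
The CTFE/runtime mismatch itself is easy to observe. Here is a 
minimal sketch (the function and values are mine, and the exact 
output depends on the compiler and target) where the 80-bit 
intermediates that CTFE keeps show through:

import std.stdio;

double sum3(double a, double b, double c)
{
    return (a + b) + c;  // nominally two roundings at 64 bits
}

// CTFE currently evaluates this with 80-bit intermediates, so
// 1.0e16 + 1.0 is held exactly and the total comes out as 1.0.
enum ctfeResult = sum3(1.0e16, 1.0, -1.0e16);

void main()
{
    // At runtime with SSE doubles, 1.0e16 + 1.0 rounds back to
    // 1.0e16, so the total is 0.0.
    immutable runtimeResult = sum3(1.0e16, 1.0, -1.0e16);
    writefln("CTFE:    %.17g", ctfeResult);
    writefln("runtime: %.17g", runtimeResult);
}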


