std.math performance (SSE vs. real)

ed via Digitalmars-d digitalmars-d at puremagic.com
Mon Jun 30 00:51:50 PDT 2014


On Monday, 30 June 2014 at 07:21:00 UTC, Don wrote:
> On Monday, 30 June 2014 at 04:15:46 UTC, Walter Bright wrote:
>> On 6/29/2014 8:22 PM, Manu via Digitalmars-d wrote:
>>> Well, here's the thing then. Consider that 'real' is 
>>> actually supported on only a single (long-deprecated!) 
>>> architecture.
>
>>> In x64's case, it has been deprecated for over a decade 
>>> now, and may be removed from the hardware at some unknown 
>>> time. The moment that x64 processors stop supporting 
>>> 32-bit code, the x87 will go away, and those opcodes will 
>>> likely be emulated or microcoded.
>>> Interacting real<->float/double means register swapping 
>>> through
>>> memory. It should be treated the same as float<->simd; they 
>>> are
>>> distinct (on most arch's).
>>
>> Since they are part of the 64 bit C ABI, that would seem to be 
>> in the category of "nevah hoppen".
>
> What I think is highly likely is that it will only have legacy 
> support, with such awful performance that it never makes sense 
> to use them. For example, the speed of 80-bit and 64-bit 
> calculations in x87 used to be identical. But on recent Intel 
> CPUs, the 80-bit operations run at half the speed of the 64 bit 
> operations. They are already partially microcoded.
>
> For me, a stronger argument is that you can get *higher* 
> precision using doubles, in many cases. The reason is that FMA 
> gives you an intermediate value with 128 bits of precision; 
> it's available in SIMD but not on x87.
>
> So, if we want to use the highest precision supported by the 
> hardware, that does *not* mean we should always use 80 bits.
>
> I've experienced this in CTFE, where the calculations are 
> currently done in 80 bits, I've seen cases where the 64-bit 
> runtime results were more accurate, because of those 128 bit 
> FMA temporaries. 80 bits are not enough!!

This is correct, and we now use this for some time-critical 
code that requires high precision.

But for anything non-time-critical (~80%-85% of our code) we 
simply use a software solution when precision becomes an 
issue. It is here that I think the extra bits in D's real can 
be enough to give a performance gain.

But I won't argue if you think I'm wrong. I'm only basing this 
on anecdotal evidence from 5-6 apps ported from C++ to D :-)

Cheers,
ed
