std.math performance (SSE vs. real)

Element 126 via Digitalmars-d digitalmars-d at puremagic.com
Fri Jun 27 14:10:08 PDT 2014


On 06/27/2014 08:19 PM, Kagamin wrote:
> On Friday, 27 June 2014 at 14:50:14 UTC, Kai Nacke wrote:
>> The doubledouble type is available for PowerPC. In fact, I try to use
>> this for my PowerPC64 port of LDC. The partial support here is a bit
>> annoying but I did not find the time to implement the missing
>> functions myself.
>>
>> It is "native" in the sense that it is a supported type by gcc and xlc.
>
> Doesn't SSE2 effectively operate on double doubles too with instructions
> like addpd (and others *pd)?

I'm anything but an assembly guru (so please correct me if I'm wrong), 
but if my understanding is right, SSE2 only operates element-wise (at 
least for the operations you are mentioning).
For instance, if you operate on two "double2" vectors (in pseudo-code):
   c[] = a[] # b[]
where # is a supported binary operation, then the value of the first 
element of c depends only on the first elements of a and b.
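
To make the element-wise point concrete, here is a minimal sketch in D 
using core.simd (assuming an x86-64 target where double2 is available; 
there, the addition below maps to a single addpd):

   import core.simd;
   import std.stdio;

   void main()
   {
       double2 a = [1.0, 2.0];
       double2 b = [10.0, 20.0];
       // One addpd: two independent, element-wise double additions.
       // Lane 0 never interacts with lane 1, so each result is plain
       // double precision -- no extra mantissa bits appear.
       double2 c = a + b;
       writeln(c.array);   // prints [11, 22]
   }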

The idea of double-double is that you operate on two doubles in such a 
way that if you "concatenate" the mantissas of both, you effectively 
obtain the semantics of a quadruple-precision floating-point number, 
with roughly twice as many significant digits (~31 vs. ~16 for double, 
in base 10).

I am not 100% sure yet, but I think that the idea is to simulate a 
floating-point number with a 106-bit mantissa and an 11-bit exponent as
   x = s * ( m1 + m2 * 2^(-53) ) * 2^(e-b)
     = s * m1 * 2^(e-b) + s * m2 * 2^(e-b-53)
where s is the sign bit (the same for both doubles), m1 and m2 the 
mantissas (including the implied 1 for normalized numbers), e the base-2 
exponent, b the common bias and 53 an extra bias for the low-order bits 
(I'm ignoring the denormalized numbers and the special values). The 
mantissa m1 of the first double gives the first 53 significant bits, and 
that of the second (m2) the next 53.
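
A tiny numeric example of the idea (a sketch only: DD is a hypothetical 
pair-of-doubles struct, not an existing D type). The value 1 + 2^(-60) 
needs bit 60 of the mantissa, so a single double silently drops it, 
while the double-double pair keeps it in the low part:

   import std.stdio;

   struct DD { double hi; double lo; }  // hypothetical double-double pair

   void main()
   {
       double tiny = 0x1p-60;   // 2^-60 is representable on its own,
       double x = 1.0 + tiny;   // but 1 + 2^-60 rounds back to 1.0:
       writeln(x == 1.0);       // true, bit 60 exceeds the 53-bit mantissa
       auto dd = DD(1.0, tiny); // the pair stores both parts exactly
       writeln(dd.lo == tiny);  // true, nothing was lost
   }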

The addition is quite straightforward, but it gets tricky when 
implementing the other operations. The articles I mentioned in my 
previous post describe these operations for "quad-doubles", achieving 
~62 digits of precision (implemented in the QD library, but there is 
also a CUDA implementation). Quad-double is complete overkill for most 
applications, but it can be useful for studying the convergence of 
numerical algorithms, and double-doubles can provide the extra precision 
needed in some simulations (or to compare the results with double 
precision).
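
For the straightforward part, a hedged sketch of double-double addition 
in D, reusing the hypothetical DD struct from above and Knuth's 
error-free TwoSum transformation (the same building block the QD papers 
use); it assumes the compiler does not reassociate floating-point 
expressions:

   // Error-free transformation: returns (s, err) with s + err == a + b
   // exactly, where s is the rounded sum and err the rounding error.
   DD twoSum(double a, double b)
   {
       double s = a + b;
       double v = s - a;
       double err = (a - (s - v)) + (b - v);
       return DD(s, err);
   }

   // "Sloppy" double-double addition in the style of the QD papers:
   // accurate to roughly 106 bits.
   DD add(DD a, DD b)
   {
       DD t = twoSum(a.hi, b.hi);
       double e = t.lo + a.lo + b.lo;
       double hi = t.hi + e;          // quick renormalization so that
       double lo = e - (hi - t.hi);   // |lo| stays below ~0.5 ulp of hi
       return DD(hi, lo);
   }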

It is also a comparatively fast alternative to arbitrary-precision 
floating-point libraries like GMP/MPFR, since it does not need to 
emulate every single digit, but instead takes advantage of the native 
double-precision instructions. The downside is that you cannot get more 
significant bits than n*53, which makes it unsuitable for computing the 
decimals of pi, for instance.

To give you more details, I will need to study these papers more 
thoroughly. I am actually considering bringing double-double and 
quad-double software support to D, either by making a binding to QD, 
porting it, or starting from scratch based on the papers. I don't know 
if it will succeed, but it will be an interesting exercise anyway. I 
don't have a lot of time right now, but I will try to start working on 
it in a few weeks. I'd really like to be able to use it with D. Having 
to rewrite an algorithm in C++, when switching precision there only 
means changing one template argument in main(), can be quite painful :-)
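
To picture that convenience in D terms, a last hedged sketch (DD is 
still the hypothetical type from above; the point is only that the 
numeric type is a single template parameter):

   // A precision-generic algorithm: the element type is a parameter.
   T sumOfSquares(T)(T[] xs)
   {
       T acc = 0;
       foreach (x; xs)
           acc = acc + x * x;
       return acc;
   }

   void main()
   {
       double[] data = [1.0, 2.0, 3.0];
       auto s = sumOfSquares(data);       // plain double today
       // Once a DD type with operator overloads exists, moving the whole
       // computation to ~31 digits would be a one-line change:
       // auto s2 = sumOfSquares(ddData); // with DD[] ddData
   }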

