std.math performance (SSE vs. real)

Sat Jun 28 05:43:02 PDT 2014

On 06/28/2014 12:33 PM, Russel Winder via Digitalmars-d wrote:
> On Sat, 2014-06-28 at 09:07 +0000, John Colvin via Digitalmars-d wrote:
> […]
>> I still maintain that the need for the precision of 80bit reals
>> is a niche demand. Its a very important niche, but it doesn't
>> justify having its relatively extreme requirements be the
>> default. Someone writing a matrix inversion has only themselves
>> to blame if they don't know plenty of numerical analysis and look
>> very carefully at the specifications of all operations they are
>> using.
>
> I fear the whole argument is getting misguided. We should reset.
>
> If you are doing numerical calculations then accuracy is critical.
> Arbitrary precision floats are the only real (!) way of doing any
> numeric non-integer calculation, and arbitrary precision integers are
> the only way of doing integer calculations.
>
> However speed is also an issue, so to obtain speed we have hardware
> integer and floating point ALUs.
>
> The cost for the integer ALU is bounded integers. Python appreciates
> this and uses hardware integers when it can and software integers
> otherwise. Thus Python is very good for doing integer work. C, C++, Go,
> D, Fortran, etc. are fundamentally crap for integer calculation because
> integers are bounded. Of course if calculations are prvably within the
> hardware integer bounds this is not a constraint and we are happy with
> hardware integers. Just don't try calculating factorial, Fibonacci
> numbers and other numbers used in some bioinformatics and quant models.
> There is a reason why SciPy has a massive following in bioinformatics
> and quant comuting.
>
> The cost for floating point ALU is accuracy. Hardware floating point
> numbers are dreadful in that sense, but again the issue is speed and for
> GPU they went 32-bit for speed. Now they are going 64-bit as they can
> just about get the same speed and the accuracy is so much greater. For
> hardware floating point the more bits you have the better. Hence IBM in
> the 360 and later having 128-bit floating point for accuracy at the
> expense of some speed. Sun had 128-bit in the SPARC processors for
> accuracy at the expense of a little speed.
>
> As Walter has or will tell us, C (and thus C++) got things woefully
> wrong in support of numerical work because the inventors were focused on
> writing operating systems, supporting only PDP hardware. They and the
> folks that then wrote various algorithms didn't really get numerical
> analysis. If C had targeted IBM 360 from the outset things might have
> been better.
>
> We have to be clear on this: Fortran is the only language that supports
> hardware floating types even at all well.
>
> Intel's 80-bit floating point were an aberration, they should just have
> done 128-bit in the first place. OK so they got the 80-bit stuff as a
> sort of free side-effect of creating 64-bit, but they ran with.  They
> shouldn't have done. I cannot see it ever happening again. cf. ARM.
>
> By being focused on Intel chips, D has failed to get floating point
> correct in avery analogous way to C failing to get floating point types
> right by focusing on PDP. Yes using 80-bit on Intel is good, but no-one
> else has this. Floating point sizes should be 32-, 64-, 128-, 256-bit,
> etc. D needs to be able to handle this. So does C, C++, Java, etc. Go
> will be able to handle it when it is ported to appropriate hardware as
> they use float32, float64, etc. as their types. None of this float,
> double, long double, double double rubbish.
>
> So D should perhaps make a breaking change and have types int32, int64,
> float32, float64, float80, and get away from the vagaries of bizarre
> type relationships with hardware?
>

+1 for float32 & cie. These names are much more explicit than the 
current ones. But I see two problems with it :

  - These names are already used in core.simd to denote vectors, and AVX 
3 (which should appear in mainstream CPUs next year) will require to use 
float16, so the next revision might cause a collision. This could be 
avoided by using real32, real64... instead, but I prefer floatxx since 
it reminds us that we are not dealing with an exact real number.

  - These types are redundant, and people coming from C/C++ will likely 
use float and double instead. It's much too late to think of deprecating 
them since it would break backward compatibility (although it would be 
trivial to update the code with "DFix"... if someone is still 
maintaining the code).

A workaround would be to use a template which maps to the correct native 
type, iff it has the exact number of bits specified, or issues an error. 
Here is a quick mockup (does not support all types). I used "fp" instead 
of "float" or "real" to avoid name collisions with the current types.

template fp(uint n) {

         static if (n == 32) {
                 alias fp = float;
         } else static if (n == 64) {
                 alias fp = double;
         } else static if (n == 80) {
                 static if (real.mant_dig == 64) {
                         alias fp = real;
                 } else {
                         static assert(false, "No 80 bit floating point 
type supported on this architecture");
                 }
         } else static if (n == 128) {
                 alias fp = quadruple; // Or doubledouble on PPC. Add 
other static ifs if necessary.
         } else {
                 import std.conv: to;
                 static assert(false, "No "~to!string(n)~" bit floating 
point type.");
         }
}

void main() {

         fp!32 x = 3.1415926;
         assert(is(typeof(x) == float));

         fp!64 y = 3.141592653589793;
         assert(is(typeof(y) == double));

         fp!80 z = 3.14159265358979323846;
         assert(is(typeof(z) == real));

         /* Fails on x86_64, as it should, but the error message could 
be made more explicit.
          * Currently : "undefined identifier quadruple"
          * Should ideally be : "No native 128 bit floating-point type 
supported on x86_64 architecture."
          */
         /*
         fp!128 w = 3.14159265358979323846264338327950288;
         assert(is(typeof(w) == quadruple));
         */
}