The Matrix to end all Matrix classes (Let's dream!) - BLADE 0.4Alpha

Don Clugston dac at nospam.com.au
Tue Nov 20 07:54:57 PST 2007


For reference, here's the state of BLADE:

Syntax is:

mixin(vectorize("expression"));

eg,

mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * some_constant"));

It would be so nice to get rid of the mixin() and the quotes. But they don't 
impede development in any way. Ideally it would be:

vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant);

Most significant changes since I my last post about it.
(a) it finally works in a usable form :-). Thanks Walter for some great bugfixes 
in the past few releases!  I'm using dmd 1.023.
(b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of 
the expression and the types involved.
(c) an expression rewriting step has been added, which performs scalar/const 
folding, and indexing/slicing folding
    (eg, (A+7*B)[0..5] --->   A[0..5] + 7*(B[0..5]) ).
(d) it gives really, really nice error messages and debug output. No template 
garbage from the library internals -- just a straightforward one-line error 
message, giving the line in YOUR code which contains the error.

eg,   with an array a[],
     mixin(vectorize("any+= old*garbage"));
     mixin(vectorize("a+= 2"));

you get this output (and NOTHING ELSE):
demo.d(38): static assert  "BLADE: Undefined symbols: any old garbage"
demo.d(39): static assert  "BLADE: Rank mismatch (addition or subtraction)"

It's still not terribly useful, since it only generates code for packed vectors. 
But the infrastructure is very solid.

Here's a particularly nasty example which the constant folding can cope with, to 
generate SSE2 code:

     double [] a = new double[4];
     double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4];
     a[0..$] = [3.4, 565, 31.3, 41.8];
     double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 2.43]];

     mixin(vectorize(
` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`));

-------
Generates this front-end code (compile with -debug=BladeFrontEnd to see it). 
Note that there are many asserts to give nice debug info at runtime, but the 
only runtime code is a single function call, which passes 3 pointers and a 
double into an asm function (there's no inlining work for the compiler to do):

------
// bladedemo.d(34)  a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]
assert(a.length==another[][1][(3u-3)..$].length, `Vector length mismatch`);
assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, `Vector 
length mismatch`);
assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`);
assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector 
misalignment: d[2..($-1)][(3u-3)..$]`);
assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector 
misalignment: another[][1][(3u-3)..$]`);

SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]);

-----
The function consists of this ASM code (compile with -debug=BladeBackEnd to see 
it; BTW all comments are auto-generated). Note there are only 8 asm instructions 
in the inner loop:

-------
// Operation : ACB*D-+A=

asm {
  push EBX;
   mov EAX, veclength;
   lea ECX, [8*EAX];     add ECX, values[0];  //A
   movsd XMM0, values[1];   shufpd XMM0, XMM0,0; //B
   lea EDX, [8*EAX];     add EDX, values[2];  //C
   lea EBX, [8*EAX];     add EBX, values[3];  //D
   xor EAX, EAX;
   sub EAX, veclength; // counter=-length
   jz short L2; // test for length==0

   align 16;
L1:
   movapd XMM1, [ECX + 8*EAX];  // A
   movapd XMM2, [EDX + 8*EAX];  // C
   mulpd XMM2, XMM0; // B*
   subpd  XMM2, [EBX + 8*EAX];  // D-
   addpd XMM1, XMM2;  //+
   movapd [ECX + 8*EAX], XMM1;  // A=
   add EAX,2;
   js L1;
L2:
   sub EAX, 2;
   jns L4;
   movsd XMM1, [ECX + 8*EAX+16];  // A
   movsd XMM2, [EDX + 8*EAX+16];  // C
   mulsd XMM2, XMM0; // B*
   subsd  XMM2, [EBX + 8*EAX+16];  // D-
   addsd XMM1, XMM2;  //+
   movsd [ECX + 8*EAX+16], XMM1;  // A=
L4:
;  pop EBX;
}
-------



More information about the Digitalmars-d mailing list