The Matrix to end all Matrix classes (Let's dream!) - BLADE 0.4Alpha
Don Clugston
dac at nospam.com.au
Tue Nov 20 07:54:57 PST 2007
For reference, here's the state of BLADE:
Syntax is:
mixin(vectorize("expression"));
eg,
mixin(vectorize("some_array -= 3.2 * (5.8 * another_array) * some_constant"));
It would be so nice to get rid of the mixin() and the quotes. But they don't
impede development in any way. Ideally it would be:
vectorize(some_array -= 3.2 * (5.8 * another_array) * some_constant);
Most significant changes since I my last post about it.
(a) it finally works in a usable form :-). Thanks Walter for some great bugfixes
in the past few releases! I'm using dmd 1.023.
(b) generates X87, SSE, SSE2, or inline D code, depending on the complexity of
the expression and the types involved.
(c) an expression rewriting step has been added, which performs scalar/const
folding, and indexing/slicing folding
(eg, (A+7*B)[0..5] ---> A[0..5] + 7*(B[0..5]) ).
(d) it gives really, really nice error messages and debug output. No template
garbage from the library internals -- just a straightforward one-line error
message, giving the line in YOUR code which contains the error.
eg, with an array a[],
mixin(vectorize("any+= old*garbage"));
mixin(vectorize("a+= 2"));
you get this output (and NOTHING ELSE):
demo.d(38): static assert "BLADE: Undefined symbols: any old garbage"
demo.d(39): static assert "BLADE: Rank mismatch (addition or subtraction)"
It's still not terribly useful, since it only generates code for packed vectors.
But the infrastructure is very solid.
Here's a particularly nasty example which the constant folding can cope with, to
generate SSE2 code:
double [] a = new double[4];
double [] d = [0.5, 2.8, 17.0, 28.25, 1, 56.2, 3.4];
a[0..$] = [3.4, 565, 31.3, 41.8];
double [4][] another = [[33.1, 4543, 43, 878.7], [5.14, 455, 554, 2.43]];
mixin(vectorize(
` a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]`));
-------
Generates this front-end code (compile with -debug=BladeFrontEnd to see it).
Note that there are many asserts to give nice debug info at runtime, but the
only runtime code is a single function call, which passes 3 pointers and a
double into an asm function (there's no inlining work for the compiler to do):
------
// bladedemo.d(34) a += (d[2..$-1]*2.01*a[2]-another[][1])["abc".length-3..$]
assert(a.length==another[][1][(3u-3)..$].length, `Vector length mismatch`);
assert(d[2..($-1)][(3u-3)..$].length==another[][1][(3u-3)..$].length, `Vector
length mismatch`);
assert( (cast(size_t)(a.ptr)& 0x0F) == 0, `SSE Vector misalignment: a`);
assert( (cast(size_t)(d[2..($-1)][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector
misalignment: d[2..($-1)][(3u-3)..$]`);
assert( (cast(size_t)(another[][1][(3u-3)..$].ptr)& 0x0F) == 0, `SSE Vector
misalignment: another[][1][(3u-3)..$]`);
SSEVECGEN!(2,"A+=((B*C)-D)",double*,double,double*,double*)(another[][1][(3u-3)..$].length,&a[0],((a[2])*2.01),&d[2..($-1)][(3u-3)..$][0],&another[][1][(3u-3)..$][0]);
-----
The function consists of this ASM code (compile with -debug=BladeBackEnd to see
it; BTW all comments are auto-generated). Note there are only 8 asm instructions
in the inner loop:
-------
// Operation : ACB*D-+A=
asm {
push EBX;
mov EAX, veclength;
lea ECX, [8*EAX]; add ECX, values[0]; //A
movsd XMM0, values[1]; shufpd XMM0, XMM0,0; //B
lea EDX, [8*EAX]; add EDX, values[2]; //C
lea EBX, [8*EAX]; add EBX, values[3]; //D
xor EAX, EAX;
sub EAX, veclength; // counter=-length
jz short L2; // test for length==0
align 16;
L1:
movapd XMM1, [ECX + 8*EAX]; // A
movapd XMM2, [EDX + 8*EAX]; // C
mulpd XMM2, XMM0; // B*
subpd XMM2, [EBX + 8*EAX]; // D-
addpd XMM1, XMM2; //+
movapd [ECX + 8*EAX], XMM1; // A=
add EAX,2;
js L1;
L2:
sub EAX, 2;
jns L4;
movsd XMM1, [ECX + 8*EAX+16]; // A
movsd XMM2, [EDX + 8*EAX+16]; // C
mulsd XMM2, XMM0; // B*
subsd XMM2, [EBX + 8*EAX+16]; // D-
addsd XMM1, XMM2; //+
movsd [ECX + 8*EAX+16], XMM1; // A=
L4:
; pop EBX;
}
-------
More information about the Digitalmars-d
mailing list