Expression templates

Tue Dec 12 23:30:15 PST 2006

Don Clugston wrote:
> The recent language improvements (especially, opAssign) have made 
> expression templates eminently feasible. Here's a proof-of-concept which
> implements parsing of the basic vector operations of addition and scalar 
> multiplication.

I've made a slightly improved version, which swaps the parameter 
ordering, in order to hide the latency of * on an x87.
Translation of the postfix notation into x87 instructions is 
straightforward:
scalar *  becomes  fmul double ptr[c];
Vec +     becomes  fadd double ptr[v + 8*EDX];
+         becomes  faddp ST(1), ST;
Vec       (not followed by +) becomes fld double ptr[v + 8*EDX];

and at the end of the loop there's
fstp double ptr[r + 8*EDX];
inc EDX;
jnz start;

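where c is the address of a scalar, v is the address of a vector (loaded 
into an integer register), and r is the address of the result vector.
EDX is the loop counter. It counts up from minus the vector length to 
zero, so v and r point just past the end of their arrays and inc EDX 
sets the zero flag on the final pass.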
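A minimal Python sketch of the translation rules above, applied to the postfix strings printed by the proof-of-concept. It is not the original D code. Assumptions: the operand names v0, v1, ... and c0, c1, ... are hypothetical placeholders numbered by order of appearance (no attempt is made to notice that the same vector occurs twice); scalars are taken to be stored as doubles, because the x87 FMUL instruction only accepts 32- and 64-bit memory operands; a bare + uses the popping faddp, so the x87 stack is empty again after the final fstp; and the loop label is called start.

```python
def postfix_to_x87(postfix: str) -> list[str]:
    """Translate a postfix vector expression into one x87 loop iteration.

    Tokens are "Vec", "scalar", "*" and "+". Operand names v0, v1, ...
    and c0, c1, ... are hypothetical placeholders, numbered in order of
    appearance; repeated vectors are not detected.
    """
    tokens = postfix.split()
    code = ["start:"]
    nvec = nscalar = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "Vec":
            mem = f"double ptr[v{nvec} + 8*EDX]"
            nvec += 1
            if nxt == "+":
                # "Vec +": add the vector element straight from memory.
                code.append(f"fadd {mem}")
                i += 2
            else:
                # A Vec not followed by "+" pushes a new value onto the stack.
                code.append(f"fld {mem}")
                i += 1
        elif tok == "scalar" and nxt == "*":
            # "scalar *": multiply ST(0) by the scalar in memory
            # (assumed to be a double, since FMUL has no 80-bit memory form).
            code.append(f"fmul double ptr[c{nscalar}]")
            nscalar += 1
            i += 2
        elif tok == "+":
            # A bare "+" adds ST(0) into ST(1) and pops, keeping the stack balanced.
            code.append("faddp ST(1), ST")
            i += 1
        else:
            raise ValueError(f"unexpected token {tok!r} at position {i}")
    # Loop epilogue: store the result, then count EDX up towards zero.
    code += ["fstp double ptr[r + 8*EDX]", "inc EDX", "jnz start"]
    return code
```

For the third expression in the output, every fld is matched by one faddp or by the final fstp, so the stack never grows across iterations.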

Using BCS's trick of creating a fake tuple to allow static foreach to 
operate as a simple counter, we could actually output this code.

There would be some mucking around to identify the vectors at each 
stage, and also we'd run out of integer registers quite quickly, but
basically this technique could generate x87 code that is extremely close
to optimal.

Output:
----
In postfix: Vec Vec +
In postfix: Vec scalar *
In postfix: Vec scalar *  Vec Vec +  +  scalar *  Vec scalar *  Vec +  +
b = c+a
b = c*2
a = (((c*3.2)+(c+b))*4.936)+((a*27.4)+c)
----
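A rough Python analog of the idea behind the proof-of-concept, using run-time operator overloading rather than compile-time D templates. The + and * operators only build an expression tree; the arithmetic happens in a single element-wise loop inside assign(), so no temporary vectors are created. The names Expr, Vec, Add, Scale and assign are hypothetical. The sketch produces the same postfix token sequence and infix strings as the output above, except that the postfix tokens are separated by single spaces.

```python
class Expr:
    """Base class for a deferred vector expression; nothing is computed until assign()."""

    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, k):
        return Scale(self, k)


class Vec(Expr):
    def __init__(self, name, data):
        self.name = name
        self.data = list(data)

    def at(self, i):
        return self.data[i]

    def postfix(self):
        return ["Vec"]

    def infix(self, top=False):
        return self.name


class Add(Expr):
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def at(self, i):
        return self.lhs.at(i) + self.rhs.at(i)

    def postfix(self):
        return self.lhs.postfix() + self.rhs.postfix() + ["+"]

    def infix(self, top=False):
        s = f"{self.lhs.infix()}+{self.rhs.infix()}"
        return s if top else f"({s})"


class Scale(Expr):
    def __init__(self, vec, k):
        self.vec, self.k = vec, k

    def at(self, i):
        return self.vec.at(i) * self.k

    def postfix(self):
        return self.vec.postfix() + ["scalar", "*"]

    def infix(self, top=False):
        s = f"{self.vec.infix()}*{self.k}"
        return s if top else f"({s})"


def assign(target, expr):
    """Evaluate expr element by element straight into target (one loop, no temporaries)."""
    for i in range(len(target.data)):
        target.data[i] = expr.at(i)
    return f"{target.name} = {expr.infix(top=True)}"


if __name__ == "__main__":
    a = Vec("a", [1.0, 2.0])
    b = Vec("b", [0.0, 0.0])
    c = Vec("c", [10.0, 20.0])
    for target, expr in [(b, c + a), (b, c * 2),
                         (a, (c * 3.2 + (c + b)) * 4.936 + (a * 27.4 + c))]:
        print("In postfix:", " ".join(expr.postfix()))
        print(assign(target, expr))
```

Because each element of the result depends only on the same element of the operands, assigning to a vector that also appears on the right-hand side (a in the third example) is safe in this element-wise loop.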